- Research article
- Open Access
Evolutionary rates and patterns for human transcription factor binding sites derived from repetitive DNA
© Polavarapu et al; licensee BioMed Central Ltd. 2008
- Received: 11 January 2008
- Accepted: 17 May 2008
- Published: 17 May 2008
The majority of human non-protein-coding DNA is made up of repetitive sequences, mainly transposable elements (TEs). It is becoming increasingly apparent that many of these repetitive DNA sequence elements encode gene regulatory functions. This fact has important evolutionary implications, since repetitive DNA is the most dynamic part of the genome. We set out to assess the evolutionary rate and pattern of experimentally characterized human transcription factor binding sites (TFBS) that are derived from repetitive versus non-repetitive DNA to test whether repeat-derived TFBS are in fact rapidly evolving. We also evaluated the position-specific patterns of variation among TFBS to look for signs of functional constraint on TFBS derived from repetitive and non-repetitive DNA.
We found numerous experimentally characterized TFBS in the human genome, 7–10% of all mapped sites, which are derived from repetitive DNA sequences including simple sequence repeats (SSRs) and TEs. TE-derived TFBS sequences are far less conserved between species than TFBS derived from SSRs and non-repetitive DNA. Despite their rapid evolution, several lines of evidence indicate that TE-derived TFBS are functionally constrained. First of all, ancient TE families, such as MIR and L2, are enriched for TFBS relative to younger families like Alu and L1. Secondly, functionally important positions in TE-derived TFBS, specifically those residues thought to physically interact with their cognate protein binding factors (TF), are more evolutionarily conserved than adjacent TFBS positions. Finally, TE-derived TFBS show position-specific patterns of sequence variation that are highly distinct from random patterns and similar to the variation seen for non-repeat derived sequences of the same TFBS.
The abundance of experimentally characterized human TFBS that are derived from repetitive DNA speaks to the substantial regulatory effects that this class of sequence has on the human genome. The unique evolutionary properties of repeat-derived TFBS are perhaps even more intriguing. TE-derived TFBS in particular, while clearly functionally constrained, evolve extremely rapidly relative to non-repeat derived sites. Such rapidly evolving TFBS are likely to confer species-specific regulatory phenotypes, i.e. divergent expression patterns, on the human evolutionary lineage. This result has practical implications with respect to the widespread use of evolutionary conservation as a surrogate for functionally relevant non-coding DNA. Most TE-derived TFBS would be missed using the kinds of sequence conservation-based screens, such as phylogenetic footprinting, that are used to help characterize non-coding DNA. Thus, the very TFBS that are most likely to yield human-specific characteristics will be neglected by the comparative genomic techniques that are currently de rigueur for the identification of novel regulatory sites.
- Transcription Factor Binding Site
- Phylogenetic Footprinting
- Context Position
- Genome Frequency
- Relative Evolutionary Rate
The vast majority of the human genome is made up of non-protein-coding sequences [1, 2], and the specific function of such DNA is often unknown. Recently, elucidating the functional relevance of the non-coding fraction of the human genome has become a major priority for computational and functional genomics.
Most of the non-protein-coding fraction of the human genome is made up of repetitive DNA sequences, primarily transposable elements (TEs), which alone make up at least 45% of the genome. In one sense, these TEs can be considered genomic parasites that exist solely by virtue of their ability to out-replicate the host genome in which they reside [4, 5]. On the other hand, it has become abundantly clear that, once established in a genome, TEs can contribute to genome function in a number of different ways. For instance, TEs are known to donate a wide variety of gene regulatory sequences to the human genome [7–9], and TE-derived regulatory sequences exert diversifying effects on the expression patterns of adjacent genes (reviewed in [10–12]).
TE-derived regulatory sequences are particularly interesting from an evolutionary perspective because of their potential to drive gene expression divergence between species. The potential for TEs to cause regulatory changes between evolutionary lineages is related to the fact that TEs invariably represent the most rapidly changing, lineage-specific part of eukaryotic genomes. For instance, when the human and mouse genome sequences were compared, it became apparent that 99% of protein-coding genes had human-mouse homologs, with 80% having direct 1:1 orthologs, whereas only 13% of mouse and 48% of human TEs were shared between the two species. TE dynamics can even lead to substantial differences between genomes over relatively short evolutionary time scales. Indeed, the human evolutionary lineage has experienced a TE-driven genome expansion of 500 Mb in the last 50 million years and 30 Mb since the divergence from chimpanzees.
Taken together with their ability to donate regulatory sequences, this lineage-specific character of TEs suggests that the regulatory elements they donate may lead to species-specific differences in gene expression. In fact, a primate-specific endogenous retroviral element has been shown to donate an enhancer that confers a distinct parotid-specific expression pattern on the human amylase gene. A more recent genome-scale analysis showed that TE-derived human regulatory sites are associated with genes that have increased tissue-specific expression divergence between human and mouse. A corollary prediction of this model for the diversifying regulatory effects of TEs is that TE-derived regulatory sequences will have anomalously rapid evolutionary rates. Consistent with this expectation, we previously found that TE-derived human transcription factor binding sites (TFBS) are much less likely to have orthologs in the mouse genome than non-repetitive TFBS.
In this study, we set out to assess the relative evolutionary rates and the position-specific patterns of variation for human TFBS that are derived from repetitive versus non-repetitive DNA. We relied on the analysis of experimentally characterized TFBS that can be unambiguously mapped to the human genome in order to determine their evolutionary origins in repetitive or non-repetitive DNA. Our results suggest that TE-derived TFBS show both rapid evolution and, in some cases, anomalous position-specific patterns of change relative to non-repetitive TFBS. Despite these distinct evolutionary characteristics, the TE-derived TFBS do show sequence divergence patterns that are consistent with the conservation of function.
Human TFBS from repetitive DNA
Table 1. Counts for human TFBS derived from repetitive DNA.
Evolutionary sequence conservation of repeat-derived TFBS
Having shown the high levels of sequence divergence for TE-derived TFBS, it is worth noting that evolutionary conservation is often taken as a measure of functional relevance. For instance, the phylogenetic footprinting approach identifies highly conserved regulatory sequences as more likely to be functional [22, 23]. While a number of functionally relevant TE-derived sequences have recently been identified by virtue of their sequence conservation [24–28], the relatively unconserved TE-derived TFBS revealed by our analysis would almost certainly be overlooked by phylogenetic footprinting methods. However, the TFBS that we analyzed were experimentally characterized, not predicted, and are thus quite likely to represent bona fide functional regulatory elements. In fact, the analysis of the relative evolutionary rates for different positions in the TFBS described below demonstrates that the specific pattern of conservation across sites supports the assertion that the TE-derived TFBS are functional.
TRANSFAC annotations in the site table represent individual residues in TFBS with either upper-case or lower-case letters. The upper-case residues correspond to specific sequence motifs within the site that were emphasized by the authors of the cited literature. We consider upper-case residues to be more likely to form specific DNA-protein contacts. Accordingly, the upper- and lower-case TRANSFAC annotations were used to partition TFBS residues into putative 'contact' positions, which are thought to physically interact with transcription factors (TF), versus 'context' positions that make up the rest of the site. Presumably, putative contact positions are more functionally relevant than context positions, i.e. a change of sequence at a contact position would have more of an effect on TF binding than a change at a context position would. If this is indeed the case, then according to the phylogenetic footprinting rationale, contact positions should be more conserved than context positions. This prediction is confirmed for all three categories of TFBS seen in Figure 2, and all differences between conservation levels for contact versus context positions within categories are statistically significant (3.0 < t < 7.5; 8.4e-11 < P < 2.5e-3). In other words, although TE-derived TFBS do evolve more rapidly than the other categories of TFBS, the position-specific patterns of TE-TFBS sequence divergence are nonetheless consistent with selective constraint based on their regulatory function.
Table 2. Evolutionary sequence conservation of human TFBS.
0.407 ± 0.085 | 0.410 ± 0.074 | 0.400 ± 0.110
0.115 ± 0.042 | 0.130 ± 0.041 | 0.088 ± 0.045
0.170 ± 0.056 | 0.183 ± 0.052 | 0.145 ± 0.062
0.047 ± 0.026 | 0.059 ± 0.026 | 0.028 ± 0.026
0.002 ± 0.002 | 0.003 ± 0.003 | 0.002 ± 0.001
0.028 ± 0.017 | 0.048 ± 0.026 | 0.003 ± 0.004
0.068 ± 0.063 | 0.077 ± 0.068 | 0.047 ± 0.052
All other LINEs | 0.066 ± 0.018 | 0.095 ± 0.022 | 0.012 ± 0.011
0.141 ± 0.076 | 0.145 ± 0.042 | 0.136 ± 0.119
0.043 ± 0.029 | 0.057 ± 0.038 | 0.016 ± 0.009
Another relevant point from the class/family-specific evolutionary conservation data is that the relative rates of contact versus context TFBS position divergence are consistent across all categories observed (Table 2). The greater conservation of contact positions is seen even for the least conserved family, Alu (t = 4.76, P = 2.7e-6). This indicates that the signal of functional constraint on TE-derived TFBS holds irrespective of the age of the elements from which the TFBS are derived, and serves as an independent confirmation of the experimental evidence in support of their identification.
Position-specific variation patterns for TE-derived TFBS
Table 3. Position-specific sequence variation scores for TE-derived, non-repetitive, matrix-random and genome-random TFBS.
Protein binding factor
T-cell-specific transcription factor 4 (TCF-4 or TCF7L2) | 5.69 ± 0.51 | 5.80 ± 0.73 | -48.76 ± 15.61 | -48.63 ± 14.97
6.65 ± 1.26 | 5.52 ± 1.92 | -2.79 ± 3.02 | -4.71 ± 3.35
GATA binding proteins (GATA) | 5.26 ± 1.56 | 5.27 ± 1.46 | -5.87 ± 2.71 | -4.70 ± 3.15
Androgen receptor (AR) | 4.45 ± 1.21 | 4.33 ± 1.74 | -2.29 ± 1.28 | -1.80 ± 2.17
Glioma-associated oncogene homolog 1 (GLI1) | 9.12 ± 1.14 | 9.24 ± 1.70 | 1.77 ± 2.83 | -4.28 ± 2.91
A substantial fraction (7–10%) of experimentally characterized TFBS in the human genome are derived from repetitive DNA, indicating a pronounced effect of repetitive DNA on human gene regulation. TFBS that originate from repeats evolve more rapidly than non-repetitive TFBS but still show signs of sequence conservation at functionally critical residues due to purifying selection. Position-specific patterns of sequence variation observed for TE-derived TFBS, in terms of the specific nucleotide composition along the positions of the TFBS, also point to divergence in the face of functional constraint. These findings are consistent with the notion that TFBS originating from repetitive DNA elements are likely to provide functionally relevant regulatory divergence between species.
Experimentally characterized human transcription factor binding sites (TFBS) were retrieved from the Professional release 11.3 (9/10/07) of the TRANSFAC database. These TFBS were mapped to the July 2003 human reference sequence (National Center for Biotechnology Information (NCBI) Build 34 or hg16) using the program site2genome. For many individual TFBS, TRANSFAC annotations list GenBank accessions that provide longer flanking sequence context for the relatively short TFBS contained within the sequence. Site2genome uses this flanking sequence context to allow for one-to-one TFBS-to-genome mapping. Only TFBS that could be unambiguously mapped to the human genome sequence (1,810 out of 2,521) were taken for further analysis, and these TFBS mappings were transferred to the current human genome build (NCBI Build 36 or hg18) using the UCSC Genome Browser 'liftover' utility. The locations of human TFBS were compared to the locations of repetitive DNA, both transposable elements (TEs) and simple sequence repeats (SSRs), annotated with the RepeatMasker program.
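The comparison of TFBS locations against repeat annotations can be illustrated with a minimal interval-overlap sketch. It is not the pipeline used in the study: the function names, the half-open coordinate convention, the example coordinates and the assumption that repeat annotations do not overlap one another on a chromosome are all hypothetical choices made for illustration.

```python
from bisect import bisect_left

def build_repeat_index(repeats):
    """Group repeat annotations by chromosome and sort them by start.

    repeats: iterable of (chrom, start, end, repeat_class), half-open intervals.
    Assumption (hypothetical): repeats on one chromosome do not overlap each other.
    """
    by_chrom = {}
    for chrom, start, end, rclass in repeats:
        by_chrom.setdefault(chrom, []).append((start, end, rclass))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([iv[0] for iv in intervals], intervals)
    return index

def classify_site(index, chrom, start, end):
    """Return the class of a repeat overlapping [start, end), else 'non-repetitive'."""
    starts, intervals = index.get(chrom, ([], []))
    # Last repeat that starts before the site ends; with non-overlapping,
    # sorted repeats it is the only candidate that needs checking.
    i = bisect_left(starts, end) - 1
    if i >= 0 and intervals[i][1] > start:
        return intervals[i][2]
    return "non-repetitive"

# Made-up coordinates, for illustration only.
repeats = [("chr1", 100, 400, "SINE/Alu"), ("chr1", 1000, 1020, "Simple_repeat")]
index = build_repeat_index(repeats)
labels = [
    classify_site(index, "chr1", 390, 402),   # overlaps the Alu annotation
    classify_site(index, "chr1", 400, 410),   # starts where the Alu ends
    classify_site(index, "chr1", 995, 1005),  # overlaps the simple repeat
    classify_site(index, "chr2", 5, 10),      # chromosome with no repeats
]
```

Here any overlap counts as repeat-derived; a stricter rule, such as requiring the whole site to lie inside the repeat, would change only the comparison in `classify_site`.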
The evolutionary conservation levels for human TFBS were determined based on complete genome sequence alignments between the human genome and 16 other vertebrate genomes. These alignments have been analyzed, along with the phylogenetic tree of the species, by the program phastCons to make predictions of discrete conserved genomic elements and to produce conservation level scores for each position (base) in the human genome. The base-by-base conservation level scores range from 0 to 1 and represent the posterior probability of every individual position in the genome being in a conserved element. Base-by-base conservation level scores were taken across all positions of the mapped TFBS and then averaged for the different categories compared in Table 2 and Figure 2.
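The averaging step can be sketched as follows. The text does not specify whether scores were pooled over all bases in a category or averaged per site first; this sketch assumes pooling over bases, and the function name, the toy score track and the site coordinates are hypothetical.

```python
from collections import defaultdict

def mean_conservation_by_category(sites, scores):
    """Average per-base conservation scores over all positions of each category.

    sites:  iterable of (category, start, end), half-open indices into `scores`
    scores: sequence of per-base conservation scores in [0, 1]
    Assumption (hypothetical): bases are pooled within a category before averaging.
    """
    pooled = defaultdict(list)
    for category, start, end in sites:
        pooled[category].extend(scores[start:end])
    return {cat: sum(vals) / len(vals) for cat, vals in pooled.items()}

# Toy per-base score track and two made-up sites.
scores = [0.0, 0.1, 0.9, 1.0, 0.8, 0.0, 0.0, 0.2]
sites = [("non-repetitive", 2, 5), ("TE", 5, 8)]
means = mean_conservation_by_category(sites, scores)
```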
Individual TFBS were broken down into putative contact and context positions using the TRANSFAC site table annotations. In the site table, the TFBS sequences are represented with upper-case and lower-case residues. The upper-case TFBS residues correspond to specific sequence motifs within the site that were emphasized by the authors of the cited literature. We consider upper-case residues to be more likely to form specific DNA-protein contacts than lower-case residues. Accordingly, the upper- and lower-case TRANSFAC annotations were used to partition TFBS residues into putative 'contact' positions, which are thought to physically interact with transcription factors (TF), versus 'context' positions. TFBS were also divided into those derived from repetitive (TE and SSR) versus non-repetitive classes, and average conservation scores were determined for each TFBS class over each residue (contact and context) class. The statistical significance of the differences between average evolutionary conservation levels was evaluated using Student's t-test.
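The contact/context partition and the comparison of means can be sketched as below. The site string and scores are invented for illustration, and the unpaired, pooled-variance form of the t statistic is an assumption, since the text names Student's t-test without giving the variant.

```python
from math import sqrt
from statistics import mean, variance

def partition_scores(site_seq, scores):
    """Split per-base scores into contact (upper-case) and context (lower-case)."""
    contact = [s for base, s in zip(site_seq, scores) if base.isupper()]
    context = [s for base, s in zip(site_seq, scores) if base.islower()]
    return contact, context

def student_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical annotated site: upper-case residues are the emphasized motif.
site = "ggTGACtca"
site_scores = [0.1, 0.2, 0.9, 0.8, 0.9, 0.7, 0.2, 0.1, 0.3]
contact, context = partition_scores(site, site_scores)
t_value = student_t(contact, context)
```

In practice the contact and context scores would be pooled across many sites within a class before testing; a positive t here means contact positions are more conserved on average.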
where $c_{r,i}$ = the count of residue $r$ at position $i$, $s_r$ is a pseudocount function (= 1), and $n$ = the total number of TFBS used to build the model. These probabilities ($p_{r,i}$) are normalized by the background genome frequencies of the DNA residues ($p_r$) to compute weights ($W$):
$W_{r,i} = p_{r,i} / p_r$
where $W_{r,i}$ = the weight of the observed residue $r$ at position $i$ and $n$ = the number of sites in the TFBS PWM. Individual TFBS from the TRANSFAC site table were scored using the leave-one-out method, whereby matrix-specific PFMs were iteratively built without residue counts from the particular TFBS being scored. Scores ($S$) were compared for individual TE-derived and non-repetitive TFBS along with the score distributions for simulated sets of matrix-random and genome-random sites.
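A leave-one-out scoring scheme of this kind can be sketched as follows. The formulas for the position probabilities and for the score are not shown in the text above, so two assumptions are made here and should be read as such: probabilities use the common pseudocount estimator (count + pseudocount) / (n + 4 × pseudocount), and the score of a site is the sum over positions of log2 of the weight. The function names and the toy sites are hypothetical.

```python
from math import log2

BASES = "ACGT"

def position_probabilities(sites, pseudocount=1.0):
    """Per-position residue probabilities from aligned, equal-length sites.

    Assumed estimator: (count + pseudocount) / (n + pseudocount * 4).
    """
    n = len(sites)
    probs = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        probs.append({r: (column.count(r) + pseudocount) / (n + pseudocount * len(BASES))
                      for r in BASES})
    return probs

def score_site(site, probs, background):
    """Assumed score: sum over positions of log2(p_{r,i} / p_r)."""
    return sum(log2(probs[i][r] / background[r]) for i, r in enumerate(site))

def leave_one_out_scores(sites, background):
    """Score each site against a model built from all the other sites."""
    scores = []
    for k, site in enumerate(sites):
        others = sites[:k] + sites[k + 1:]
        scores.append(score_site(site, position_probabilities(others), background))
    return scores

# Toy aligned sites and a uniform background, for illustration only.
toy_sites = ["ACGT", "ACGT", "ACGA"]
uniform = {r: 0.25 for r in BASES}
loo_scores = leave_one_out_scores(toy_sites, uniform)
```

Under these assumptions a site that matches the other members of its matrix scores above zero, while a site drawn at random tends to score at or below zero, which is the contrast the comparison with matrix-random and genome-random sites relies on.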
IKJ was supported by the School of Biology at the Georgia Institute of Technology. LM–R and DL were supported by the Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine at the National Institutes of Health. JFMcD and NP were supported by a grant from the Georgia Tech Research Foundation.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA: The sequence of the human genome. Science. 2001, 291 (5507): 1304-1351. 10.1126/science.1058040.
- ENCODE Project Consortium: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004, 306 (5696): 636-640. 10.1126/science.1105136.
- Doolittle WF, Sapienza C: Selfish genes, the phenotype paradigm and genome evolution. Nature. 1980, 284 (5757): 601-603. 10.1038/284601a0.
- Orgel LE, Crick FH: Selfish DNA: the ultimate parasite. Nature. 1980, 284 (5757): 604-607. 10.1038/284604a0.
- Kidwell MG, Lisch DR: Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution Int J Org Evolution. 2001, 55 (1): 1-24.
- Jordan IK, Rogozin IB, Glazko GV, Koonin EV: Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 2003, 19 (2): 68-72. 10.1016/S0168-9525(02)00006-9.
- Thornburg BG, Gotea V, Makalowski W: Transposable elements as a significant source of transcription regulating signals. Gene. 2006, 365: 104-110. 10.1016/j.gene.2005.09.036.
- van de Lagemaat LN, Landry JR, Mager DL, Medstrand P: Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003, 19 (10): 530-536. 10.1016/j.tig.2003.08.004.
- Britten RJ: DNA sequence insertion and evolutionary variation in gene regulation. Proc Natl Acad Sci USA. 1996, 93 (18): 9374-9377. 10.1073/pnas.93.18.9374.
- Britten RJ: Mobile elements inserted in the distant past have taken on important functions. Gene. 1997, 205 (1–2): 177-182. 10.1016/S0378-1119(97)00399-5.
- Medstrand P, van de Lagemaat LN, Dunn CA, Landry JR, Svenback D, Mager DL: Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res. 2005, 110 (1–4): 342-352. 10.1159/000084966.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420 (6915): 520-562. 10.1038/nature01262.
- Liu G, Zhao S, Bailey JA, Sahinalp SC, Alkan C, Tuzun E, Green ED, Eichler EE: Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome research. 2003, 13 (3): 358-368. 10.1101/gr.923303.
- Samuelson LC, Wiebauer K, Snow CM, Meisler MH: Retroviral and pseudogene insertion sites reveal the lineage of human salivary and pancreatic amylase genes from a single gene during primate evolution. Mol Cell Biol. 1990, 10 (6): 2513-2520.
- Marino-Ramirez L, Jordan IK: Transposable element derived DNaseI-hypersensitive sites in the human genome. Biol Direct. 2006, 1: 20. 10.1186/1745-6150-1-20.
- Marino-Ramirez L, Lewis KC, Landsman D, Jordan IK: Transposable elements donate lineage-specific regulatory sequences to host genomes. Cytogenet Genome Res. 2005, 110 (1–4): 333-341.
- Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003, 31 (1): 374-378. 10.1093/nar/gkg108.
- Frith MC, Halees AS, Hansen U, Weng Z: Site2genome: locating short DNA sequences in whole genomes. Bioinformatics. 2004, 20 (9): 1468-1469. 10.1093/bioinformatics/bth094.
- RepeatMasker. [http://www.repeatmasker.org/]
- Silva JC, Shabalina SA, Harris DG, Spouge JL, Kondrashov AS: Conserved fragments of transposable elements in intergenic regions: evidence for widespread recruitment of MIR- and L2-derived sequences within the mouse and human genomes. Genet Res. 2003, 82 (1): 1-18. 10.1017/S0016672303006268.
- Gumucio DL, Heilstedt-Williamson H, Gray TA, Tarle SA, Shelton DA, Tagle DA, Slightom JL, Goodman M, Collins FS: Phylogenetic footprinting reveals a nuclear protein which binds to silencer sequences in the human gamma and epsilon globin genes. Mol Cell Biol. 1992, 12 (11): 4919-4929.
- Zhang Z, Gerstein M: Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J Biol. 2003, 2 (2): 11. 10.1186/1475-4924-2-11.
- Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D: A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006, 441 (7089): 87-90. 10.1038/nature04696.
- Kamal M, Xie X, Lander ES: A large family of ancient repeat elements in the human genome is under strong selection. Proc Natl Acad Sci USA. 2006, 103 (8): 2740-2745. 10.1073/pnas.0511238103.
- Lowe CB, Bejerano G, Haussler D: Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc Natl Acad Sci USA. 2007, 104 (19): 8005-8010. 10.1073/pnas.0611223104.
- Nishihara H, Smit AF, Okada N: Functional noncoding sequences derived from SINEs in the mammalian genome. Genome research. 2006, 16 (7): 864-874. 10.1101/gr.5255506.
- Xie X, Kamal M, Lander ES: A family of conserved noncoding elements derived from an ancient transposable element. Proc Natl Acad Sci USA. 2006, 103 (31): 11659-11664. 10.1073/pnas.0604768103.
- Bannert N, Kurth R: Retroelements and the human genome: new perspectives on an old relation. Proc Natl Acad Sci USA. 2004, 101 (Suppl 2): 14572-14579. 10.1073/pnas.0404838101.
- Dunn CA, Medstrand P, Mager DL: An endogenous retroviral long terminal repeat is the dominant promoter for human beta1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci USA. 2003, 100 (22): 12841-12846. 10.1073/pnas.2134464100.
- Dunn CA, Romanish MT, Gutierrez LE, van de Lagemaat LN, Mager DL: Transcription of two human genes from a bidirectional endogenous retrovirus promoter. Gene. 2006, 366 (2): 335-342. 10.1016/j.gene.2005.09.003.
- Romanish MT, Lock WM, van de Lagemaat LN, Dunn CA, Mager DL: Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet. 2007, 3 (1): e10. 10.1371/journal.pgen.0030010.
- Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D: Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci USA. 2007, 104 (47): 18613-18618. 10.1073/pnas.0703637104.
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990, 18 (20): 6097-6100. 10.1093/nar/18.20.6097.
- Mann B, Gelos M, Siedow A, Hanski ML, Gratchev A, Ilyas M, Bodmer WF, Moyer MP, Riecken EO, Buhr HJ: Target genes of beta-catenin-T cell-factor/lymphoid-enhancer-factor signaling in human colorectal carcinomas. Proc Natl Acad Sci USA. 1999, 96 (4): 1603-1608. 10.1073/pnas.96.4.1603.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome research. 2002, 12 (6): 996-1006. 10.1101/gr.229102. Article published online before print in May 2002.
- Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED: Aligning multiple genomic sequences with the threaded blockset aligner. Genome research. 2004, 14 (4): 708-715. 10.1101/gr.1933104.
- Vertebrate Multiz Alignment & Conservation (17 Species). [http://www.genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=100603286&c=chrX&g=multiz17way]
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome research. 2005, 15 (8): 1034-1050. 10.1101/gr.3715005.
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004, 5 (4): 276-287. 10.1038/nrg1315.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome research. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.