De novo identification of LTR retrotransposons in eukaryotic genomes
© Rho et al; licensee BioMed Central Ltd. 2007
Received: 21 February 2007
Accepted: 03 April 2007
Published: 03 April 2007
LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through conventional homology searching, so annotation is limited to known elements.
In this paper, we report a de novo computational method that can identify new LTR retrotransposons without relying on a library of known elements. Specifically, our method identifies intact LTR retrotransposons by using an approximate string matching technique and protein domain analysis. In addition, it identifies partially deleted or solo LTRs using profile Hidden Markov Models (pHMMs). As a result, this method can identify all types of LTR retrotransposons de novo. We tested this method on two pairs of eukaryotic genomes, C. elegans vs. C. briggsae and D. melanogaster vs. D. pseudoobscura. LTR retrotransposons in C. elegans and D. melanogaster have been intensively studied using conventional annotation methods. Compared with previous work, we identified new intact LTR retroelements and new putative families, which suggests that there may still be retroelements left to be discovered even in well-studied organisms. To assess the sensitivity and accuracy of our method, we compared our results with a previously published method, LTR_STRUC, which predominantly identifies full-length LTR retrotransposons. In summary, both methods identified a comparable number of intact LTR retroelements. However, our method identified nearly all known elements in C. elegans, while LTR_STRUC missed about 1/3 of them. Our method also identified more known LTR retroelements than LTR_STRUC in the D. melanogaster genome. We also identified some LTR retroelements in the other two genomes, C. briggsae and D. pseudoobscura, which have not been completely finished. In contrast, the conventional method failed to identify those elements. Finally, the phylogenetic and chromosomal distributions of the identified elements are discussed.
We report a novel method for de novo identification of LTR retrotransposons in eukaryotic genomes with favorable performance over the existing methods.
Mobile genetic elements (MGEs, also called transposable elements, TEs), which can transpose from one location to another within the genome, are known to be one of the causes of large scale genome reorganization. According to the mechanism of their transposition, MGEs are usually classified into two broad categories: retroelements (or class I elements), which are transposed through the reverse transcription of an RNA template (retrotransposition), and DNA transposons (or class II elements), which are transposed through a classical DNA "cut-and-paste" transposition model. MGEs have attracted the attention of evolutionary biologists studying their interactions with the host species, especially in the post-genome era when more and more eukaryotic genomes are sequenced. The conventional approach to annotating MGEs in genomic sequences is based upon homology searching against a regularly updated library of known MGEs, e.g. Repbase, using a fast searching program, e.g. RepeatMasker. This approach, however, is limited to annotating known MGE families, and thus cannot identify new elements. Furthermore, it sometimes overlooks even known elements, because the repetitive nature of MGEs may confuse the statistical measures (e.g. E-values) that are commonly used in genome annotation.
In a pioneering paper, Bao and Eddy described a de novo approach to automated annotation of repeat elements in a genome. Their program RECON clustered BLAST hits from self-comparison of a single genome and reported the repeat elements that appear many times in similar copies in the genome. Since then, several software tools have been developed with improved speed and performance over RECON, e.g. RepeatScout, PILER, and a combined method. All of the methods described above, however, identify repeat elements based on their copy numbers in a genome, which facilitates identification of general repeat elements. Many MGEs indeed appear in high copy numbers in the host genome because of their transposition activity, but some MGE families have low copy numbers in some genomes. Furthermore, there exist types of repeat elements other than MGEs. For example, many low copy repeats (LCRs) in mammalian genomes are induced by segmental duplications. Although these LCRs follow a completely different duplication mechanism from MGEs, there is often no clear distinction in copy numbers between these two classes of repeats. As a result, successful identification of new MGEs by these bioinformatics approaches requires subsequent manual inspection and experimental validation. Recently, a new computational method was proposed that identified genome-specifically inserted sequences using multiple alignment of closely related genomes. This new method does not rely on the copy number of the repeat elements to identify them, but it does not attempt to distinguish different classes of repeats either.
In this paper, we adopt a different de novo approach to identifying mobile genetic elements, which is based on common structural models of specific MGE families, rather than their copy numbers in a genome. As an initial step of this approach, we concentrate on one class of mobile genetic elements, LTR retrotransposons, which share a unique structural feature: two long terminal repeats (LTRs) that are longer than 100 bp and play a key role in their transposition. LTR retrotransposons and endogenous retroviruses have partially overlapping gene organizations, and thus are thought to have the same origin. Since the two LTRs of a single LTR retrotransposon have identical sequences at the time of integration, the transposition event of an LTR retrotransposon can be dated reliably by computing the sequence similarity of its two LTRs. Therefore, LTR retrotransposons are an ideal subject for phylogenetic analysis. Computational screening of LTR retrotransposons has been done extensively in several eukaryotic genomes, e.g. C. elegans, D. melanogaster [15, 16], mouse and rice. Software tools, such as LTR_STRUC and a more recently developed one, were developed to speed up the screening process. However, they were based on sequence characteristics derived from known LTR retroelements. Because of the high divergence of LTR retrotransposons, there are likely new elements still to be identified, even in these well-studied model organisms.
We propose here a de novo computational method for LTR retrotransposon identification that consists of three steps. In the first step, we identify only young and intact LTR retrotransposons, i.e. those elements associated with pairs of LTRs with high identity (e.g. > 80%). This problem can be formulated as finding two highly similar subsequences with a distance ranging typically from 1000 to 20000 bases in a given genomic sequence. We used an approximate string matching technique, based on the suffix array data structure, to solve this problem. In addition, the structure of retroelements is inspected by the occurrences of common protein domains. In the second step, we identify solo LTRs, i.e. the unpaired LTRs resulting from recombination between LTR retrotransposons, by first applying the BAG sequence clustering algorithm to cluster LTRs identified in the previous step, and then searching against the whole genome using sequence profile Hidden Markov Models (pHMMs) built from these LTR sequence clusters. Finally, we identify old and intact LTR retrotransposons with LTR pairs of low identities (e.g. < 80%) by a phylogenetic analysis of identified LTR elements.
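The first step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps the suffix-array approximate matching for a plain exact k-mer index, and the function names and the `seed_len` parameter are hypothetical. It only shows how candidate LTR pairs can be anchored by repeated seeds whose spacing falls in the 1000 to 20000 base range mentioned above, and how an identity threshold such as 80% could then be checked on aligned windows.

```python
from collections import defaultdict

def find_candidate_ltr_pairs(genome, seed_len=12, min_dist=1000, max_dist=20000):
    """Return sorted (i, j) start positions of identical seeds with
    min_dist <= j - i <= max_dist. Each pair is only a candidate anchor.
    Assumption: exact seeds stand in for approximate string matching."""
    positions = defaultdict(list)
    for i in range(len(genome) - seed_len + 1):
        positions[genome[i:i + seed_len]].append(i)  # positions stay sorted
    pairs = []
    for occ in positions.values():
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                d = occ[b] - occ[a]
                if d > max_dist:
                    break  # later occurrences are even farther away
                if d >= min_dist:
                    pairs.append((occ[a], occ[b]))
    return sorted(pairs)

def percent_identity(s, t):
    """Ungapped identity (in percent) of two equal-length sequences."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(s, t) if x == y)
    return 100.0 * matches / len(s)
```

In practice the seed hits would be extended and aligned with gaps before the identity is computed; the ungapped comparison here is a deliberate simplification.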
We implemented our method in a software package using C++ and Perl, and tested it on two eukaryotic genomes, C. elegans and D. melanogaster. We chose these genomes for initial testing because they have been well studied, so that we can compare our results with the previously known elements and with those identified by LTR_STRUC. Our de novo method identified almost all of the previously known elements, whereas LTR_STRUC missed about 1/3 of them, although both methods report a comparable number of retroelements. This indicates that our method has a higher sensitivity than the existing method. In addition to known elements, our method identified some new intact LTR retrotransposons and several putative new families of LTR retrotransposons. These are particularly encouraging results, because these two genomes have been well studied. In order to obtain a larger evolutionary picture of their transpositions, we also analyzed two additional genomes, C. briggsae and D. pseudoobscura, each closely related to one of the two model genomes. From the phylogenetic analysis of the identified elements, we find clear evidence that some LTR retrotransposon families are specific to single species within a genus, whereas some others are active across both genomes. We also analyzed the distribution of chromosomal locations of identified LTR retrotransposons. Consistent with previous reports, we observed more LTR retrotransposons in heterochromatic regions than in euchromatic regions, implying that active mobile genetic elements might contribute significantly to the formation of heterochromatin.
Results and discussion
Identification of intact and solo LTR retroelements
Number of Clusters, Intact LTRs and Solo LTRs in the genomes of C. elegans, C. briggsae, D. melanogaster, and D. pseudoobscura
Comparison with existing methods
LTR retrotransposons in the C. elegans genome
Table 2: List of Elements in the C. elegans genome
Columns: Family notation from previous work; # of Intact LTRs (from previous work); Avg. Identity between LTRs (%); # of Solo LTRs (from previous work)
Table 2 summarizes all clusters, including 22 new clusters, identified in this work. We note that these findings need to be validated by additional inspection. In addition to the new clusters, we also identified several new elements in some previously classified families. For example, in our results, cluster LTR_CE8 is a mixture of two previously identified families (Cer 8 and Cer 9). The previous study identified, in total, 5 retroelements (2 and 3 respectively) in these two families, whereas our method identified 8 retroelements. One of the three new retroelements in this cluster (element 5) was identified in step 1 and the other two (element 6 and element 8) were identified in step 3. The similarities between the new elements and previously known elements in this cluster were significant, e.g. 45.6% (element 1 vs. 5), 44.3% (element 1 vs. 6) and 53.0% (element 2 vs. 8). We stress that none of these new elements was identified by LTR_STRUC in our test, indicating that their absence from previous work is probably not caused by the use of a different version of the C. elegans genome in this study.
LTR retrotransposons in the C. briggsae vs. C. elegans genomes
We note that the number of elements identified in the C. briggsae genome is significantly smaller than in the C. elegans genome, even though these two genomes have similar genome size and gene content. We hypothesize that this difference may be due to the fact that the C. briggsae genome is not fully finished. We used the following simulation experiment to test this hypothesis. We randomly shredded the sequence of the C. elegans genome into the same number (577) of scaffolds, with lengths identical to those of the C. briggsae scaffolds. We repeated this procedure 100 times and each time we determined how many of the intact elements identified in the C. elegans genome were broken. We found that on average 31 (out of 58) intact elements were retained (with a standard deviation of about 9), which is comparable to the number of intact elements identified in the C. briggsae genome (24). Hence, we concluded that the C. briggsae genome may not contain significantly fewer intact LTR retroelements than the C. elegans genome, and that many elements may still be missing from the current analysis because of the incompleteness of its genomic sequence.
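A shredding experiment of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and rests on assumptions not stated in the text: scaffolds are laid end to end in a random order along a single linear sequence, elements are half-open intervals `(start, end)`, and the function names are hypothetical.

```python
import random
from bisect import bisect_right

def count_retained(elements, breakpoints):
    """Count elements (start, end) with no breakpoint strictly inside them."""
    cuts = sorted(breakpoints)
    kept = 0
    for start, end in elements:
        k = bisect_right(cuts, start)  # index of the first cut > start
        if k == len(cuts) or cuts[k] >= end:
            kept += 1
    return kept

def simulate_shredding(elements, scaffold_lengths, trials=100, seed=0):
    """Shuffle the scaffold lengths, cut the sequence at their cumulative
    sums, and return (mean, standard deviation) of retained elements."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        lengths = list(scaffold_lengths)
        rng.shuffle(lengths)
        cuts, pos = [], 0
        for length in lengths[:-1]:  # the last scaffold ends the sequence
            pos += length
            cuts.append(pos)
        results.append(count_retained(elements, cuts))
    mean = sum(results) / len(results)
    var = sum((r - mean) ** 2 for r in results) / len(results)
    return mean, var ** 0.5
```

Calling `simulate_shredding` with the coordinates of the intact elements and the list of scaffold lengths would give a mean and spread that can be compared with the count observed in the draft genome.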
LTR retrotransposons in D. melanogaster genome
List of Elements in the D. melanogaster genome
Columns: Family notation from previous work; # of Intact LTRs (from previous work); Avg. Identity between LTRs (%); # of Solo LTRs
4 (n/a, 1)
16 (6, 7)
38 (10, 18)
28 (18, 24)
4 (n/a, 4)
16 (6, 13)
23 (21, 22)
13 (1, 7)
30 (n/a, 26)
8 (n/a, 1)
17 (8, 9)
4 (n/a, 2)
19 (4, 13)
13 (8, 8)
3 (1, 2)
16 (4, 16)
98 (40, 58)
17 (n/a, 5)
25 (n/a, 3)
21 (6, 15)
5 (4, 5)
3 (n/a, 0)
LTR retrotransposons in the D. pseudoobscura vs. D. melanogaster genomes
A total of 43 pairs of intact LTR retrotransposons were identified after step 1 in the D. pseudoobscura genome. The LTRs thus obtained were clustered into 41 clusters, from which 983 solo LTRs were found. After phylogenetic analysis, 5 additional old intact LTR retroelements were identified from 10 (out of 983) solo LTRs. In summary, we identified 48 (= 43+5) intact retrotransposons and 973 (= 983-10) solo LTRs (see Additional file 2). We identified far fewer LTR retroelements in the D. pseudoobscura genome than in the D. melanogaster genome. This is also understandable since the D. pseudoobscura genome is not as well finished as the D. melanogaster genome. In particular, almost no heterochromatic DNA has been sequenced in this genome. In contrast, there is a well-progressed finishing effort for the D. melanogaster genome, particularly in heterochromatic regions. As we show below, in the D. melanogaster genome, a major fraction of the LTR retroelements were identified in heterochromatic regions. Nevertheless, we can still identify many putative LTR retroelements in euchromatic regions of the D. pseudoobscura genome. For example, in cluster LTR_DP30, two intact LTR retroelements were identified in step 1. The identities between the pairs of LTRs are 96.3% and 98.0%, respectively. The identity between these two elements is 62.6%.
Chromosomal distribution of LTR retroelements
The analysis of chromosomal distributions of the identified LTR retroelements was performed on the C. elegans and D. melanogaster genomes. The Kolmogorov-Smirnov test was used to determine whether the LTRs are distributed uniformly in terms of their chromosomal location. At a significance level of 0.05, the hypothesis of a uniform distribution was clearly rejected for chromosomes I, II, V, and X of the C. elegans genome (see Additional file 3), whereas it was not rejected for chromosomes III and IV (p = 0.4586 and p = 0.1420, respectively). These results are consistent with previous observations on the same genome and with the DNA replication model for the C. elegans genome.
Phylogenetic analysis of RT domains
We proposed a novel computational method for de novo identification of LTR retrotransposons in eukaryotic genomes. It has been applied to several complete eukaryotic genomes and identified many new putative intact LTR retroelements, among which a few new potential families were discovered.
The genomic sequences of C. elegans, C. briggsae, D. melanogaster, and D. pseudoobscura were obtained from public sources. The complete genomic sequence of C. elegans (WS120) and a draft genomic sequence of C. briggsae (cb25.agp8) were downloaded from Wormbase at the Sanger Institute. The complete genomic sequence of D. melanogaster (Release 4.0) was downloaded from the website of the Berkeley Drosophila Genome Project. The draft genomic sequence of D. pseudoobscura (Release 1.0) was downloaded from FlyBase.
De novo identification of young intact LTR retroelements
Identification of Solo LTRs
Solo LTRs are created by recombination between two intact LTRs during evolution. In order to identify solo LTRs, LTRs from intact retroelements identified in the previous step were first clustered based on their sequence similarity, using the BAG clustering algorithm. BAG represents all LTRs from intact retroelements as an undirected graph, in which each node represents an LTR sequence and a weighted edge is created between two LTRs if the sequence similarity between them is above a preset cutoff threshold. The Smith-Waterman alignment score from a FASTA comparison of two LTRs was used as the similarity measure. BAG generated clusters of LTRs by iteratively splitting a graph into biconnected components with an increased cutoff score at each iteration, while forcing the two LTRs from the same intact element to be grouped into the same cluster. Next, the LTR sequences of each cluster were aligned using CLUSTALW, and the resulting multiple alignment was used to generate a profile HMM using HMMBuild from the HMMER package. Finally, HMMSearch from the same package was used to search the entire genome with the HMMs from all the LTR clusters to identify potential LTRs, including solos. The E-value threshold for the search was set to 1.0e-9, which was determined based on the best recovery of known solo LTRs.
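The graph construction described above can be illustrated with a simplified Python sketch. It is not the BAG algorithm: it uses plain connected components at a single cutoff instead of iterative splitting into biconnected components, and the function name and input layout are hypothetical. It does show the two ingredients named in the text, an edge for every pair scoring above the cutoff and a forced link between the two LTRs of the same intact element.

```python
def cluster_ltrs(n, scored_pairs, cutoff, same_element=()):
    """Cluster LTRs 0..n-1 with union-find.
    scored_pairs: iterable of (a, b, alignment_score).
    same_element: iterable of (a, b) pairs that must share a cluster.
    Assumption: connected components stand in for biconnected components."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for a, b, score in scored_pairs:
        if score > cutoff:  # edge only if similarity is above the cutoff
            union(a, b)
    for a, b in same_element:  # the two LTRs of one element stay together
        union(a, b)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Raising `cutoff` and re-running on a large cluster mimics, very roughly, the effect of the increasing cutoff used to split clusters further.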
Identification of old intact LTR retroelements
In the previous sections, we have shown how we identified young LTR retroelements that contain highly similar pairs of LTRs. However, this approach may miss relatively old LTR retrotransposons whose two LTRs are no longer highly similar to each other. To address this issue, a phylogenetic analysis was carried out. We built a phylogenetic tree for all solo LTRs in the same cluster. Some LTRs among them may not be true solos; instead, they may be located within a certain distance range of each other, and the sequence between them may actually be an intact retroelement. They are classified as "solo LTRs" simply because they are not similar enough to each other to be identified by the criteria used in the first step. We classified a pair of "solo" LTRs as a single (old) intact retroelement if they are (1) located within a certain distance range in the genome; and (2) closest neighbors in the phylogenetic tree (Figure 1).
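The two pairing criteria can be sketched as follows. This Python example is an approximation under stated assumptions: mutual nearest neighbors in a pairwise distance matrix stand in for "closest neighbors in the phylogenetic tree", the distance range defaults are borrowed from the 1000 to 20000 base range quoted for the first step, and the function name and input layout are hypothetical.

```python
def pair_old_elements(hits, dist, min_dist=1000, max_dist=20000):
    """hits: list of (chromosome, position) for "solo" LTRs in one cluster.
    dist: symmetric matrix of pairwise sequence distances between the hits.
    Returns index pairs (i, j) that satisfy both criteria.
    Assumption: mutual nearest neighbors approximate sister leaves."""
    n = len(hits)
    if n < 2:
        return []

    def nearest(i):
        return min((j for j in range(n) if j != i), key=lambda j: dist[i][j])

    pairs = []
    for i in range(n):
        j = nearest(i)
        if i < j and nearest(j) == i:  # criterion 2: closest to each other
            (chrom_i, pos_i), (chrom_j, pos_j) = hits[i], hits[j]
            gap = abs(pos_i - pos_j)
            # criterion 1: same chromosome and within the distance range
            if chrom_i == chrom_j and min_dist <= gap <= max_dist:
                pairs.append((i, j))
    return pairs
```

A tree-based implementation would instead test whether the two leaves are siblings in the neighbor-joining tree; the matrix shortcut is used here only to keep the example self-contained.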
We implemented the method described above in a software package, using C++ and Perl. The source code of the major part of the program can be downloaded from the supplementary website. The typical running time for analyzing a eukaryotic genome ranges from several hours to tens of hours.
Throughout the paper, all phylogenetic analysis was done in two steps. The sequences were first aligned using CLUSTALW, and then the neighbor-joining tree was built using PHYLIP with 1000 bootstraps.
Analysis of the distribution of the genomic locations of LTR retroelements
The chromosomal distribution of LTRs was analyzed by a Kolmogorov-Smirnov (KS) test. The null hypothesis based on a uniform distribution was used to determine whether the chromosomal distribution of LTRs is random. The pre-defined function for KS test in MATLAB was used for this purpose. For further analyses, the chromosomal distribution of the ratios between the number of intact LTR retroelements and solo LTRs in the D. melanogaster genome was computed and plotted along coordinate bins of chromosomes 2, 3, and X.
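For readers without MATLAB, the one-sample test against a uniform distribution can be sketched in plain Python. This is an illustrative approximation, not the routine used in the paper: the p-value comes from the asymptotic Kolmogorov series with a standard small-sample correction, and the function name and the shortcut for very small statistics are assumptions of this sketch.

```python
import math

def ks_uniform(positions, chrom_length):
    """One-sample KS test of positions against Uniform(0, chrom_length).
    Returns (D, approximate p-value)."""
    xs = sorted(p / chrom_length for p in positions)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges slowly here; p is essentially 1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

A p-value below the chosen significance level (0.05 in this study) would reject the hypothesis that the LTRs are spread uniformly along the chromosome.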
We thank Drs. Eric Ganko and John McDonald for providing the LTR_STRUC program and the sequences of Cer elements, and Dr. Justen Andrews for helpful discussions. This work is supported by MetaCyt Initiative at Indiana University, funded by Lilly Endowment, Inc.
- Kidwell MG, Lisch D: Transposable elements as sources of variation in animals and plants. Proc Natl Acad Sci U S A. 1997, 94 (15): 7704-7711. 10.1073/pnas.94.15.7704.
- Brookfield JF: The ecology of the genome - mobile DNA elements and their hosts. Nat Rev Genet. 2005, 6: 128-136. 10.1038/nrg1524.
- Jurka J: Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000, 16 (9): 418-420. 10.1016/S0168-9525(00)02093-X.
- Smit A: RepeatMasker. unpublished, [http://www.genome.washington.edu/uwgc/analysistools/repeatmask.htm]
- Holmes I: Transcendent elements: whole-genome transposon screens and open evolutionary questions. Genome Res. 2002, 12 (8): 1152-1155. 10.1101/gr.453102.
- Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12 (8): 1269-1276. 10.1101/gr.88502.
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-i358. 10.1093/bioinformatics/bti1018.
- Edgar RC, Myers EW: PILER: identification and classification of genomic repeats. Bioinformatics. 2005, 21 (Suppl 1): i152-i158. 10.1093/bioinformatics/bti1003.
- Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D: Combined evidence annotation of transposable elements in genome sequences. PLoS Comput Biol. 2005, 1 (2): 166-175. 10.1371/journal.pcbi.0010022.
- Bailey JA, Eichler EE: Genome-wide detection and analysis of recent segmental duplications within mammalian organisms. Cold Spring Harb Symp Quant Biol. 2003, 68: 115-124. 10.1101/sqb.2003.68.115.
- Jiang N, Bao Z, Zhang X, Hirochika H, Eddy SR, McCouch SR, Wessler SR: An active DNA transposon family in rice. Nature. 2003, 421 (6919): 163-167. 10.1038/nature01214.
- Caspi A, Pachter L: Identification of transposable elements using multiple alignments of related genomes. Genome Research. 2006, 16: 260-270.
- SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL: The paleontology of intergene retrotransposons of maize. Nat Genet. 1998, 20: 43-45.
- Ganko EW, Fielman KT, McDonald JF: Evolutionary History of Cer Elements and Their Impact on the C. elegans Genome. Genome Research. 2001, 11: 2066-2074. 10.1101/gr.196201.
- Kaminker JS, Bergman CM, Kronmiller B, Carlson J, Svirskas R, Patel S, Frise E, Wheeler DA, Lewis SE, Rubin GM, Ashburner M, Celniker SE: The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biology. 2002, 3: research0084.1-0084.20. 10.1186/gb-2002-3-12-research0084.
- Lerat E, Rizzon C, Biémont C: Sequence Divergence Within Transposable Element Families in the Drosophila melanogaster Genome. Genome Research. 2003, 13: 1889-1896.
- McCarthy EM, McDonald JF: Long terminal repeat retrotransposons of Mus musculus. Genome Biol. 2004, 5 (3): R14. 10.1186/gb-2004-5-3-r14.
- Ma J, Devos KM, Bennetzen JL: Analyses of LTR-Retrotransposon Structures Reveal Recent and Rapid Genomic DNA Loss in Rice. Genome Research. 2004, 14: 860-869. 10.1101/gr.1466204.
- McCarthy EM, McDonald JF: LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics. 2003, 19 (3): 362-367. 10.1093/bioinformatics/btf878.
- Kalyanaraman A, Aluru S: Efficient algorithms and software for detection of full-length LTR retrotransposons. J Bioinform Comput Biol. 2006, 4 (2): 197-216. 10.1142/S021972000600203X.
- Havecker ER, Gao X, Voytas DF: The diversity of LTR retrotransposons. Genome Biol. 2004, 5 (6): 225. 10.1186/gb-2004-5-6-225.
- Kim S, Lee J: A Graph Theoretic Sequence Clustering Algorithm. Int J Data Mining Bioinformatics. 2006, 1 (2): 178-200. 10.1504/IJDMB.2006.010855.
- Kalyanaraman A, Aluru S: Efficient Algorithms and Software for Detection of Full-Length LTR Retrotransposons: Stanford University. Edited by: Markstein P, Xu Y. 2005, World Scientific Press.
- Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DH, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R, Waterston RH: The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol. 2003, 1 (2): E45. 10.1371/journal.pbio.0000045.
- Bowen NJ, McDonald JF: Drosophila Euchromatic LTR Retrotransposons are Much Younger Than the Host Species in Which They Reside. Genome Research. 2001, 11: 1527-1540. 10.1101/gr.164201.
- Kapitonov VV, Jurka J: Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc Natl Acad Sci USA. 2003, 100: 6569-6574. 10.1073/pnas.0732024100.
- Rizzon C, Marais G, Gouy M, Biémont C: Recombination Rate and the Distribution of Transposable Elements in the Drosophila melanogaster Genome. Genome Research. 2002, 12: 400-407. 10.1101/gr.210802. Article published online before print in February 2002.
- Vieira C, Lepetit D, Dumont S, Biémont C: Wake Up of Transposable Elements Following Drosophila simulans Worldwide Colonization. Mol Biol Evol. 1999, 16: 1251-1255.
- Riddle DL, Blumenthal T, Meyer BJ, Priess JR: C. elegans II. 1997, Cold Spring Harbor Laboratory Press.
- Flybase: [ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Dpseudoobscura]
- Kasai T, Lee G, Arimura H, Arikawa S, Park K: Linear-time longest common-prefix computation in suffix arrays and its applications.: Jerusalem, Israel. 2002, Springer-Verlag, 2089: 181-192.
- Choi JH, Cho HG, Kim S: Alignment method for microbial whole Genomes using maximal exact match filtering. Computational Biology and Chemistry. 2005, 29 (3): 244-253. 10.1016/j.compbiolchem.2005.04.004.
- Manber U, Myers G: Suffix arrays: a new method for on-line string searches. SIAM J Comput. 1993, 22 (5): 935-948. 10.1137/0222058.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32 (Database issue): D138-D141. 10.1093/nar/gkh121.
- HMMer: [http://hmmer.wustl.edu]
- Ramu C, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31 (13): 3497-3500. 10.1093/nar/gkg546.
- Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.