Transcriptome sequencing and phylogenomic resolution within Spalacidae (Rodentia)
© Lin et al.; licensee BioMed Central Ltd. 2014
Received: 26 August 2013
Accepted: 14 January 2014
Published: 17 January 2014
Subterranean mammals have been of great interest for evolutionary biologists because of their highly specialized traits for the life underground. Owing to the convergence of morphological traits and the incongruence of molecular evidence, the phylogenetic relationships among three subfamilies Myospalacinae (zokors), Spalacinae (blind mole rats) and Rhizomyinae (bamboo rats) within the family Spalacidae remain unresolved. Here, we performed de novo transcriptome sequencing of four RNA-seq libraries prepared from brain and liver tissues of a plateau zokor (Eospalax baileyi) and a hoary bamboo rat (Rhizomys pruinosus), and analyzed the transcriptome sequences alongside a published transcriptome of the Middle East blind mole rat (Spalax galili). We characterize the transcriptome assemblies of the two spalacids, and recover the phylogeny of the three subfamilies using a phylogenomic approach.
Approximately 50.3 million clean reads from the zokor and 140.8 million clean reads from the bamboo rat were generated by Illumina paired-end RNA-seq technology. All clean reads were assembled into 138,872 (the zokor) and 157,167 (the bamboo rat) unigenes, which were annotated against public databases: Swiss-Prot, TrEMBL, NCBI non-redundant protein (NR), NCBI nucleotide sequence (NT), Gene Ontology (GO), Cluster of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG). A total of 5,116 nuclear orthologous genes were identified in the three spalacids and mouse, which was used as an outgroup. Phylogenetic analysis revealed a sister group relationship between the zokor and the bamboo rat, which is supported by the majority of gene trees inferred from individual orthologous genes, suggesting that subfamily Myospalacinae is more closely related to subfamily Rhizomyinae. The same topology was recovered from concatenated sequences of the 5,116 nuclear genes, fourfold degenerate sites of the 5,116 nuclear genes, and concatenated sequences of 13 protein-coding mitochondrial genes.
This is the first report of transcriptome sequencing in zokors and bamboo rats, representing a valuable resource for future studies of comparative genomics in subterranean mammals. Phylogenomic analysis provides a conclusive resolution of interrelationships of the three subfamilies within the family Spalacidae, and highlights the power of phylogenomic approach to dissect the evolutionary history of rapid radiations in the tree of life.
Keywords: Spalacidae, Phylogenomics, Transcriptome, Mitochondrial genome, Subterranean rodents
Subterranean mammals have received extensive attention from scientists in different fields for their specialized adaptations to life underground. Three orders of mammals include subterranean species: the rodents, the insectivores, and the marsupials. These groups independently display adaptive convergence and divergence and are good subjects for comparative studies in morphology, ecology, physiology, ethology, and, most recently, genetics [1, 2].
All extant species of zokors (Myospalacinae, ~9 species), blind mole rats (Spalacinae, ~13 species), and bamboo rats (Rhizomyinae, ~17 species, including Rhizomys, Cannomys and Tachyoryctes) are typical subterranean mammals [3–6]. Because of their cryptic lifestyle and the morphological convergence associated with subterranean life in tubular burrows, their systematic placement has been a long-standing puzzle. Members of these subterranean rodents had been assigned to Muridae (mice and rats), Cricetidae (New World rats and mice, voles, hamsters, and relatives), and Rhizomyidae (bamboo rats and East African mole rats). Recently, overwhelming molecular evidence has suggested the cladistic reunion of zokors, bamboo rats and blind mole rats, and the monophyletic origin of the three groups that form the family Spalacidae has been generally accepted after a century of debate [4, 7–13].
In this study, we employed the Illumina HiSeq 2000 platform and sequenced the transcriptomes of brain and liver of a plateau zokor (Eospalax baileyi) and a hoary bamboo rat (Rhizomys pruinosus). We compiled newly obtained transcriptome data from the zokor and the bamboo rat alongside the published transcriptome assembly from the Middle East blind mole rat, and provided a conclusive resolution of interrelationships of the three subfamilies within the family Spalacidae using a phylogenomic approach.
Illumina sequencing and assembly
Details of the transcriptome sequences generated in this study (bp, base pair; ORF, Open Reading Frame; N50, N50 statistics)
Mean length (bp)
Mean length (bp)
Functional annotation and classification
Summary of unigene annotations of the two transcriptomes (Values in parentheses are percentages of all assembled unigenes in a given species)
Orthologous gene identification and multiple sequence alignment
The INPARANOID program was employed to identify orthologs based on two-way best genome-wide pairwise matches for each species pair. The automatic clustering method implemented in the program can distinguish between orthologs and paralogs without using multiple alignments and phylogenetic trees [16, 17]. All pairwise orthologs identified by INPARANOID were passed to the MULTIPARANOID program to generate multiple-species orthologs. The examined species were the plateau zokor, the hoary bamboo rat, the Middle East blind mole rat, and mouse, which was used as an outgroup. Multiple sequence alignments of each ortholog were performed with the MACSE and PRANK programs. Each alignment was checked manually in MEGA 5, and alignments of poor quality or with an aligned region <100 bp were discarded. The resulting multiple sequence alignments include: (1) 5,116 individual alignments, one for each nuclear gene (Additional file 4), (2) 13 individual alignments, one for each protein-coding mitochondrial gene, (3) one alignment (5,541,534 bp in length) for the concatenated 5,116 nuclear genes, (4) one alignment (715,762 bp in length, 12.92% of the total length of the 5,116 nuclear genes) for the concatenated fourfold degenerate sites (4D-sites) of the 5,116 nuclear genes, (5) one alignment (11,307 bp in length) for the concatenated 13 mitochondrial genes, and (6) one alignment (1,182 bp, 10.45% of the total length of the 13 mitochondrial genes) for the concatenated 4D-sites of the 13 mitochondrial genes. Note that 4D-sites are the third codon positions of protein-coding genes at which none of the possible nucleotide substitutions results in an amino acid replacement; 4D-sites are thus a subset of synonymous sites that are supposed to be under no or very weak selection.
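The definition of 4D-sites can be made concrete with a short sketch. This is an illustration only, not the procedure used in the study (which relied on codeml from PAML): the function names are hypothetical, the standard nuclear genetic code is assumed, and the requirement that all sequences encode the same amino acid at a retained codon is an added assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard nuclear genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fourfold_prefixes(table):
    """Two-base prefixes whose four codons all encode the same amino acid."""
    prefixes = set()
    for first, second in product(BASES, repeat=2):
        encoded = {table[first + second + third] for third in BASES}
        if len(encoded) == 1 and "*" not in encoded:
            prefixes.add(first + second)
    return prefixes


FOURFOLD = fourfold_prefixes(CODON_TABLE)


def shared_4d_columns(aligned_cds):
    """0-based alignment columns that are 4D third positions in every sequence.

    Assumes an in-frame codon alignment; codons with gaps or ambiguous
    bases are skipped because they never match a fourfold prefix.
    """
    length = min(len(seq) for seq in aligned_cds)
    columns = []
    for start in range(0, length - 2, 3):
        codons = [seq[start:start + 3].upper() for seq in aligned_cds]
        if not all(c[:2] in FOURFOLD and c[2] in BASES for c in codons):
            continue
        # Added assumption: keep the site only if the amino acid is conserved.
        if len({CODON_TABLE[c] for c in codons}) == 1:
            columns.append(start + 2)
    return columns


# Toy alignment (hypothetical sequences): Ala, Met, Gly codons.
print(shared_4d_columns(["GCTATGGGA", "GCCATGGGT", "GCAATGGGG"]))  # → [2, 8]
```

Under the standard code, eight codon families (Ser TCN, Leu CTN, Pro CCN, Arg CGN, Thr ACN, Val GTN, Ala GCN, Gly GGN) are fourfold degenerate; mitochondrial genes would need the vertebrate mitochondrial code instead.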
Phylogenetic tree reconstruction
To avoid the confounding effects of sequence saturation in phylogenetic analysis, we assessed substitution saturation with DAMBE5. The results showed that all 500 randomly selected nuclear genes, the concatenated 4D-sites of the 5,116 nuclear genes, and the concatenated 13 protein-coding mitochondrial genes have experienced little substitution saturation (two-tailed P value < 0.001). In contrast, the concatenated 4D-sites of the 13 mitochondrial genes showed substantial substitution saturation (two-tailed P value > 0.05) and were excluded from further analysis.
We used both Maximum Likelihood (ML) and Bayesian Inference (BI) approaches for phylogenetic tree reconstruction. The jModelTest2 program was used to select the best-fitting substitution model for each alignment according to the Akaike information criterion and the Bayesian information criterion. For the 5,116 individual nuclear genes, a total of 70 models were selected as best-fitting models for the ML approach; the top five were TrN + G, TrN + I, HKY + I, TIM3 + G, and TIM3 + I, covering 34.0% of the 5,116 cases. A total of 13 models were chosen as best-fitting models for the BI approach; the top five were HKY + I, GTR + G, GTR + I, HKY + G, and HKY, accounting for 92.6% of the 5,116 cases (Additional file 5). For the 13 individual mitochondrial genes, the TIM2 models (TIM2 + I, TIM2 + G and TIM2 + I + G) were most common (8/13) for the ML analysis, while the GTR models (GTR + I and GTR + G) were most common (11/13) for the BI approach (Additional file 5). In addition, the best-fitting model for the concatenated 4D-sites of the 5,116 nuclear genes was GTR + I + G for both the ML and BI approaches.
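As a reminder of how the two criteria rank candidate models, the following sketch computes AIC = 2k − 2 lnL and BIC = k ln(n) − 2 lnL, where k is the number of free parameters and n the alignment length. The log-likelihoods, parameter counts, and alignment length below are invented for illustration and are not values from this study; jModelTest2 performs the equivalent comparison over many more models.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical fits for one 1,500 bp alignment of four taxa:
# model name -> (log-likelihood, number of free parameters).
candidates = {
    "HKY+G": (-2510.4, 10),
    "GTR+G": (-2505.0, 14),
}
n_sites = 1500

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print(best_aic, best_bic)  # → GTR+G HKY+G
```

With these made-up numbers the two criteria disagree: the stronger BIC penalty favours the simpler model, which shows how the two criteria can lead to different selected models for the same alignment.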
Numbers of individual genes that support different phylogenetic tree topologies inferred from the Maximum Likelihood (ML) and Bayesian Inference (BI) approaches (only genes that support a tree with bootstrap values >50% and >70% were counted)
Bootstrap values >50%
((Zokor, bamboo rat), blind mole rat)
((Zokor, blind mole rat), bamboo rat)
((Bamboo rat, blind mole rat), zokor)
Bootstrap values >70%
((Zokor, bamboo rat), blind mole rat)
((Zokor, blind mole rat), bamboo rat)
((Bamboo rat, blind mole rat), zokor)
When the genome sequence is unavailable, transcriptome sequencing is an effective way to obtain large numbers of transcripts. In this study, we present the first transcriptome data for zokors and bamboo rats using massively parallel mRNA sequencing. We performed Illumina sequencing of the brain, which is the most complex organ in the body, and the liver, which plays a major role in metabolism. Although the raw data set of the bamboo rat was much larger than that of the zokor, the results of sequence assembly, annotation, and classification of the two datasets were comparable. Furthermore, we provide a conclusive resolution of the phylogenetic placement of the three subfamilies in the family Spalacidae. Phylogenetic analyses using various data coding schemes and analytical approaches overwhelmingly support that the subfamily Myospalacinae is more closely related to the subfamily Rhizomyinae than to the subfamily Spalacinae. Although we did not examine multiple individuals of each species, genetic variation within species should not affect our phylogenetic resolution at the subfamily level.
Traditional Sanger sequencing technology produces longer EST (expressed sequence tag) sequences at low throughput, which offers little advantage now that new assembly programs can handle high-throughput short sequences [26, 27]. Further, traditional transcriptome studies generally use a cDNA library of one tissue, while transcriptome sequencing based on NGS methods can work on a normalized cDNA library comprising multiple tissues and individuals, which considerably reduces the sequencing cost. Hence, the cost- and time-effective NGS technologies will provide more comprehensive transcriptome information and facilitate transcriptome studies, particularly in less-studied species. Although large-scale transcriptome sequencing has been used to sequence cDNA pools derived from Spalax galili, a member of the subfamily Spalacinae, transcriptome surveys of members of the other two subfamilies within the family Spalacidae are virtually lacking. Our work provides a valuable genomic resource and will stimulate further analysis of these taxa of exceptional scientific interest. For example, subterranean rodents are believed to have greater hypoxia tolerance than their aboveground relatives. Upon examining 247 candidate genes involved in adaptation to high-altitude hypoxia, we identified 195 (79%) and 207 (84%) genes in the zokor and bamboo rat transcriptomes, respectively. This finding will facilitate future comparative analysis of these functional genes.
Before the emergence of molecular identification approaches, the taxonomic status and phylogenetic relationships of animals were determined mainly from morphological characteristics. However, because the evolutionary change of morphological characters is extremely complicated (even over a short evolutionary time), phylogenetic trees derived from morphological data frequently remain controversial. In contrast, since the evolutionary change of DNA follows a traceable pattern, it is possible to use a mathematical model to formulate the change and compare DNA sequences among different organisms. As a result, molecular phylogenetics is expected to clarify parts of the tree of life that have been difficult to resolve by classical approaches. Taking zokors as an example, these animals have been allied to several different muroid subfamilies including Rhizomyinae, Spalacinae, Arvicolinae, and Cricetinae based on morphological characteristics, while the overwhelming majority of molecular phylogenetic studies concordantly placed zokors in Spalacidae [5, 9, 12, 14]. It should be noted that random noise will generally lead to poorly resolved phylogenetic trees, primarily because the number of nucleotide substitutions in a few genes is small. This may be the major reason why previous molecular phylogenetic studies yielded inconsistent topologies among the three subfamilies in Spalacidae [5, 8, 12–14].
In this study, we used a large number of genes, including 5,116 nuclear genes and all 13 mitochondrial protein-coding genes, to examine the phylogenetic affinities of the three subfamilies within Spalacidae. Our analyses based on various data sets overwhelmingly support that Spalacinae is most likely the basal clade within Spalacidae, while Rhizomyinae and Myospalacinae form a sister group. However, the most important challenges of phylogenomics involve different tree reconstruction methods and substitution saturation, both of which can influence the validity of phylogenetic placements [30, 31]. To overcome these challenges, we used both ML and BI approaches based on both nuclear and mitochondrial genes, and found no incongruence between the two tree reconstruction approaches (Figure 2). Further, we performed saturation tests and excluded the saturated 4D-site concatenation of mitochondrial genes from our analyses. Interestingly, our proposed phylogeny of the three subfamilies within Spalacidae is consistent with that inferred from extensive morphological analyses of fossils of Spalacidae. The known Early Miocene rhizomyines are closer to the stem zokor morphotype, suggesting that Myospalacinae is more closely related to Rhizomyinae than to Spalacinae. Although we did not examine African mole rats (Tachyoryctes), which were previously considered to be a separate subfamily in Spalacidae, multiple lines of molecular evidence consistently supported a sister relationship between Rhizomys and Tachyoryctes, suggesting that Tachyoryctes should be incorporated into the Rhizomyinae [3–6]. Together, our phylogenomic analyses unambiguously resolve the phylogeny of the three subfamilies comprising the family Spalacidae.
Geographically, members of the subfamily Spalacinae are found in East Europe, West Asia, the Near East, and North Africa; members of Myospalacinae inhabit Siberia and Northern China; and the Rhizomyinae include two geographically distant relatives (bamboo rats and East African mole rats) from Southeastern Asia and East Africa. Our molecular phylogenetic evidence suggests that the common ancestor of Spalacidae first split into the Spalacinae and Rhizomyinae + Myospalacinae clades in Northern Asia adjacent to the Middle East, and that the latter clade subsequently split into Rhizomyinae and Myospalacinae in Mongolia and Northern China. Interestingly, although Rhizomyinae and Myospalacinae are more closely related phylogenetically, Myospalacinae appears to be more biologically similar to Spalacinae than to Rhizomyinae for certain traits. For example, Myospalacinae and Spalacinae are totally blind, while Rhizomyinae retain their sight. The resolution of phylogenetic relationships among the three subfamilies will provide a framework for evolutionary studies on convergence and divergence of these animals.
To summarize, this work is the first report of transcriptome sequencing in zokors and bamboo rats, representing a valuable resource for future studies of comparative genomics in subterranean mammals. Phylogenomic analysis provides a conclusive resolution of interrelationships of the three subfamilies within the family Spalacidae, and highlights the power of phylogenomic approach to dissect the evolutionary history of rapid radiations in the tree of life.
All experimental protocols were approved by the Animal Care and Use Committee of Northwest Institute of Plateau Biology, Chinese Academy of Sciences.
Taxon sampling and RNA sequencing
One adult male plateau zokor was caught by a live trapping arrow from Datong County (N 37°7.5′, E 101°48.7′), Qinghai Province, China. One adult male hoary bamboo rat was bought from a local farmer in Kunming (N 25°2.2′, E 102°42.5′), Yunnan Province, China. The animals were humanely sacrificed to collect the brain and liver tissues. The fresh tissues were frozen in liquid nitrogen immediately after collection and stored at −80°C until use. Total RNA was isolated from the brain and liver using Trizol (Invitrogen, CA, USA) following the manufacturer’s protocols.
Illumina sequencing was performed commercially following the manufacturer’s instructions. Briefly, magnetic beads with oligo(dT) were used to purify poly(A) mRNA from the total RNA. Subsequently, the mRNA was fragmented into small pieces (200–500 bp) at 94°C for exactly 5 minutes. The cleaved RNA fragments were reverse transcribed into first-strand cDNA using SuperScript II reverse transcriptase and random primers, and the second-strand cDNA was generated with GEX second strand buffer, DNA polymerase, RNase H and dNTPs. These cDNA fragments were further processed by end repair and 3′ adenylation. Paired-end adapters were ligated to the 3′ adenylated cDNA fragments. cDNA fragments of ~200 bp were selected and enriched by 15 cycles of PCR amplification with Phusion DNA Polymerase. Finally, four cDNA libraries were constructed and sequenced bi-directionally (100 bp in each direction) on an Illumina HiSeq 2000 (Illumina, San Diego, CA).
De novo assembly and unigene annotation
The reads from brain and liver of each species were merged and, after removal of low-quality reads, the clean reads were assembled with the Trinity program. The generated unigenes were further filtered and clustered using the CD-HIT-EST program. The filtered unigenes were then used for BLAST search and annotation against the NCBI NR/NT databases and the Swiss-Prot/TrEMBL databases using an E-value cut-off of 1E-5. The XML files generated from the NR BLAST results were loaded into the Blast2GO software for GO (gene ontology) annotation. The ORFs were extracted by the Perl script transcripts_to_best_scoring_ORFs.pl (TransDecoder) in the Trinity program package. The ORFs were then submitted to the KAAS (KEGG Automatic Annotation Server, http://www.genome.jp/tools/kaas/) for KEGG pathway annotation.
The unigenes of the blind mole rat were downloaded from GenBank under the accession number reported with the published assembly. The predicted peptide sequences (ORFs) of this species were extracted by the TransDecoder Perl script in the Trinity program package. The mouse (used as outgroup in the tree reconstruction) cDNA and protein sequences were downloaded from Ensembl (http://www.ensembl.org). Because the genetic code differs between mitochondrial and nuclear DNA, mitochondrial transcripts in the transcriptomes are excluded by ORF finding and will not be confused with the mitochondrial sequences extracted directly from the mitochondrial genomes. The complete mitochondrial genomes of these four species were downloaded from GenBank with accession numbers JN540033.1 (plateau zokor, E. baileyi), AJ416891.1 (blind mole rat, S. ehrenbergi), KC789518 (hoary bamboo rat, R. pruinosus), and AY172335.1 (mouse, Mus musculus). The 13 protein-coding sequences of the mitochondrial genomes were extracted according to the definitions in the GenBank files. Additionally, fourfold degenerate sites (4D-sites) were identified in codeml from PAML as sites at which all differences at the third position of a codon are synonymous. The 4D-sites of the 5,116 nuclear genes and of the 13 mitochondrial genes were then extracted and concatenated, respectively.
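The E-value cut-off can be illustrated with a small filter over BLAST results. This sketch assumes the hits were written in BLAST's 12-column tabular format (outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) rather than the XML files used for Blast2GO. The function name and the sequence identifiers are hypothetical, and keeping only the highest-scoring hit per unigene is an added assumption.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Keep, per query, the highest-bitscore hit with E-value <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit fails the cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for two unigenes.
example = (
    "unigene_1\tprot_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t2e-30\t250\n"
    "unigene_1\tprot_B\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-10\t90\n"
    "unigene_2\tprot_C\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25\n"
)
print(best_hits(example))  # → {'unigene_1': ('prot_A', 2e-30, 250.0)}
```

In this toy input, unigene_2 would remain unannotated because its only hit has an E-value above 1E-5.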
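The effect of the differing genetic codes on ORF prediction can be sketched as follows. In the vertebrate mitochondrial code (NCBI translation table 2), TGA encodes tryptophan, ATA encodes methionine, and AGA/AGG are stop codons, so a mitochondrial coding sequence read with the standard code tends to be truncated at the first TGA. The toy sequence and function names are hypothetical; this illustrates the principle and is not the TransDecoder algorithm.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons enumerated in TCAG order.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def make_table(amino_acids):
    """Map each codon to its amino acid ('*' marks a stop codon)."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), amino_acids)}


def translate_until_stop(cds, table):
    """Translate in frame from position 0 and stop at the first stop codon."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = table[cds[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_cds = "ATGTGAATAGGC"  # hypothetical mitochondrial-like coding fragment
print(translate_until_stop(toy_cds, make_table(STANDARD)))   # → M
print(translate_until_stop(toy_cds, make_table(VERT_MITO)))  # → MWMG
```

Codons that differ between the two tables can be listed with a comprehension over `make_table(STANDARD)` and `make_table(VERT_MITO)`; there are exactly four (TGA, ATA, AGA, AGG).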
Orthologous gene identification and sequence alignment
The pairwise orthologous peptides between each pair of species were identified by the INPARANOID program. The pairwise orthologous genes of all four species were passed to the MULTIPARANOID program to generate multiple-species orthologs. Orthologs not covering all four species were discarded from further analyses. The corresponding cDNA sequences of these orthologs were extracted by sequence name with the cdbfasta program and were written into a single fasta file for each orthologous gene. Each fasta file contains four orthologs of a target gene. The sequences were aligned using the MACSE and PRANK programs. A simple in-house Perl script was used to run batch jobs for each of the files. The 13 protein-coding mitochondrial genes were aligned directly using the ClustalW program implemented in the MEGA software. In order to exclude low-quality alignments, orthologous genes with an aligned region <100 bp were discarded from further analyses.
To examine substitution saturation, we randomly selected 500 nuclear genes and tested them using DAMBE. All of the 13 mitochondrial genes were tested manually using DAMBE. Moreover, the 4D-site concatenations of the nuclear and mitochondrial genes were also tested for saturation. Furthermore, we used both Maximum Likelihood (ML) and Bayesian Inference (BI) approaches for phylogenetic tree reconstruction. The jModelTest2 program was used to select the best-fitting substitution model for each alignment according to the Akaike information criterion and the Bayesian information criterion.
PhyML version 3.1 was used to reconstruct ML trees for each gene with 100 bootstrap replicates, which is generally considered a reasonable number of replicates. Again, a simple in-house Perl script was used to execute batch jobs. For the concatenated 5,116 nuclear genes and the concatenated 13 mitochondrial genes, RAxML version 7.8.6, which allows multiple partitions with the same model (partitioned by gene, with the recommended GTR + G model), was used for ML tree reconstruction. MrBayes version 3.1.2 was used to reconstruct trees for each gene, and for the concatenated 5,116 nuclear genes and the concatenated 13 mitochondrial genes (partitioned according to different models), with 500,000 generations, which is sufficient to meet the 0.01 criterion for the standard deviation of split frequencies. A batch mode defined by the program itself (set autoclose = yes nowarn = yes) was used for tree reconstruction. Finally, a Chi-square test in the SPSS 20.0 program was used to test whether the distribution of tree topologies deviates significantly from a random distribution.
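The filtering step that keeps only ortholog groups covering all four species might look like the sketch below. The cluster representation, species labels, and gene identifiers are hypothetical (MULTIPARANOID's actual output format is not reproduced here), and requiring exactly one gene per species is an assumption based on the statement that each fasta file contains four orthologs.

```python
SPECIES = frozenset({"zokor", "bamboo_rat", "blind_mole_rat", "mouse"})


def complete_single_copy(clusters, species=SPECIES):
    """Return clusters with exactly one gene from every required species.

    `clusters` maps a cluster id to a list of (species, gene_id) pairs.
    """
    kept = {}
    for cluster_id, members in clusters.items():
        genes_by_species = {}
        for sp, gene_id in members:
            genes_by_species.setdefault(sp, []).append(gene_id)
        covers_all = set(genes_by_species) == set(species)
        single_copy = all(len(genes) == 1 for genes in genes_by_species.values())
        if covers_all and single_copy:
            kept[cluster_id] = {sp: genes[0] for sp, genes in genes_by_species.items()}
    return kept


# Hypothetical clusters: only c1 covers all four species with one gene each.
clusters = {
    "c1": [("zokor", "z1"), ("bamboo_rat", "b1"),
           ("blind_mole_rat", "s1"), ("mouse", "m1")],
    "c2": [("zokor", "z2"), ("bamboo_rat", "b2"), ("mouse", "m2")],
    "c3": [("zokor", "z3"), ("zokor", "z4"), ("bamboo_rat", "b3"),
           ("blind_mole_rat", "s3"), ("mouse", "m3")],
}
print(sorted(complete_single_copy(clusters)))  # → ['c1']
```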
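The final test can be sketched as a chi-square goodness-of-fit test, assuming that "random distribution" means the three possible topologies are equally frequent among gene trees. With three categories the test has two degrees of freedom, for which the chi-square upper tail has the closed form exp(−x/2), so no statistics library is needed. The gene-tree counts below are invented for illustration and are not the counts from this study.

```python
import math


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts in every category."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square distribution with 2 d.f."""
    return math.exp(-statistic / 2)


# Hypothetical numbers of gene trees supporting each of the three topologies.
topology_counts = [600, 250, 200]
statistic = chi_square_uniform(topology_counts)
p_value = p_value_df2(statistic)
print(round(statistic, 2), p_value < 0.001)  # → 271.43 True
```

`p_value_df2` is valid only for two degrees of freedom, that is, for exactly three topology categories.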
Availability of supporting data
The sequencing data have been deposited in GenBank under BioProject numbers PRJNA208780 and PRJNA211727. The phylogenetic data are available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.gs135.
NR: NCBI non-redundant proteins
NT: NCBI nucleotide sequences
COG: Cluster of Orthologous Groups
KEGG: Kyoto Encyclopedia of Genes and Genomes
4D-sites: Fourfold degenerate sites
EST: Expressed sequence tag.
Financial support for this study was provided by the Knowledge Innovation Project of the Chinese Academy of Science (CAS) (KSCX2-EW-J-26) and the General Programs of the National Natural Science Foundation of China (Projects 31101628 and 31071915). We thank Peng Shi in Kunming Institute of Zoology, CAS for kindly sharing bamboo rat transcriptome data. HZ was supported by a start-up fund from Wuhan University.
- Nevo E: Mosaic evolution of subterranean mammals: tinkering, regression, progression, and global convergence. 2007, New York: Springer Berlin Heidelberg.
- Nevo E: Adaptive convergence and divergence of subterranean mammals. Annu Rev Ecol Syst. 1979, 10 (1): 269-308. 10.1146/annurev.es.10.110179.001413.
- Granjon L, Montgelard C: The input of DNA sequences to animal systematics: rodents as study cases. DNA sequencing – methods and applications. Edited by: Muns A. 2012, Rijeka: InTech.
- Wilson DE, Reeder DM: Mammal species of the world: a taxonomic and geographic reference. 2005, Baltimore: Johns Hopkins University Press.
- Jansa SA, Weksler M: Phylogeny of muroid rodents: relationships within and among major lineages as determined by IRBP gene sequences. Mol Phylogenet Evol. 2004, 31 (1): 256-276. 10.1016/j.ympev.2003.07.002.
- Steppan S, Adkins R, Anderson J: Phylogeny and divergence-date estimates of rapid radiations in muroid rodents based on multiple nuclear genes. Syst Biol. 2004, 53 (4): 533-553. 10.1080/10635150490468701.
- DeBry RW, Sagel RM: Phylogeny of Rodentia (mammalia) inferred from the nuclear-encoded gene IRBP. Mol Phylogenet Evol. 2001, 19 (2): 290-301. 10.1006/mpev.2001.0945.
- Flynn LJ: The antiquity of Rhizomys and independent acquisition of fossorial traits in subterranean muroids. B Am Mus Nat Hist. 2009, 331: 128-156.
- Gogolevskaya IK, Veniaminova NA, Kramerov DA: Nucleotide sequences of B1 SINE and 4.5S(I) RNA support a close relationship of zokors to blind mole rats (Spalacinae) and bamboo rats (Rhizomyinae). Gene. 2010, 460 (1–2): 30-38.
- Michaux J, Catzeflis F: The bushlike radiation of muroid rodents is exemplified by the molecular phylogeny of the LCAT nuclear gene. Mol Phylogenet Evol. 2000, 17 (2): 280-293. 10.1006/mpev.2000.0849.
- Michaux J, Reyes A, Catzeflis F: Evolutionary history of the most speciose mammals: Molecular phylogeny of muroid rodents. Mol Biol Evol. 2001, 18 (11): 2017-2031. 10.1093/oxfordjournals.molbev.a003743.
- Norris RW, Zhou K, Zhou C, Yang G, William Kilpatrick C, Honeycutt RL: The phylogenetic position of the zokors (Myospalacinae) and comments on the families of muroids (Rodentia). Mol Phylogenet Evol. 2004, 31 (3): 972-978. 10.1016/j.ympev.2003.10.020.
- Robinson M, Catzeflis F, Briolay J, Mouchiroud D: Molecular phylogeny of rodents, with special emphasis on murids: evidence from nuclear gene LCAT. Mol Phylogenet Evol. 1997, 8 (3): 423-434. 10.1006/mpev.1997.0424.
- Jansa SA, Giarla TC, Lim BK: The phylogenetic position of the rodent genus Typhlomys and the geographic origin of Muroidea. J Mammal. 2009, 90 (5): 1083-1094. 10.1644/08-MAMM-A-318.1.
- Malik A, Korol A, Hubner S, Hernandez AG, Thimmapuram J, Ali S, Glaser F, Paz A, Avivi A, Band M: Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: utility and potential for the discovery of novel evolutionary patterns. PLoS One. 2011, 6: 8-
- Remm M, Storm CEV, Sonnhammer ELL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314 (5): 1041-1052. 10.1006/jmbi.2000.5197.
- Kuzniar A, Van Ham RCHJ, Pongor S, Leunissen JAM: The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 2008, 24 (11): 539-551. 10.1016/j.tig.2008.08.009.
- Alexeyenko A, Tamas I, Liu G, Sonnhammer ELL: Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006, 22 (14): E9-E15. 10.1093/bioinformatics/btl213.
- Ranwez V, Harispe S, Delsuc F, Douzery EJP: MACSE: Multiple Alignment of Coding SEquences Accounting for frameshifts and stop codons. PLoS One. 2011, 6: 9-
- Loytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci USA. 2005, 102 (30): 10557-10562. 10.1073/pnas.0409137102.
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121.
- Reid AH, Taubenberger JK, Fanning TG: Evidence of an absence: the genetic origins of the 1918 pandemic influenza virus. Nat Rev Microbiol. 2004, 2 (11): 909-914. 10.1038/nrmicro1027.
- Xia X: DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. Mol Biol Evol. 2013, 30 (7): 1720-1728. 10.1093/molbev/mst064.
- Darriba D, Taboada GL, Doallo R, Posada D: jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012, 9 (8): 772-
- Posada D, Buckley TR: Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. Syst Biol. 2004, 53 (5): 793-808. 10.1080/10635150490522304.
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, et al: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29 (7): 644-U130. 10.1038/nbt.1883.
- Martin JA, Wang Z: Next-generation transcriptome assembly. Nat Rev Gen. 2011, 12 (10): 671-682. 10.1038/nrg3068.
- Simonson TS, Yang Y, Huff CD, Yun H, Qin G, Witherspoon DJ, Bai Z, Lorenzo FR, Xing J, Jorde LB, et al: Genetic evidence for high-altitude adaptation in Tibet. Science. 2010, 329 (5987): 72-75. 10.1126/science.1189406.
- Nei M, Kumar S: Molecular evolution and phylogenetics. 2000, New York: Oxford University Press.
- Jeffroy O, Brinkmann H, Delsuc F, Philippe H: Phylogenomics: the beginning of incongruence?. Trends Genet. 2006, 22 (4): 225-231. 10.1016/j.tig.2006.02.003.
- Philippe H, Delsuc F, Brinkmann H, Lartillot N: Phylogenomics. Annu Rev Ecol Evol S. 2005, 36: 541-562.
- Nevo E: Mosaic evolution of subterranean mammals: regression, progression and global convergence. 1999, New York: Oxford University Press.
- Li WZ, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22 (13): 1658-1659. 10.1093/bioinformatics/btl158.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.
- Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J: The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res. 2005, 33: D71-D74. 10.1093/nar/gni070.
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59 (3): 307-321. 10.1093/sysbio/syq010.
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22 (21): 2688-2690. 10.1093/bioinformatics/btl446.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-755. 10.1093/bioinformatics/17.8.754.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.