- Research article
- Open Access
Gene content dissimilarity for subclassification of highly similar microbial strains
© The Author(s). 2016
- Received: 16 March 2016
- Accepted: 3 August 2016
- Published: 17 August 2016
Identification and classification of highly similar microbial strains is a challenging issue in microbiology, ecology and evolutionary biology. Among the available approaches, gene content analysis lies at the core of microbial taxonomy. However, no threshold has been determined for grouping microorganisms into different taxonomic levels, and it is still unclear to what extent genomic fluidity must occur to form a microbial taxonomic group.
By taking advantage of the eggNOG database of orthologous groups, we calculated gene content dissimilarity among microbial strains from their orthologous gene profiles. We then tested whether gene content dissimilarity can serve as a quantitative index for classifying microbial taxonomic groups and for subclassifying highly similar microbial strains. Evaluation of gene content dissimilarity on completed microbial genomes at different taxonomic levels suggested that cutoffs of 0.2 and 0.4 can be used for species and family delineation, respectively, and that the 0.2 cutoff approximately corresponds to 98 % 16S rRNA gene identity and 94 % ANI for microbial species delineation. Furthermore, application of gene content dissimilarity to highly similar microbial strains suggested that it is an effective approach for classifying closely related microorganisms into subgroups.
This approach is especially useful in identifying pathogens from commensals in clinical microbiology. It also provides novel insights into how genomic fluidity is linked with microbial taxonomy.
- Microbial subclassification
- Highly similar strains
- Gene content dissimilarity
- Genomic fluidity
Identification and classification of microorganisms is one of the most important and most challenging issues in microbiology, ecology and evolutionary biology. Traditional methods for identifying and classifying microorganisms rely mainly on the morphological, physiological and biochemical properties of isolated microorganisms. However, characterizing these properties is experimentally complicated, and no quantitative standards can be applied to the resulting descriptive data. Moreover, these properties may differ greatly under different experimental conditions, leading to biased observations of the isolated microorganism. The scientific community has therefore continuously demanded quantitative approaches for delineating and classifying microorganisms, such as methods based on genotypes.
For several decades, many efforts have been made to identify and classify microorganisms more accurately, especially at the species level. Among them, DNA-DNA hybridization (DDH) and 16S rRNA gene identity are the two most successful and widely accepted, and DDH is still regarded as the gold standard for microbial species delineation. However, DDH is experimentally tedious and hard to standardize between laboratories, and it has several other problems; for example, the value obtained for the same pair of strains depends on which strain is used as probe and which as target. For these reasons, 16S rRNA gene analysis has largely replaced DDH for describing new species in recent decades [4, 5]. Although it is generally accepted that 97 % or higher 16S rRNA gene sequence identity can be used as a cutoff to define microbial species [6, 7], several labs have reported that 16S rRNA gene identity cannot even distinguish some microbial genera, such as those belonging to Enterobacteriaceae (particularly Enterobacter and Pantoea). It is now generally accepted that DDH need only be carried out when the 16S rRNA identity between two strains is 97 % or higher [6, 8].
In the post-genomic era, as more reference genomes are sequenced, genomic approaches such as in-silico DDH, average amino acid identity (AAI), average nucleotide identity (ANI) and multi-locus sequence analysis (MLSA) have been developed. By integrating genomic information, these approaches have proven more accurate and reliable for microbial species delineation than 16S rRNA gene identity [13, 14]. Among them, in-silico DDH can be considered a genomic replacement for wet-lab DDH, with a 70 % cutoff for species delineation. For AAI and ANI, a cutoff of 94–96 % [10, 11, 15, 16] is generally accepted by microbiologists because it corresponds to 70 % DDH and 97 % 16S rRNA identity, and it is becoming a genomic gold standard for microbial species delineation.
Besides the above approaches, gene content analysis, proposed as early as 1999, is another post-genomic analysis at the core of the current species definition and has been successful in microbial phylogenetic analysis [17–20]. Although this approach has not been applied as widely as sequence identity based methods, it is consistent with several species concepts in microbial systematics, such as the recombination theory [21–23] and Cohan’s ecotype concept [24, 25]. These concepts propose that microbial species are formed by acquisition and loss of functional traits through lateral gene transfer and periodic selection, respectively. Such genomic fluidity plays an important role in microbial genome evolution and in identifying closely related organisms, for example in distinguishing pathogens from commensals [26, 27]. Notably, previous gene content analyses relied mainly on all vs. all pairwise comparison, so a reanalysis was required whenever a new genome was added. Most importantly, to the best of our knowledge, cutoffs for gene content analysis in classifying microorganisms are not yet available.
In this study, we first take advantage of currently available comprehensive ortholog databases such as eggNOG, in which orthologous groups are defined by all vs. all clustering, an approach similar to that used in gene content analysis; mapping genomes to such a database therefore simplifies the computational procedure for gene content analysis. We then address two biological questions based on the resulting gene content dissimilarity metrics. First, can cutoffs be determined for gene content dissimilarity in classifying microorganisms into different taxonomic groups, i.e. to what extent must genomic fluidity generally proceed to form a new taxonomic group such as a species? Second, can gene content dissimilarity be used for subspecies level classification of highly similar microbial strains? To the best of our knowledge, current approaches for identifying and subclassifying highly similar microbial strains still rely mainly on phenotypic properties, because approaches based on sequence identity can hardly achieve this purpose given how similar conserved genes are at the subspecies level. Evaluation of gene content dissimilarity using currently sequenced microbial genomes at different taxonomic levels suggested that cutoffs of 0.2 and 0.4 can be used for species and family delineation, respectively. Further application of gene content dissimilarity to highly similar microbial strains suggested that it is an effective approach for classifying closely related microorganisms into subgroups. This is especially useful for distinguishing pathogens from commensals in clinical microbiology.
Overview of the framework
Although microbial taxonomy at species and higher levels relies mainly on sequence identity approaches such as 16S rRNA gene identity and ANI, identification and classification of highly similar microbial strains still require phenotypic properties of the isolated microorganisms. However, laboratory conditions may differ greatly from natural conditions, and laboratory approaches can characterize only a few properties of an isolated microorganism. This may lead to misclassification of microorganisms with distinct ecologies, habitats and genotypes. Gene content analysis rests on the notion that phenotypic properties are ultimately determined by the genes a microbial strain harbors. It is therefore expected that phenotypic differences among microbial strains are reflected in their gene content dissimilarity.
Gene content dissimilarity cutoffs for microbial species and family delineation
Gene content dissimilarity vs. 16S vs. ANI for microbial species definition
To further evaluate the performance of gene content dissimilarity on microbial species delineation, we extracted all microbial strains in the genera with ≥2 species and each species with ≥5 strains. A total of 33 microbial species were evaluated. Significance tests of the orthologous gene profiles of microbial species against other species in the same genus were carried out (Additional file 1: Table S1). The non-parametric multivariate analysis MRPP (multi-response permutation procedure) based on Bray-Curtis dissimilarity distance was performed. As a result, all microbial species subjected to the tests were significantly different from other species in the same genus with P ≤ 0.005, except for species Mycobacterium bovis (P = 0.024) and Pseudomonas fluorescens (P = 0.018). This suggested that the gene content dissimilarity method proposed in this study can be confidently applied to delineate currently well recognized microbial species.
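The permutation logic behind such a test can be sketched in a few lines. The block below is a simplified, pure-Python illustration of MRPP, not the vegan implementation used in the study: the group weighting (n_i / N), the permutation count, and the toy distance matrix are all assumptions made for the example.

```python
import random
from itertools import combinations


def mrpp_delta(dist, labels):
    """Weighted mean of within-group mean pairwise distances (weights n_i / N)."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    n_total = len(labels)
    delta = 0.0
    for members in groups.values():
        pairs = list(combinations(members, 2))
        if not pairs:
            continue  # a single-member group has no within-group distance
        mean_d = sum(dist[a][b] for a, b in pairs) / len(pairs)
        delta += len(members) / n_total * mean_d
    return delta


def mrpp(dist, labels, permutations=999, seed=0):
    """Return (observed delta, permutation P value) for a square distance matrix."""
    rng = random.Random(seed)
    observed = mrpp_delta(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Count permutations whose within-group distance is as small as observed.
        if mrpp_delta(dist, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data (made up): two species of four strains each,
# dissimilarity 0.1 within a species and 0.5 between species.
labels = ["sp1"] * 4 + ["sp2"] * 4
dist = [
    [0.0 if a == b else (0.1 if labels[a] == labels[b] else 0.5) for b in range(8)]
    for a in range(8)
]
observed_delta, p_value = mrpp(dist, labels)
```

A small observed delta relative to the permuted values indicates that strains of the same species are more alike in gene content than expected by chance.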
Enterobacteriaceae subclassification using gene content dissimilarity
To further investigate whether gene content dissimilarity can also be used to classify closely related microbial strains at the subgenus and subspecies levels, pairwise gene content dissimilarity was extracted for E. coli and Shigella strains and then subjected to PCoA clustering (Fig. 4b). Restricting the analysis in this way may provide higher resolution in identifying highly similar microbial strains. As a result, Shigella strains were well separated from Escherichia strains along both the first and second axes when other Enterobacteriaceae genera were excluded from the analysis. This suggested that although Shigella and E. coli are highly similar, they are still substantially different in function and might be considered different species of Escherichia. Interestingly, Escherichia coli O157:H7 strains were also well separated from other E. coli strains along the first axis. This indicated that O157:H7 strains harbor markedly different functional capacity from other E. coli strains and that gene content dissimilarity can be used as an effective post-genomic index to identify O157:H7 strains.
Streptococcus classification using gene content dissimilarity
Bacillus cereus subclassification using gene content dissimilarity
We also applied gene content dissimilarity to classify the Bacillus cereus group, which is strikingly resistant to any currently available classification system. A total of 31 complete genomes and 119 draft genomes belonging to B. anthracis, B. cereus, and B. thuringiensis were recruited. Two analyses were carried out: one with only the strains with complete genomes, and one with all strains with complete or draft genomes (Additional file 1: Figure S2). Interestingly, clear separation of B. anthracis from B. cereus and B. thuringiensis could be observed in both analyses. The B. cereus and B. thuringiensis strains with complete genomes could be approximately clustered into three subgroups (Additional file 1: Figure S2A). A similar trend of separation could also be observed when the draft genomes were included, but with vaguer boundaries (Additional file 1: Figure S2B). This could be due to an increase in the number of subgroups when more draft genomes were added.
Gene content analysis serves as the genomic ground for phenotypic differences and is one of the major post-genomic approaches developed for microbial phylogenetic reconstruction [17–20]. Previous implementations of gene content analysis rely on all vs. all comparison of genes in the microbial groups of interest, and a reanalysis is needed every time a new strain is added. By taking advantage of recently developed orthologous gene databases such as eggNOG, this study suggests that all vs. all comparison for gene content dissimilarity can be approximated by searching against a fixed database. A previous study using taxon-specific genes and the eggNOG database also suggested the usefulness of such strategies in microbial taxonomic classification. Although the approach may be limited by the absence of singleton genes from the database, this is not expected to affect the results, because the phenotypic properties expressed by these genes are usually not characterized for microbial classification. In addition, these singleton genes are also not used in phylogenetic analysis of microorganisms, for which conserved gene families are usually selected.
Current sequence identity based approaches mainly focus on the species problem in microbiology, and rarely on other taxonomic levels such as the family level. Although gene content analysis has long been applied to reconstruct the phylogenetic relationships of microorganisms, cutoffs have not yet been determined for delineating different taxonomic groups. Recently, Qin et al. applied the concept of percentage of conserved proteins (POCP) to estimate the evolutionary and phenotypic distance between two strains and suggested that a pairwise POCP cutoff of 50 % can serve as a genus boundary for prokaryotic groups. The POCP approach, which relies on all vs. all pairwise identification of orthologous genes, is similar to the one proposed in this study but more computationally complicated, and it seemed unsuitable for classification at lower levels such as the species level. Interestingly, our results suggested that gene content dissimilarity can serve as an effective index for microbial subspecies, species and family delineation, but not for genus delineation. Such differences could be due to several reasons, such as the resolution of the two approaches in assigning gene groups, the number of microbial strains recruited in the studies (235 vs. 2772), and the possibility that the boundaries between microbial genus and species/family are relatively vague.
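To illustrate why a fixed ortholog database removes the need for reanalysis, the following minimal sketch builds an orthologous-group count profile per genome and, when a genome is added, computes only the new-versus-existing dissimilarities. The function names, the gene-to-group dictionaries, and the use of `None` for genes without a database hit (for example singletons) are hypothetical choices for this example, and the Bray-Curtis form used is the standard one rather than a transcription of the study's scripts.

```python
from collections import Counter


def og_profile(gene_to_og):
    """Count genes per orthologous group; genes with no database hit (None) are skipped."""
    return Counter(og for og in gene_to_og.values() if og is not None)


def bray_curtis(p, q):
    """Standard Bray-Curtis dissimilarity between two count profiles."""
    total = sum(p.values()) + sum(q.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    shared = sum((p & q).values())  # Counter '&' keeps the lesser count per group
    return 1.0 - 2.0 * shared / total


def add_genome(profiles, distances, name, gene_to_og):
    """Add one genome; existing profiles and distances are left untouched."""
    new_profile = og_profile(gene_to_og)
    for other, profile in profiles.items():
        distances[frozenset((name, other))] = bray_curtis(new_profile, profile)
    profiles[name] = new_profile


# Toy assignments (made up): gene ID -> orthologous group ID, or None if unmapped.
profiles, distances = {}, {}
add_genome(profiles, distances, "strain_A",
           {"a1": "OG1", "a2": "OG1", "a3": "OG2", "a4": None})
add_genome(profiles, distances, "strain_B",
           {"b1": "OG1", "b2": "OG2", "b3": "OG3"})
add_genome(profiles, distances, "strain_C",
           {"c1": "OG1", "c2": "OG1", "c3": "OG2"})
```

Because each profile depends only on the genome itself and the fixed database, adding the third strain costs two comparisons rather than a full recomputation.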
The gene content dissimilarity approach developed in this study is consistent with several species concepts proposed by microbiologists, including the recombination theory [3, 40] and Cohan’s ecotype concept [24, 25]. The recombination concept proposes that microbial species are formed by partially exchanging and obtaining homologous and non-homologous genes via lateral gene transfer [21–23]. Recent studies suggest that lateral gene transfer frequently moves protein-coding genes among microorganisms and is a major evolutionary force by which prokaryotes acquire novel traits, such as antibiotic resistance, from the environment and from other microorganisms in the community [41–44]. The ecotype concept proposes that prokaryotes form species by adapting to specific environments, for which periodic selection is the major force of cohesion [24, 25]. Genes responsible for adaptive phenotypes are fixed, while less adaptive traits are purged during periodic selection. Notably, however different these two concepts are, both point to the importance of gene/trait acquisition and loss in microbial species formation. The question then arises of how many genes/traits a microbial lineage must gain and/or lose to form a new taxonomic group such as a species. Our study addressed this question and suggested that Bray-Curtis gene content dissimilarities of at least 0.2 and 0.4 should be reached to confidently call species and family, respectively.
More interestingly, application of gene content dissimilarity to highly similar microbial strains/species suggests that it can also serve as a powerful index for classifying highly similar microbial strains, although an exact cutoff cannot be determined because of the varied rules used to define microbial subgroups. Enterobacteriaceae, Streptococcus, and the B. cereus group are relatively well studied because they are widespread and include several pathogenic species. Several species/genera belonging to these groups can hardly be distinguished by traditional approaches, such as Yersinia species [45, 46], E. coli/Shigella [47–49], and Streptococcus oralis/mitis/pneumoniae [50–54]. Among the Yersinia species, Y. enterocolitica, Y. pseudotuberculosis and Y. pestis are pathogenic for mammals. Conflicting classifications have been proposed for these three species based on clinical/phenotypic and genotypic criteria. Specifically, Y. enterocolitica and Y. pseudotuberculosis are classified as one group and Y. pestis as another when judged by their clinical and epidemiological features, while DNA-DNA hybridization suggests that Y. pseudotuberculosis and Y. pestis should be classified as one group or even one species because of their almost identical chromosomes. Our results support the latter: Y. pseudotuberculosis and Y. pestis are highly similar in gene content, consistent with their high DNA-DNA hybridization values. Shigella species are now generally accepted as a clade within the species E. coli based on phylogenetic analysis of conserved gene sequences [30, 47, 55]. Because phylogenetic analysis fails to separate Shigella from E. coli, they are mainly distinguished by their biochemical and serotype properties. However, a recent whole-genome-based approach suggests that all four Shigella species are distinctly different from E. coli and form sister species to E. coli in the genus Escherichia.
Interestingly, our results agree with the whole-genome-based study that Shigella and E. coli strains are markedly different from each other and should be considered individual species in Escherichia. Notably, our results indicated that E. coli O157:H7 could even be considered a different species of Escherichia because its gene content diverges from that of other E. coli strains. Similarly, S. pneumoniae, S. mitis and S. oralis are closely related species that have been difficult to identify using traditional phylogenetic approaches, while our results suggested that they can be well separated from each other by gene content dissimilarity. All these results suggest that gene content dissimilarity can be used as an effective index for classifying closely related microbial strains, even at the subspecies level.
Notably, although currently available post-genomic approaches are technically different from each other, they are all directly or indirectly linked with the classical DDH method, i.e. with sequence identity. Their high correlation with each other in microbial delineation is therefore unsurprising. What is interesting is that these technical differences have addressed different microbial taxonomic problems. For example, the well-known ANI method suggests that microbial species can be defined with an ANI cutoff of 94–96 % [11, 15, 16]. The POCP method, in contrast, suggests a genus boundary for microbial delineation, while the gene content dissimilarity approach evaluated in this study proposes cutoffs for microbial species and family delineation. Together, these approaches have addressed several different questions in microbial systematics and evolutionary biology, such as the relationship between microbial speciation and nucleotide polymorphism, and the extent of genomic fluidity involved in forming different microbial taxonomic groups. Importantly, currently classified and demarcated species are so diverse in metabolic capabilities and ecology that they are rooted in neither evolutionary nor ecological theory. With the rapid accumulation of genome sequences from so many microorganisms, microbiologists have urged that it is time to bring order to microbial systematics by taking account of both phylogeny and biological signatures. As different indices approach microbial systematics problems from different angles, we advocate using multiple indices for confident classification and delineation of microorganisms.
This study presented a post-genomic approach, gene content dissimilarity, for classification of highly similar microbial strains, and also addressed an interesting question in evolutionary biology: to what extent must genomic fluidity occur to form a microbial species or family? Our results suggested that cutoffs of 0.2 and 0.4 gene content dissimilarity can be used for microbial species and family level delineation, respectively; for the family level, no general cutoff had been proposed for many years until recently. More importantly, application of gene content dissimilarity showed clear separation of highly similar microbial strains into different subgroups at high resolution once potential noise was removed by excluding other species/genera from the analysis. The study provides a genomic means of identifying closely related microbial strains and could be useful for distinguishing pathogens from commensals in clinical microbiology, especially when combined with approaches such as ANI and 16S rRNA gene identity.
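As a small illustration of how the two cutoffs might be applied to a pairwise value, the sketch below maps a Bray-Curtis gene content dissimilarity to one of three ranges. The function name, the returned labels, and the treatment of values exactly at a cutoff are assumptions for this example; the study reports the cutoffs themselves, not a decision procedure.

```python
SPECIES_CUTOFF = 0.2  # reported cutoff for species delineation
FAMILY_CUTOFF = 0.4   # reported cutoff for family delineation


def interpret_dissimilarity(bc):
    """Place a pairwise Bray-Curtis gene content dissimilarity relative to the cutoffs.

    Assumption: a value equal to a cutoff is treated as having reached it.
    """
    if not 0.0 <= bc <= 1.0:
        raise ValueError("Bray-Curtis dissimilarity must lie in [0, 1]")
    if bc < SPECIES_CUTOFF:
        return "below species cutoff"
    if bc < FAMILY_CUTOFF:
        return "species cutoff reached, below family cutoff"
    return "family cutoff reached"


examples = {bc: interpret_dissimilarity(bc) for bc in (0.05, 0.2, 0.35, 0.6)}
```

Values near a cutoff would best be read alongside other indices such as ANI and 16S rRNA gene identity rather than on their own.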
Data acquisition and processing
To evaluate the performance of gene content dissimilarity on microbial taxonomy delineation, a total of 2772 completed microbial genomes were downloaded from NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/). GenBank format genome sequence files and protein sequence files (FASTA format) were retrieved. Full genome sequences and 16S rRNA gene sequences were extracted from GenBank files by PERL scripts implementing BioPerl modules. A full list of the downloaded microbial genomes and accession numbers can be found in Additional file 2.
To test the performance of gene content dissimilarity in distinguishing highly similar microbial strains, both draft and completed genome sequences were recruited for Enterobacteriaceae, Streptococcus, and the Bacillus cereus group, the three largest microbial groups with known taxonomic problems and many available reference genomes. Microbial strains belonging to dominant taxonomic groups were selected for evaluation. A total of 916 Enterobacteriaceae genomes, 283 Streptococcus genomes, and 150 Bacillus genomes were retrieved. Among the Enterobacteriaceae genomes, 14 belonged to Enterobacter, 384 to Escherichia, 45 to Klebsiella, 314 to Salmonella, 14 to Serratia, 42 to Shigella, and 103 to Yersinia. The 283 Streptococcus genomes included 83 S. agalactiae strains, 7 S. mitis strains, 8 S. oralis strains, 38 S. mutans strains, 119 S. pneumoniae strains, 12 S. pyogenes strains, 10 S. suis strains, and 6 S. thermophilus strains. Classification and identification problems have been reported for several of these recruited genera/species, such as Escherichia vs. Shigella, and S. mitis vs. S. oralis vs. S. pneumoniae.
Gene content dissimilarity calculation
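The Bray-Curtis dissimilarity between genome i and genome j takes the standard form
$$BC_{ij} = 1 - \frac{2C_{ij}}{S_i + S_j}$$
where $C_{ij}$ represented the sum, over all orthologous groups, of the lesser number of genes mapped to each orthologous group in the two genomes, and $S_i$ and $S_j$ were the total numbers of genes mapped to the eggNOG database in each genome.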
[Table: An example showing how Bray-Curtis dissimilarity was calculated between strain I and strain J. Columns: number of genes mapped in strain I; number of genes mapped in strain J; $C_n$; $C_{ij}$; $S_i$; $S_j$; $BC_{ij}$. Note: dissimilarity calculation in a real case would be more complex because typical microbial genomes usually comprise thousands of genes.]
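A worked calculation may help make the definitions of C ij, S i and S j concrete. The sketch below assumes the standard Bray-Curtis form, BC ij = 1 − 2 C ij / (S i + S j); the orthologous group IDs and gene counts are made-up toy values, not the values from the original example table.

```python
def bray_curtis(profile_i, profile_j):
    """Bray-Curtis dissimilarity between two orthologous-group count profiles.

    Each profile maps an orthologous group ID to the number of genes
    of that genome mapped to the group.
    """
    s_i = sum(profile_i.values())  # S_i: total genes mapped in genome i
    s_j = sum(profile_j.values())  # S_j: total genes mapped in genome j
    if s_i + s_j == 0:
        raise ValueError("both profiles are empty")
    # C_ij: for every orthologous group, take the lesser of the two counts, then sum.
    groups = set(profile_i) | set(profile_j)
    c_ij = sum(min(profile_i.get(og, 0), profile_j.get(og, 0)) for og in groups)
    return 1.0 - 2.0 * c_ij / (s_i + s_j)


# Toy profiles (hypothetical group IDs and counts).
strain_i = {"OG1": 2, "OG2": 1, "OG3": 0, "OG4": 3}
strain_j = {"OG1": 1, "OG2": 1, "OG3": 2, "OG4": 1}

# Lesser counts per group: 1, 1, 0, 1 -> C_ij = 3; S_i = 6; S_j = 5.
bc_ij = bray_curtis(strain_i, strain_j)  # 1 - 2*3/(6+5) = 5/11
```

Identical profiles give 0 and profiles with no shared orthologous group give 1, so the index is bounded between 0 and 1.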
16S rRNA gene identity and ANI calculation
Pairwise 16S rRNA gene identity was calculated with the USEARCH (v7.0.1001) program, using global sequence identity. When multiple 16S rRNA gene copies were found in a genome, all of them were included in the calculation and the average value was used as the identity between two microbial strains.
Pairwise ANI calculation for the 2772 downloaded genomes was carried out with a Perl script obtained from https://github.com/chjp/ANI. This script employs the same algorithm and outputs the same result as the JSpecies program. The program BLAST (v2.2.25) was called by the script for ANI calculation.
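The averaging step for genomes with several 16S rRNA gene copies can be sketched as follows. This assumes that the average is taken over every copy pair between the two genomes, which is one reading of the description; `toy_identity` is a hypothetical stand-in for the global identity reported by an aligner such as USEARCH and only handles equal-length sequences.

```python
from itertools import product
from statistics import mean


def mean_copy_identity(copies_a, copies_b, identity):
    """Average a pairwise identity function over all 16S copy pairs of two genomes."""
    if not copies_a or not copies_b:
        raise ValueError("each genome needs at least one 16S rRNA gene copy")
    return mean(identity(a, b) for a, b in product(copies_a, copies_b))


def toy_identity(seq_a, seq_b):
    """Placeholder identity: fraction of matching positions in equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("toy_identity expects equal-length sequences")
    return sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Toy sequences (made up): genome A carries two copies, genome B one copy.
genome_a = ["ACGTACGTAC", "ACGTACGTAA"]
genome_b = ["ACGTACGTAC"]
identity_ab = mean_copy_identity(genome_a, genome_b, toy_identity)  # mean of 1.0 and 0.9
```

In practice the `identity` argument would wrap parsed aligner output rather than a position-by-position comparison.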
PCoA clustering and significance tests
We employed PCoA clustering for better visualization of our results; many other clustering programs should also be able to separate microbial strains into different groups. The non-parametric multivariate analysis MRPP (multi-response permutation procedure) based on Bray-Curtis dissimilarity was performed to evaluate the significance of differences between the orthologous gene profiles of a microbial species and those of other species in the same genus. The vegan package developed in the R environment was used in this study.
ANI, average nucleotide identity; AAI, average amino acid identity; DDH, DNA-DNA hybridization; MLSA, multi-locus sequence analysis; MRPP, multi-response permutation procedure; PCoA, principal coordinate analysis; POCP, percentage of conserved proteins
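For readers who want to see what PCoA does to a dissimilarity matrix, the following is a minimal sketch of classical PCoA (double-centering followed by eigendecomposition) written with NumPy. It is an illustration in Python rather than the R/vegan workflow used in the study; the function name and toy matrix are assumptions, and negative eigenvalues, which can arise because Bray-Curtis is not a Euclidean distance, are simply clipped to zero here.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a square dissimilarity matrix.

    Returns (coordinates, fraction of positive-eigenvalue variance per axis).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gower double-centering of the squared dissimilarities.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_axes]    # keep the largest axes
    vals = np.clip(eigvals[order], 0.0, None)     # drop negative eigenvalues
    coords = eigvecs[:, order] * np.sqrt(vals)
    explained = vals / eigvals[eigvals > 0].sum()
    return coords, explained


# Toy matrix (made up): strains 0-1 and 2-3 form two groups,
# with dissimilarity 0.1 within a group and 0.5 between groups.
toy = np.array([
    [0.0, 0.1, 0.5, 0.5],
    [0.1, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.1],
    [0.5, 0.5, 0.1, 0.0],
])
coords, explained = pcoa(toy)
```

In this toy case the first axis carries most of the variation and places the two groups on opposite sides of the origin, which is the kind of separation the ordination plots are meant to reveal.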
We thank Dr. Jizhong Zhou at the University of Oklahoma for providing computational resources for this study. We are also grateful to the anonymous reviewers for their critical input that significantly helped improve the manuscript.
This work was supported by the Fundamental Research Funds for the Central Universities of China (2016QNA4039 Q.T. and 2015QNA4044 L.L.), by the Open Funding of Zhejiang Provincial Key Laboratory of Health Risk Factors for Seafood (201605), and by the start-up packages of Zhejiang University (129000-193820101-001-027 Q.T. and 129000-193820101-001-004 L.L.).
Availability of data and materials
The dataset supporting the results of this article is included within the article. All genomes analyzed in this study were downloaded from NCBI ftp site. A list of microbial genomes as well as the perl scripts associated with this study are provided as supplementary data.
QT performed the data analysis. QT and LL conceived the study and wrote the manuscript. All authors read and approved the final manuscript.
Q.T. received his PhD in microbiology at the University of Oklahoma. He is now an assistant professor at Zhejiang University. He is also appointed as an affiliate scholar at the University of Oklahoma. His main research interests include microbial ecology, bioinformatics, metagenomics and genomics.
L.L. received her PhD in Biochemistry and Molecular Biology at the University of Chinese Academy of Sciences. She is now an associate professor at Zhejiang University. Her main research interests include microbial genetics, biofuel, and microbial ecology.
The authors declare that they have no competing interests.
Consent for publication
Not applicable. No human participant is involved in this study.
Ethics approval and consent to participate
Not applicable. No human or animal data is included in the study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Rodriguez-R LM, Konstantinidis KT. Bypassing cultivation to identify bacterial species. Microbe. 2014;9(3):111–8.
- Gevers D, Cohan FM, Lawrence JG, Spratt BG, Coenye T, Feil EJ, Stackebrandt E, Van de Peer Y, Vandamme P, Thompson FL, et al. Re-evaluating prokaryotic species. Nat Rev Microbiol. 2005;3(9):733–9.
- Achtman M, Wagner M. Microbial diversity and the genetic nature of microbial species. Nat Rev Microbiol. 2008;6(6):431–40.
- Janda JM, Abbott SL. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol. 2007;45(9):2761–4.
- Stackebrandt E, Goebel BM. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Evol Microbiol. 1994;44(4):846–9.
- Stackebrandt E, Ebers J. Taxonomic parameters revisited: tarnished gold standards. Microbiology Today. 2006;33(4):152.
- Kim M, Oh HS, Park SC, Chun J. Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol. 2014;64(Pt 2):346–51.
- Tindall BJ, Rossello-Mora R, Busse HJ, Ludwig W, Kampfer P. Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst Evol Microbiol. 2010;60(Pt 1):249–66.
- Meier-Kolthoff JP, Auch AF, Klenk HP, Göker M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics. 2013;14(1):1.
- Konstantinidis KT, Tiedje JM. Towards a genome-based taxonomy for prokaryotes. J Bacteriol. 2005;187(18):6258–64.
- Konstantinidis KT, Tiedje JM. Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci U S A. 2005;102(7):2567–72.
- Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science. 2006;311(5765):1283–7.
- Thompson CC, Chimetto L, Edwards RA, Swings J, Stackebrandt E, Thompson FL. Microbial genomic taxonomy. BMC Genomics. 2013;14(1):1.
- Zhi XY, Zhao W, Li WJ, Zhao GP. Prokaryotic systematics in the genomics era. Antonie Van Leeuwenhoek. 2012;101(1):21–34.
- Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides NC, Pati A. Microbial species delineation using whole genome sequences. Nucleic Acids Res. 2015;43(14):6761–71.
- Richter M, Rossello-Mora R. Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A. 2009;106(45):19126–31.
- Snel B, Bork P, Huynen MA. Genome phylogeny based on gene content. Nat Genet. 1999;21(1):108–10.
- Fitz-Gibbon ST, House CH. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 1999;27(21):4218–22.
- Gu X, Huang W, Xu D, Zhang H. GeneContent: software for whole-genome phylogenetic analysis. Bioinformatics. 2005;21(8):1713–4.
- Gu X, Zhang H. Genome phylogenetic analysis based on extended gene contents. Mol Biol Evol. 2004;21(7):1401–8.
- Paul JH. Microbial gene transfer: an ecological perspective. J Mol Microbiol Biotechnol. 1999;1(1):45–50.
- Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405(6784):299–304.
- Lawrence JG. Gene transfer, speciation, and the evolution of bacterial genomes. Curr Opin Microbiol. 1999;2(5):519–23.
- Cohan FM. Bacterial species and speciation. Syst Biol. 2001;50(4):513–24.
- Cohan FM. What are bacterial species? Annu Rev Microbiol. 2002;56(1):457–87.
- Ahmed N, Dobrindt U, Hacker J, Hasnain SE. Genomic fluidity and pathogenic bacteria: applications in diagnostics, epidemiology and intervention. Nat Rev Microbiol. 2008;6(5):387–94.
- Kislyuk AO, Haegeman B, Bergman NH, Weitz JS. Genomic fluidity: an integrative view of gene diversity within microbial populations. BMC Genomics. 2011;12(1):1.
- Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 2013;42(D1):D231–9.
- Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Vega Thurber RL, Knight R, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol. 2013;31(9):814–21.
- Lan R, Reeves PR. Escherichia coli in disguise: molecular origins of Shigella. Microbes Infect. 2002;4(11):1125–32.
- Moran NA, Russell JA, Koga R, Fukatsu T. Evolutionary relationships of three new species of Enterobacteriaceae living as symbionts of aphids and other insects. Appl Environ Microbiol. 2005;71(6):3302–10.
- Paradis S, Boissinot M, Paquette N, Bélanger SD, Martel EA, Boudreau DK, Picard FJ, Ouellette M, Roy PH, Bergeron MG. Phylogeny of the Enterobacteriaceae based on genes encoding elongation factor Tu and F-ATPase β-subunit. Int J Syst Evol Microbiol. 2005;55(5):2013–25.
- Pham HN, Ohkusu K, Mishima N, Noda M, Monir Shah M, Sun X, Hayashi M, Ezaki T. Phylogeny and species identification of the family Enterobacteriaceae based on dnaJ sequences. Diagn Microbiol Infect Dis. 2007;58(2):153–61.
- Naum M, Brown EW, Mason-Gamer RJ. Is 16S rDNA a reliable phylogenetic marker to characterize relationships below the family level in the Enterobacteriaceae? J Mol Evol. 2008;66(6):630–42.
- Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, Oggioni M, Dunning Hotopp JC, Hu FZ, Riley DR, et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome Biol. 2010;11(10):2010–1.
- Marsh PD, Martin MV, Lewis MA, Williams D. Oral microbiology. Philadelphia: Elsevier Health Sciences; 2009.Google Scholar
- Liu Y, Lai Q, Göker M, Meier-Kolthoff JP, Wang M, Sun Y, Wang L, Shao Z. Genomic insights into the taxonomic status of the Bacillus cereus group. Sci Rep. 2015;5:14082.View ArticlePubMedPubMed CentralGoogle Scholar
- Gupta A, Sharma VK. Using the taxon-specific genes for the taxonomic classification of bacterial genomes. BMC Genomics. 2015;16(1):396.View ArticlePubMedPubMed CentralGoogle Scholar
- Qin QL, Xie BB, Zhang XY, Chen XL, Zhou BC, Zhou J, Oren A, Zhang YZ. A proposed genus boundary for the prokaryotes based on genomic insights. J Bacteriol. 2014;196(12):2210–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Ereshefsky M. Microbiology and the species problem. Biol Philos. 2010;25(4):553–68.View ArticleGoogle Scholar
- Abby SS, Tannier E, Gouy M, Daubin V. Lateral gene transfer as a support for the tree of life. Proc Natl Acad Sci. 2012;109(13):4962–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Soucy SM, Huang J, Gogarten JP. Horizontal gene transfer: building the web of life. Nat Rev Genet. 2015;16(8):472–82.View ArticlePubMedGoogle Scholar
- Vos M, Hesselman MC, Te Beek TA, van Passel MW, Eyre-Walker A. Rates of lateral gene transfer in prokaryotes: high but Why? Trends Microbiol. 2015;23(10):598–605.View ArticlePubMedGoogle Scholar
- Popa O, Dagan T. Trends and barriers to lateral gene transfer in prokaryotes. Curr Opin Microbiol. 2011;14(5):615–23.View ArticlePubMedGoogle Scholar
- Carniel E. Evolution of pathogenic Yersinia, some lights in the dark. Adv Exp Med Biol. 2003;529:3–12.View ArticlePubMedGoogle Scholar
- Reuter S, Connor TR, Barquist L, Walker D, Feltwell T, Harris SR, Fookes M, Hall ME, Petty NK, Fuchs TM. Parallel independent evolution of pathogenicity within the genus Yersinia. Proc Natl Acad Sci. 2014;111(18):6768–73.View ArticlePubMedPubMed CentralGoogle Scholar
- van den Beld MJ, Reubsaet FA. Differentiation between Shigella, enteroinvasive Escherichia coli (EIEC) and noninvasive Escherichia coli. Eur J Clin Microbiol Infect Dis. 2012;31(6):899–904.View ArticlePubMedGoogle Scholar
- Zuo G, Xu Z, Hao B. Shigella strains Are Not clones of escherichia coli but sister species in the genus escherichia. Genomics Proteomics Bioinformatics. 2013;11(1):61–5.View ArticlePubMedGoogle Scholar
- Sims GE, Kim SH. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs). Proc Natl Acad Sci. 2011;108(20):8329–34.View ArticlePubMedPubMed CentralGoogle Scholar
- Glazunova OO, Raoult D, Roux V. Partial recN gene sequencing: a new tool for identification and phylogeny within the genus Streptococcus. Int J Syst Evol Microbiol. 2010;60(9):2140–8.View ArticlePubMedGoogle Scholar
- Chi F, Nolte O, Bergmann C, Ip M, Hakenbeck R. Crossing the barrier: Evolution and spread of a major class of mosaic pbp2x in Streptococcus pneumoniae, S. mitis and S. oralis. Int J Med Microbiol. 2007;297(7–8):503–12.View ArticlePubMedGoogle Scholar
- Ikryannikova L, Filimonova A, Malakhova M, Savinova T, Filimonova O, Ilina E, Dubovickaya V, Sidorenko S, Govorun V. Discrimination between Streptococcus pneumoniae and Streptococcus mitis based on sorting of their MALDI mass spectra. Clin Microbiol Infect. 2013;19(11):1066–71.View ArticlePubMedGoogle Scholar
- Kilian M, Poulsen K, Blomqvist T, Havarstein LS, Bek-Thomsen M, Tettelin H, Sorensen U. Evolution of Streptococcus pneumoniae and its close commensal relatives. PLoS One. 2008;3(7), e2683.View ArticlePubMedPubMed CentralGoogle Scholar
- Whatmore AM, Efstratiou A, Pickerill AP, Broughton K, Woodard G, Sturgeon D, George R, Dowson CG. Genetic relationships between clinical isolates of streptococcus pneumoniae, streptococcus oralis, and streptococcus mitis: characterization of “atypical” pneumococci and organisms allied to S. Mitis HarboringS. Pneumoniae virulence factor-encoding genes. Infect Immun. 2000;68(3):1374–82.View ArticlePubMedPubMed CentralGoogle Scholar
- Meier-Kolthoff JP, Hahnke RL, Petersen J, Scheuner C, Michael V, Fiebig A, Rohde C, Rohde M, Fartmann B, Goodwin LA, et al. Complete genome sequence of DSM 30083(T), the type strain (U5/41(T)) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genomic Sci. 2014;9(2):1944–3277.Google Scholar
- Feldgarden M, Byrd N, Cohan FM. Gradual evolution in bacteria: evidence from Bacillus systematics. Microbiology. 2003;149(12):3565–73.View ArticlePubMedGoogle Scholar
- Schloter M, Lebuhn M, Heulin T, Hartmann A. Ecology and evolution of bacterial microdiversity. FEMS Microbiol Rev. 2000;24(5):647–60.View ArticlePubMedGoogle Scholar
- Gevers D, Cohan FM, Lawrence JG, Spratt BG, Coenye T, Feil EJ, Stackebrandt E, Van de Peer Y, Vandamme P, Thompson FL. Re-evaluating prokaryotic species. Nat Rev Microbiol. 2005;3(9):733–9.View ArticlePubMedGoogle Scholar
- Gribaldo S, Brochier-Armanet C. Time for order in microbial systematics. Trends Microbiol. 2012;20(5):209.View ArticlePubMedGoogle Scholar
- Yarza P, Yilmaz P, Pruesse E, Glockner FO, Ludwig W, Schleifer KH, Whitman WB, Euzeby J, Amann R, Rossello-Mora R. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Micro. 2014;12(9):635–45.View ArticleGoogle Scholar
- Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–1.View ArticlePubMedGoogle Scholar
- Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MHH, Oksanen MJ, Suggests M. The vegan package, Community ecology package. 2007. p. 631–7.Google Scholar