Comparative genomic analysis of nine Sphingobium strains: insights into their evolution and hexachlorocyclohexane (HCH) degradation pathways
- Helianthous Verma†1,
- Roshan Kumar†1,
- Phoebe Oldach1,
- Naseer Sangwan1,
- Jitendra P Khurana2,
- Jack A Gilbert3, 4 and
- Rup Lal1
© Verma et al.; licensee BioMed Central Ltd. 2014
Received: 10 May 2014
Accepted: 23 October 2014
Published: 23 November 2014
Sphingobium spp. are efficient degraders of a wide range of chlorinated and aromatic hydrocarbons. In particular, strains which harbour the lin pathway genes mediating the degradation of hexachlorocyclohexane (HCH) isomers are of interest due to the widespread persistence of this contaminant. Here, we examined the evolution and diversification of the lin pathway under the selective pressure of HCH, by comparing the draft genomes of six newly-sequenced Sphingobium spp. (strains LL03, DS20, IP26, HDIPO4, P25 and RL3) isolated from HCH dumpsites, with three existing genomes (S. indicum B90A, S. japonicum UT26S and Sphingobium sp. SYK6).
Efficient HCH degraders phylogenetically clustered in a closely related group comprising UT26S, B90A, HDIPO4 and IP26, where HDIPO4 and IP26 were classified as subspecies with ANI value >98%. Less than 10% of the total gene content was shared among all nine strains, but among the eight HCH-associated strains (all except SYK6) the shared gene content rose to nearly 25%. Genes associated with nitrogen stress response and two-component systems were found to be enriched. The strains also housed many xenobiotic degradation pathways other than HCH, despite the absence of these xenobiotics from the isolation sources. Additionally, although these strains are non-motile, they possess flagellar assembly genes. While strains HDIPO4 and IP26 contained the complete set of lin genes, DS20 was devoid of lin genes except linKLMN, whereas LL03, P25 and RL3 were identified as lin-deficient strains, as they housed incomplete lin pathways. Further, in HDIPO4, linA was found as a hybrid of two natural variants, linA1 and linA2, known for their different enantioselectivity.
The bacteria isolated from HCH dumpsites provide a natural testing ground to study variations in the lin system and their effects on degradation efficacy. Further, the diversity in the lin gene sequences and copy number, their arrangement with respect to IS6100 and evidence for potential plasmid content elucidate possible evolutionary acquisition mechanisms for this pathway. This study further opens the horizon for selection of bacterial strains for inclusion in an HCH bioremediation consortium and suggests that HDIPO4, IP26 and B90A would be appropriate candidates for inclusion.
The family Sphingomonadaceae has been subdivided into five genera: Sphingomonas, Sphingobium, Novosphingobium, Sphingopyxis and Sphingosinicella [1, 2]. To date, the genomes of nearly 40 sphingomonads have been sequenced, which has revealed the genetic basis for the degradation of a broad range of polycyclic aromatic hydrocarbons (PAH) and polysaccharides. However, Sphingobium spp. are of particular interest due to their ability to degrade hexachlorocyclohexane (HCH). The majority of HCH isomers (i.e. α, β, δ, ϵ) are formed during the production of the insecticide lindane (γ-HCH), and have been active pollutants since the 1950s. Among all these isomers, only γ-HCH has insecticidal properties. Purification of γ-HCH (10-12%) from the mixture leads to the formation of HCH muck (88-90% of the total HCH mixture), consisting mainly of the α (60-70%), β (5-12%), δ (6-10%), and ϵ (3-4%) isomers. This muck was generally discarded in the open beside industrial units, creating a large number of HCH dumpsites around the world between the 1960s and the 1980s. Sphingobium spp. are often enriched in HCH dumpsites and have been shown to acquire and maintain genes associated with HCH degradation [7–10].
There is evidence of high levels of polymorphism in the amino acid sequences encoded by the linA and linB genes. Further studies have revealed that these differences contribute to the efficacy of HCH degradation and substrate specificity. While there are several strains of sphingomonads isolated from HCH dumpsites with demonstrated differences in HCH degradation ability [8, 14], genome-wide comparative analyses to better understand the lin pathway, the localization of lin genes in the genome and their methods of recruitment have not yet been undertaken.
In order to understand the evolution of the HCH-degradation pathway, the draft genomes of six Sphingobium spp. isolated from HCH dumpsites and the complete genomes of three previously-sequenced, well-studied strains were analysed. Here, we characterize the genetic divergence between these strains in reference to the lin catabolic system and auxiliary characteristics associated with bioremediation potential. We also present evidence for possible plasmid and IS6100 based horizontal gene transfer (HGT) as the method for spread of the lin system genes among sphingomonads. Additionally, variation in the lin gene sequences is a matter of further investigation for improved degradation ability of these strains.
Results and discussion
Genomic features of Sphingobium strains
General characteristic features of the Sphingobium genomes
NCBI Accession No.: AP010803 to AP010806
Source of isolation: HCH Dumpsite, India | HCH Dumpsite, India | HCH Dumpsite, India | HCH Dumpsite, India | HCH Dumpsite, India | HCH Dumpsite, Czech Republic | Rhizosphere Soil, India | Soil Contaminated with γ-HCH, Japan | Waste water of kraft mill pulp, Japan
Genome Size (bp)
G + C content (%)
4646 (5161068 bp) | 4646 (4105641 bp) | 4636 (4152861 bp) | 4033 (3644184 bp) | 5288 (4682346 bp) | 4914 (4312911 bp) | 3976 (3570642 bp) | 4414 (3929727 bp) | 4097 (3825558 bp)
Average Gene Size (bp)
% of CDS
Genomic islands (bp)
CRISPR elements were found only in S. baderi LL03 (22 spacers) and S. lactosutens DS20 (5 spacers). CRISPR spacers are acquired from invading foreign DNA and form part of a known bacterial defense mechanism against viral and plasmid challenges, with the number of new phage-derived spacers being correlated with phage resistance. However, the spacer sequences in these two strains had no similarity to known phage sequences. Furthermore, LL03 maintained a type II CRISPR element with the cas9 gene involved in target interference, whereas DS20 had type I CRISPR elements with the cas3 gene. Strains LL03 and DS20 were isolated from HCH dumpsites in the Czech Republic and India, respectively, and their two different CRISPR/Cas systems may correspond to their different geographical locations. Given its larger number of spacers, these data also suggest that LL03 may have the greatest phage resistance.
Comparative phylogenetic analysis
Common gene content and functional profiling of Sphingobium spp.
Functional profiling was used to analyze pathways that were differentially enriched in these strains. For this, a dendrogram was constructed based upon the top 50 subsystems at 0.8% minimum abundance using Pearson correlation distance. The analysis revealed that the two-component system for gene expression was highly abundant in all of the Sphingobium genomes (Figure 4). This system is known to facilitate adaptation to extreme environmental conditions and likely contributes to the ability to survive the high HCH pressure, salinity, and acidity that exist at the HCH dumpsite. Additionally, the nine strains collectively showed an abundance of ABC transporters within their genomes. The abundance of these transporters implies that these strains are highly engaged in the transport of a wide variety of substrates across extra- and intracellular membranes, which is consistent with the proficiency of Sphingobium in degrading a wide range of xenobiotics (mentioned above; Figure 4).
Interestingly, HDIPO4 and IP26, which had a close phylogenetic relationship, demonstrated differences in their functional repertoire, based on the top 50 subsystems. This is primarily driven by an increased abundance of 1,4-dichlorobenzene degradation, toluene and xylene degradation, caprolactam degradation, PPAR signaling and atrazine degradation pathways in HDIPO4, which was clustered with functional profiling of P25 and UT26S as they shared these enrichments. Furthermore, lipopolysaccharide biosynthesis, tyrosine metabolism, glycan and glycosaminoglycan degradation pathways were found enriched in IP26 as compared to HDIPO4. This variation suggests that while these strains exhibit similar genomic content, they exhibit differential dominance in their metabolic preferences.
Nitrogen assimilation and the presence of flagellar genes in non-motile Sphingobium
The genomes of all nine strains were found to be enriched in the two-component signal transduction system for nitrogen stress response (NtrC pathway). Additionally, the large subunit of assimilatory nitrate reductase, a key regulator that potentially enables the utilization of nitrate as a nitrogen source, was found to be under diversifying natural selection (dN/dS = 1.09), which suggests that these strains can tolerate low inorganic nitrogen concentrations and are evolving in response to this inorganic nitrogen stress. At high nitrogen concentration, the transmembrane protein glnC (ntrB/Histidine kinase) responds to nitrogen availability and phosphorylates glnG (ntrC), which in turn leads to the activation of glnA (glutamine synthetase). Another key regulator of the pathway is glnB, which interacts with glnC and regulates its activity. When nitrogen availability is low, glnB is subjected to post-translational modification by uridylation (mediated by glnD). This modification is reversed in N-sufficient conditions. Thus, the presence of the NtrC pathway and nitrate reductase genes may explain how these strains assimilate nitrogen and acclimatize to the nitrate concentrations found at HCH dumpsites. Increasing exposure to elevated hydrocarbon concentrations has been found to be positively correlated with the relative abundance of genes associated with nitrogen metabolism.
The NtrC pathway is also associated with genes regulating chemotactic response, such as cheY, motA and motB, and with flagellar biosynthesis proteins, such as flhA, fliO, fliP and fliR. All these genes were also found in the core-genome. cheY modulates the cell's ability to interact with the flagellum and controls swimming behavior. Interestingly, while these Sphingobium strains are considered non-motile [25–30], each genome housed more than half of the genes needed for flagellar assembly and functioning. This raises the possibility that they are in the process of either acquiring or losing motility. The abundance of chemotaxis and motility genes has already been demonstrated in the metagenome of the HCH dumpsite from which HDIPO4, IP26, P25, DS20, and RL3 were isolated. However, further analysis is needed to probe the reason for retention or loss of flagellar genes in the Sphingobium strains, and to investigate whether Sphingobium have the potential to gain motility through acquisition of the remaining genes under the high selective pressure of HCH in the stressed environments.
Recruitment of lin pathway through different routes
The genome analysis revealed a mosaic distribution of lin genes and IS6100 elements in HCH-degrading Sphingobium spp. coupled with high polymorphism levels in the lin genes. This indicates the recruitment of lin genes through different routes in Sphingobium spp. under HCH stress, and further that the pathway has not yet stabilized in these strains but is instead subjected to further rearrangements and polymorphisms.
IS6100-mediated recruitment based on mosaic distribution pattern of lin genes
The IS6100 elements, known for disseminating lin genes through HGT among sphingomonads [7, 31–33], were found to be present in all of the newly sequenced strains associated with HCH degradation, including strain DS20 which did not degrade HCH. The number, as determined from the genome sequence, varied from 5 copies in UT26S to 24 copies in P25 (Table 1). The presence of a large number of IS6100 elements reflects a high degree of genomic rearrangement, as the IS6100 elements have already been demonstrated to play an important role in the spread and reorganization of the lin pathway in sphingomonads [7, 10, 31–33].
To further explore the mechanism of HGT in the spread and diversification of the lin system, we examined the colocalization of lin genes with mobile elements such as the insertion sequence IS6100 and transposons, and their presence on plasmids. In all strains in which the linA gene was present, it was found in nearly the same association with IS elements as in UT26S, i.e. IS6100 was found within <5 Kbp of linA. However, in RL3, two IS6100 copies lie in the same orientation within this range. This suggests that the association of linA with IS6100 is consistent among these strains, but the possible involvement of IS6100 in the duplication of the linA gene in RL3 remains to be identified.
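The proximity criterion described above (an IS6100 copy less than 5 Kbp from a lin gene on the same contig) can be illustrated with a short sketch. The `Feature` record, the helper names, the contig names and all coordinates below are hypothetical and are not taken from the annotated genomes; this is one plausible way to run such a check, not the procedure used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A gene or insertion sequence on a contig (1-based, inclusive coordinates)."""
    name: str
    contig: str
    start: int
    end: int


def gap_bp(a: Feature, b: Feature) -> int:
    """Bases separating two features on one contig (0 if they touch or overlap)."""
    if a.contig != b.contig:
        raise ValueError("features lie on different contigs")
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def elements_near(gene, elements, max_gap=5000):
    """Elements on the gene's contig that lie less than max_gap bp away."""
    return [e for e in elements
            if e.contig == gene.contig and gap_bp(gene, e) < max_gap]


# Hypothetical coordinates, for illustration only.
lin_a = Feature("linA", "contig_1", 10_000, 10_470)
is_copies = [
    Feature("IS6100", "contig_1", 12_000, 12_880),   # close to linA
    Feature("IS6100", "contig_1", 40_000, 40_880),   # same contig, too far
    Feature("IS6100", "contig_2", 10_500, 11_380),   # different contig
]
nearby = elements_near(lin_a, is_copies)
print([(e.contig, e.start, gap_bp(lin_a, e)) for e in nearby])
```

In a draft assembly, a gene and an insertion sequence that sit on different contigs cannot be tested this way, so a missing hit does not by itself show that the two are unlinked.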
In HDIPO4, a truncated copy of linF along with complete copies of linC and linB was found with an IS6100 element (length of the segment = 15 Kbp) (Figure 5B). In contrast, in the reference strain UT26S, these genes were dispersed, with linF present on chromosome 2 and linB and linC on chromosome 1. The association of these three genes in HDIPO4 suggests that they may have been brought together by IS6100-mediated transposition and may be in the process of forming an operon, a hypothesis supported by the fact that HDIPO4 contains a high number of IS6100 copies compared with UT26S (Table 1).
Of the three copies of linDER present in RL3, one was closely associated with the hmgB and hmgA genes of the homogentisate degradation pathway, separated by a copy of IS6100 (Figure 5B). In contrast, in UT26S, linDER was housed on a plasmid (pCHQ1), while hmgB and hmgA were found on chromosome 2. Therefore, in RL3, it is possible that these two different aromatic compound degradation pathways were brought into close proximity by IS6100-mediated transposition. Thus IS6100, apart from spreading the lin gene system, might also be effective in spreading homogentisate pathway genes despite the absence of homogentisate selective pressure at the HCH dumpsite. This is consistent with earlier findings that sphingomonads that degrade aromatic hydrocarbons contain catabolic genes associated with IS6100.
In strain LL03, isolated from the Czech Republic, linGHIJ genes were associated with IS6100, whereas in the genome of UT26S, isolated from Japan, IS6100 was absent from the region proximal to these lower pathway genes. As IS6100 is reported to be a key driver in the recruitment of the lin system, the differential organization of the IS6100 element with respect to lin genes in strains from geographically disparate locations reflects an ongoing IS6100-driven evolution of the lin system, including the lower degradation pathway components such as linGHIJ.
IS6100 elements have also been found in the genome of DS20, which did not degrade HCH isomers (due to the lack of lin genes except linKLMN). However, in DS20, the regions flanking the IS6100 elements comprised a variety of xenobiotic tolerance and degradation genes (i.e., benzene 1,2-dioxygenase, CopA family copper resistance protein, maleylacetate reductase, a putative efflux protein, chlorocatechol 1,2-dioxygenase), which further supports the role of IS6100 in distributing genes for a broad range of such functions in Sphingobium spp. The fact that DS20, a non-HCH degrader, maintained 15 copies of IS6100 suggests that this strain has the potential to acquire lin genes through IS6100-mediated mechanisms in the future.
Plasmid mediated recruitment
In investigating the presence and spread of the lin genes, the recently sequenced genome of an HCH-degrader, Sphingomonas sp. MM-1, is of interest as it was found to have five plasmids housing the genes of the lin pathway. In the MM-1 genome, linF and linGHIJ were found on pISP0; linA, linC, and a truncated linF on pISP1; linDER on pISP3; and linB, linC, and another truncated linF on pISP4. Genes for an ABC transporter were found on the chromosome, but these shared less than 80% identity with the linKLMN genes of UT26S. In addition, in strain UT26S, HCH-specific genes of the lin pathway were found to be housed in regions unique to the UT26S genome, with the linA, linB and linC genes on chromosome 1, linF on chromosome 2, and linDER on the plasmid pCHQ1. The lower pathway genes, including linGHIJ and linKLMN, were found on chromosomes 2 and 1, respectively, in regions that were conserved among sphingomonads.
Strains in transition to acquire lin pathway
Of the nine sphingomonads under study, seven possessed components of the upper HCH degradation pathway to varying degrees of completion, and two, SYK6 and DS20, were completely devoid of them (Additional file 1: Table S2). SYK6 did not contain any components of the lin system, and the DS20 genome contained only the lower lin pathway genes linKLMN, which encode an ABC transporter. Of the HCH-degraders, not every strain was found to house the complete array of lin genes characterized in UT26S or B90A. For instance, the P25 genome lacked the linB, linC, linDER, linGH, linI and linJ genes, while strains RL3 and LL03 both lacked linC and LL03 also lacked linB, as confirmed by PCR amplification (Additional file 1: Table S2). The differential composition of the lin system between these strains may be indicative of different steps in the evolution of the lin pathway, with IP26 in the stage of probable homologous recombination and looping out of linB, while LL03 shows potential gain of linGHIJ through IS6100-mediated HGT. Strain DS20 possesses ABC transporters and shows potential for acquisition of the lin genes, as it holds 15 copies of IS6100, while P25, in addition to the ABC transporter, has linA and linF but is yet to acquire the other lin genes.
lin system sequence diversity and its effect on metabolic efficiency
The upper pathway genes linA, linB and linC degrade γ-HCH and α-HCH, and additionally linB acts on β-HCH, leading to the formation of β-2,3,4,5,6-pentachlorocyclohexanol (β-PCHL) (Figure 1). As α- and β-HCH form the major components of contamination at the HCH dumpsite (>80%), both linA and linB are extremely important enzymes encoding HCH dehydrochlorinase and haloalkane dehalogenase, respectively (Figure 1). To gain deeper insights into the lin gene sequence diversity and its impact on HCH degradation, the genetic divergence of the lin system components was analyzed with respect to the copy number and nucleotide sequence divergence of the lin genes in both upper and lower degradation pathways, using B90A as a reference.
Apart from this divergence of the linA sequence in HDIPO4, no such changes to the linA gene sequences were observed; all strains showed 100% sequence similarity to the linA2 gene, with the exception that linA2 of RL3 showed a single substitution, L78Q. It is important to mention here that the linA gene has already been reported to be under continuous selection pressure and that a large number of variants of this gene exist [7, 32, 13, 39]; better variants of linA may be used for developing an enzymatic bioremediation system for HCH.
In contrast to linA, there were fewer variations in linB sequences among the strains under study. The sequence differences in the linA and linB genes among different Sphingobium spp. are particularly interesting in light of findings that marginal differences in the amino acid sequences of linB in UT26S, SP+, B90A, BHC-A and M1205 can alter the efficacy and substrate range, with the former group degrading β-HCH to β-PCHL and the latter group taking the pathway beyond PCHL to TCHL. HDIPO4 housed two identical linB copies with a T81A substitution and overall 99.6% similarity to B90A, while the RL3 linB gene had 98.9% identity, with three substitutions (T81A, D147A and A224V) as compared to linB of B90A (Figure 7B). Here, the copy number difference is suspected to have a greater impact, as the two copies of linB might explain the high β-HCH degradation efficacy of HDIPO4 [14, 27]. Apart from these two strains, no such diversity was observed, thus demonstrating the stability of the linB gene in the population. linC, which encodes HCH dehydrogenase, was the most conserved among the genes of the upper degradation pathway and showed only a single substitution, Y172C, in IP26. Taken together, these results suggest that linA genes are more prone to evolutionary changes under HCH stress and have not yet stabilized.
The lower pathway of γ-HCH degradation begins from 2,5-dichlorohydroquinone (2,5-DCHQ), an intermediate of γ-HCH degradation (Figure 1), which is mineralized by the lower pathway lin genes (linDER, linF, linGHIJ, and linKLMN). In contrast to the upper degradation pathway, very little is known about the divergence and polymorphisms of the genes of the lower degradation pathway.
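Substitution labels such as T81A follow the usual convention of reference residue, 1-based position, variant residue. As a minimal sketch of how such labels and a percent identity could be derived, the functions below compare two protein sequences that are assumed to be already aligned, of equal length and free of gaps. The function names and the toy sequences are hypothetical and do not correspond to any Lin protein.

```python
def substitutions(reference: str, variant: str) -> list:
    """List substitutions as <reference residue><1-based position><variant residue>."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{pos}{v}"
            for pos, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]


def percent_identity(reference: str, variant: str) -> float:
    """Percentage of aligned positions that carry the same residue."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == v for r, v in zip(reference, variant))
    return 100.0 * matches / len(reference)


# Toy sequences, for illustration only.
ref = "MSTLKDAAY"
var = "MSALKDAVY"
print(substitutions(ref, var))
print(round(percent_identity(ref, var), 2))
```

Real comparisons would first need a pairwise or multiple alignment, and positions containing gaps would have to be handled explicitly.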
In particular linGH, linI and linJ, which mediate the later stages of the lower degradation pathway, i.e., conversion of β-ketoadipate to succinyl CoA and acetyl CoA (Figure 1), showed variation in the sequences of linH and linI, whereas linG and linJ sequences were 100% conserved among all these strains. Here, linH of HDIPO4 and IP26 were similar to each other, and both diverged from B90A with 99.06% identity. They held two substitutions (I31V and N171H) while LL03 shared the I31V substitution and additionally had a N131D substitution. linI was found to be identical in HDIPO4 and IP26, with a single substitution (A188T) and 99.62% identity to B90A, while LL03 had two substitutions (A9T and A185V) and 99.25% identity. However, the significance of sequence divergence in linG, linH, linI, and linJ genes among these strains is yet to be investigated.
Another important lin gene system of the lower pathway is the ABC transporter system, i.e., linK, linL, linM, and linN, which encode a permease, ATPase, periplasmic protein, and lipoprotein, respectively. This ABC transporter system is very important as it allows for the transport of HCH isomers and clearance of dead-end metabolites of HCH from the cell. Out of the entire lin system, these genes have shown the highest level of divergence, with linK at 86.8% in RL3, linL at 84.3% in DS20, linM at 84.2% in DS20 and linN at 83.3% in P25. Based on the prevalence of similar but non-lin-specific ABC-type transporters, which are found by sequence identity searches across a variety of microbial species, it is hypothesized that the linKLMN operon derived from convergent evolution in response to environmental changes. With the introduction of HCH to the environment, pre-existing ABC-type transporters were likely recruited to the HCH degradation pathway, and thus several genetic variants might have undergone convergent evolution to select for transporters with increased efficiency for HCH-metabolite efflux, and these later-generation genes were the ones that subsequently underwent HGT. This is in contrast to the likely origin of the linDER operon, which encodes more highly HCH-specific enzymatic functions and is almost perfectly conserved, and thus was likely generated once and spread through HGT from a single genetic ancestor. The lin pathway shows a characteristic pattern in which the upper HCH degradation pathway diverged along a gradient from linA (most diverged) to linC (least diverged), while in the lower degradation pathway linKLMN was the most highly diverged, followed by linGHIJ, linDER and linF, respectively.
Genes under diversifying natural selection
In sequencing the genomes of six novel Sphingobium species and comparing these to the known genomes of three other Sphingobium species, this study has begun to probe the natural variation in the lin pathway for HCH degradation. Analysis of the variation in the lin system, as well as in the phylogenetic relationships, core genomes, and functional profiles of these bacterial strains, demonstrated unique characteristics of B90A, HDIPO4 and IP26 which could explain their higher efficacy as degraders of HCH isomers. The information thus obtained can now be used to select these better-performing strains for the development of a bacterial consortium for on-site bioremediation of the HCH dumpsites. Focusing on the lin system, analysis of the similarities in the lin gene sequences and the varying copy numbers between these strains has identified variations in specific genes as key differentiators, and these key components will be of critical interest as the most effective targets for optimization of an enzymatic bioremediation system. The analysis so far suggests that better linA and linB variants could be ideal candidates for developing an enzymatic bioremediation system. Moreover, this study has uncovered evidence for genus-level HGT of plasmids housing components of the lin system, specifically between Sphingomonas sp. MM-1 and RL3. The additional lin-deficient strains are of further importance as they demonstrate varying degrees of acquisition of the lin system and will be useful in future homologous recombination studies to work with manipulated pathway completion through introduction of synthetic lin genes.
Selection and sequencing of the Sphingobium genomes
Six Sphingobium strains isolated from HCH dumpsites and demonstrating a range of HCH degradation abilities were selected for genome sequencing. Five of these Sphingobium strains, i.e. S. lactosutens DS20T, S. chinhatense IP26T, S. ummariense RL3T, S. quisquilarium P25T, and Sphingobium sp. HDIPO4, were isolated from an HCH dumpsite in Chinhat village, Lucknow, India, whereas Sphingobium baderi LL03T was isolated from an HCH dumpsite in Spolana, Czech Republic. In addition to these six strains, the genomes of three further strains, Sphingobium indicum B90A, Sphingobium japonicum UT26S, and Sphingobium sp. SYK6, were included as references in the study.
Genomic DNA was extracted using the SuperCos method from 5 ml pure culture pellets grown in Luria Bertani medium at 28°C to an O.D. of 1.0 or 1.2. DNA concentrations were quantified using a NanoDrop spectrophotometer (NanoDrop Technologies Inc, Wilmington, DE, USA). For all the genomes, sequencing was performed using both the Illumina HiSeq 2000 and 454 GS-FLX Titanium platforms. For sequencing, a 2 Kbp paired-end sequencing library was constructed, yielding ~100× coverage for each genome. An additional three Sphingobium genomes, i.e. Sphingobium japonicum UT26S, S. indicum B90A and Sphingobium sp. SYK6, were retrieved to be used as references for this comparative analysis (Table 1).
Genome assembly, annotation, and functional profiling
Detailed statistics of genome assembly of Sphingobium spp.
Organism (Genome status)
No. of Contigs/Chromosomes & Plasmids
N50 (in Kb)
S. baderi LL03 T (Draft)
S. lactosutens DS20 T (Draft)
Sphingobium sp. HDIPO4 (Draft)
S. chinhatense IP26 T (Draft)
S. quisquilarium P25 T (Draft)
S. ummariense RL3 T (Draft)
S. indicum B90A T (Draft)
S. japonicum UT26S T (Draft)
PHRAP and CONSED
Chromosome 1 (3,514,822 bp), chromosome 2 (681,892 bp), pCHQ1 (190,974 bp), pUT1 (31,776 bp) and pUT2 (5,398 bp)
Sphingobium sp. SYK6
chromosome 1 (4,199,332 bp) and pSLGP (148,801 bp)
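The N50 reported for each assembly is the contig length at which contigs of that length or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the contig lengths are hypothetical and are not values from the assemblies described here.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Kb, for illustration only.
print(n50([80, 70, 50, 40, 30, 20]))
```

For the toy input the total is 290 Kb, and the two longest contigs (80 + 70 = 150 Kb) already cover half of it, so the N50 is 70 Kb.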
For functional profiling, coding sequences were extracted from the RAST server for all the genomes, and orthologous genes were determined using all-versus-all BLASTP at default parameters. This was validated by using CD-HIT to produce sets of non-redundant representative sequences (query coverage ≥80%, 0.8 sequence identity cut-off). The putative protein encoded by each cluster was identified by performing BLASTP on a representative amino acid sequence from each cluster. Comparison of the annotated genomes was also carried out on the MicroScope server.
Further, the coding sequences were processed for functional annotation using the bi-directional best-hit (BBH) assignment method on the KEGG Automatic Annotation Server (KAAS). This annotation was then used for biological family construction using protein family prediction on MinPath. The top 50 subsystems were selected based on normalized values obtained by dividing by the lowest value for the genes in the respective pathways. Finally, the nine Sphingobium strains and enriched pathways were clustered hierarchically using Pearson correlation with 0.8% minimum abundance and a heat map was constructed in MeV4.9.0. Genomic Islands (GIs) were analysed using the IslandViewer software tool (http://www.pathogenomics.sfu.ca/islandviewer). The CRISPR Finder online server was used to identify CRISPR elements in the draft genomes, which were further analyzed to trace their sources.
Phylogenetic analysis of Sphingobium spp.
16S rRNA gene sequences: These were retrieved using BLASTN (E-value = 10^-5). The 16S rRNA sequences were aligned using CLUSTALX and subsequently a phylogenetic tree was constructed using the TreeconW software package version 1.3b with the Jukes & Cantor model (1969) and the Neighbor Joining algorithm (bootstrap value = 1000).
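Once proteins have been grouped into ortholog clusters, the shared (core) gene content of a set of strains can be expressed as the fraction of clusters that have a member in every strain. The sketch below assumes the clustering has already been parsed into a mapping from cluster ID to the set of strains represented in it. The function names, cluster IDs and strain labels are hypothetical, and the choice of denominator (clusters present in at least one of the selected strains) is an assumption rather than the definition used in the study.

```python
def core_clusters(cluster_to_strains, strains):
    """Clusters that contain at least one protein from every selected strain."""
    required = set(strains)
    return {cid for cid, members in cluster_to_strains.items()
            if required <= set(members)}


def shared_fraction(cluster_to_strains, strains):
    """Core clusters divided by all clusters found in any selected strain."""
    selected = set(strains)
    pan = {cid for cid, members in cluster_to_strains.items()
           if selected & set(members)}
    if not pan:
        raise ValueError("no cluster contains any of the selected strains")
    return len(core_clusters(cluster_to_strains, strains)) / len(pan)


# Hypothetical cluster membership, for illustration only.
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
print(shared_fraction(clusters, ["A", "B", "C"]))
print(shared_fraction(clusters, ["A", "B"]))
```

Dropping a divergent strain from the selection can only keep or enlarge the core set, which mirrors how the shared fraction rises when SYK6 is excluded from the comparison.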
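Clustering by Pearson correlation typically starts from a distance of the form 1 - r between pairs of abundance profiles. The sketch below computes such a distance matrix in plain Python; it does not reproduce MeV, and the function names, strain labels and abundance values are hypothetical. The resulting matrix would then be passed to a hierarchical clustering routine.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_distances(profiles):
    """Map every ordered pair of strains to the distance 1 - r."""
    names = sorted(profiles)
    return {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical pathway abundances (same pathway order in every profile).
profiles = {
    "strain_A": [4.0, 2.0, 1.0, 0.5],
    "strain_B": [8.0, 4.0, 2.0, 1.0],
    "strain_C": [0.5, 1.0, 2.0, 4.0],
}
dist = correlation_distances(profiles)
print(round(dist[("strain_A", "strain_B")], 3), round(dist[("strain_A", "strain_C")], 3))
```

Because the distance depends only on the shape of a profile, strain_A and strain_B (which differ by a constant factor) end up at a distance of zero, whereas the reversed profile of strain_C is far from both.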
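Under the Jukes & Cantor model, the proportion p of differing sites between two aligned nucleotide sequences is converted to an evolutionary distance d = -(3/4) ln(1 - 4p/3). The sketch below shows this correction for a single pair of sequences; it is not TreeconW, the function names are hypothetical, skipping gap columns is an assumption, and the toy sequences are not 16S rRNA data.

```python
from math import log


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are skipped
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor (1969) distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the correction is defined only for 0 <= p < 0.75")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


# Toy aligned sequences, for illustration only.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site.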
Single Copy Gene Sequences: The amino acid sequences of 28 universally present single copy genes (dnaA, frr, gyrB, infB, mnmA, nusA, pheS, rplB, rplC, rplM, rplS, rpoA, rpsB, rpsC, rpsH, rpsI, rpsJ, rpsS, trmD, tef, ychF, alaS, rplE, uvrC, lepA, rplI, rplP and rplD) were retrieved and concatenated. Further, they were aligned using CLUSTALX (as described above) and their phylogeny was constructed using TreeconW software package version 1.3b with the Poisson correction model and Neighbor Joining algorithm (bootstrap value = 1000).
Tetranucleotide Correlation: Whole-genome based tetranucleotide correlation was performed using TETRA software, based on which a Pearson correlation matrix was constructed. This was followed by hierarchical clustering on the resultant matrix using MeV4.9.0 and finally a dendrogram was constructed in MEGA4.
Average Nucleotide Identity (ANI): This method includes all possible pairwise comparisons between these genomes as described by Konstantinidis and Tiedje (2005). Pearson correlation matrices were constructed from these ANI values, which were then used to perform hierarchical clustering followed by dendrogram construction as described above.
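The idea behind a tetranucleotide comparison can be sketched by counting all 256 possible 4-mers in each sequence and correlating the two frequency vectors. This is a deliberate simplification: it uses raw single-strand frequencies rather than the usage statistics computed by the TETRA software, and the function names and toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq: str) -> list:
    """Relative frequency of each tetranucleotide (overlapping windows)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows with ambiguous bases are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[k] / total for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy sequences, for illustration only.
freq_a = tetra_frequencies("ACGTACGTACGT")
freq_b = tetra_frequencies("ACGTACGTACGTTTTT")
print(round(pearson(freq_a, freq_b), 3))
```

Pairwise coefficients computed this way for every genome pair form the kind of correlation matrix that is subsequently clustered.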
Identification of genes under diversifying natural selection
Orthologs, genes involved in the degradation of aromatic compounds including HCH, and genes for transposable elements were analyzed for positive selection by extracting their sequences and performing codon-by-codon alignment in CLUSTALX. The dN/dS and dS values, calculated for each gene pair using HyPhy 2.1.2, were plotted to show the time-independent evolution of the genes.
Arrangement and diversification of the lin catabolic system
The Artemis Comparison Tool (Web-ACT) was used to compare the arrangement of the lin genes and proximal genetic mobility elements such as IS6100 with reference to the lin arrangement in S. japonicum UT26S. After constructing a database of the contigs for each strain, genes for HCH degradation were extracted using BLASTN, and the percent identity to the respective gene in the archetypal strain UT26S was used as a measure of genetic divergence. This divergence was plotted in addition to copy number values for each gene in each strain using the R package ggplot2. Principal component analysis (PCA) with the copy number and divergence data was then done with the R package FactoMineR, with HCH degradation plotted as a supplementary discrete variable (0 - complete non-degrader, 1 - partial degrader, 2 - full degrader), followed by plot construction with ggplot2. Recruitment plots of the raw reads of the six Sphingobium spp. mapped against the sequences of all the plasmids of Sphingomonas sp. MM-1 (extracted from NCBI) were created using MUMmer 3.23.
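The dN/dS ratio is usually read against a threshold of 1: values above 1 are taken as a sign of diversifying (positive) selection, values below 1 as purifying selection, and values close to 1 as neutral evolution. The sketch below only illustrates that interpretation. The function names and the `tolerance` parameter are hypothetical, and the dN and dS estimates themselves would come from a dedicated tool such as HyPhy.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def selection_regime(ratio: float, tolerance: float = 0.0) -> str:
    """Classify a dN/dS ratio relative to the neutral expectation of 1."""
    if ratio > 1.0 + tolerance:
        return "diversifying"
    if ratio < 1.0 - tolerance:
        return "purifying"
    return "neutral"


# The first value is the ratio reported in the text for the large subunit of
# assimilatory nitrate reductase; the dN and dS values below are hypothetical.
print(selection_regime(1.09))
print(selection_regime(dn_ds_ratio(0.2, 0.4)))
```

A non-zero `tolerance` makes the classification more conservative, which can matter for ratios as close to 1 as 1.09.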
Availability of supporting data
The supporting data has been deposited in Dryad (http://datadryad.org/) with doi:10.5061/dryad.g7t27.
Acknowledgements
The work was supported by grants from the Department of Biotechnology (DBT), Government of India under project BT/PR3301/BCE/8/875/11, the All India Network Project on Soil Biodiversity-Biofertilizer (ICAR), the Department of Science and Technology under project SR/SO/AS-24/2011, and the University of Delhi/Department of Science and Technology, Promotion of University Research and Scientific Excellence (PURSE), DU-DST-PURSE grant. H.V., R.K., P.O., and N.S. gratefully acknowledge the Council for Scientific and Industrial Research (CSIR), the National Bureau of Agriculturally Important Microorganisms (NBAIM) (AMASS/2006–07/NBAIM/CIR) and the Fulbright Program for providing research fellowships.
References
- Takeuchi M, Hamana K, Hiraishi A: Proposal of the genus Sphingomonas sensu stricto and three new genera, Sphingobium, Novosphingobium and Sphingopyxis, on the basis of phylogenetic and chemotaxonomic analyses. Int J Syst Evol Microbiol. 2001, 51: 1405-1417.
- Maruyama T, Park HD, Ozawa K, Tanaka Y, Sumino T, Hamana K, Hiraishi A, Kato K: Sphingosinicella microcystinivorans gen. nov., sp. nov., a microcystin-degrading bacterium. Int J Syst Evol Microbiol. 2006, 56: 85-89. 10.1099/ijs.0.63789-0.
- Balkwill DL, Fredrickson J, Romine M: From Sphingomonas and Related Genera. The Prokaryotes: A Handbook on the Biology of Bacteria. Volume 7. Edited by: Stackebrandt E. 2006, Singapore: Springer, 605-629.
- Li YF: Global technical hexachlorocyclohexane usage and its contamination consequences in the environment: from 1948 to 1997. Sci Total Environ. 1999, 232: 121-158. 10.1016/S0048-9697(99)00114-X.
- Vijgen J, Yi LF, Forter M, Lal R, Weber R: The legacy of lindane and technical HCH production. Organohalog Comp. 2006, 68: 899-904.
- Lal R, Pandey G, Sharma P, Kumari K, Malhotra S, Pandey R, Raina V, Kohler HPE, Holliger C, Jackson C, Oakeshott JG: The biochemistry of microbial degradation of hexachlorocyclohexane (HCH) and prospects for bioremediation. Microbiol Mol Biol Rev. 2010, 74: 58-80. 10.1128/MMBR.00029-09.
- Boltner D, Moreno-Morillas S, Ramos JL: 16S rDNA phylogeny and distribution of lin genes in novel hexachlorocyclohexane-degrading Sphingomonas strains. Environ Microbiol. 2005, 7: 1329-1338. 10.1111/j.1462-5822.2005.00820.x.
- Dadhwal M, Singh A, Prakash O, Gupta SK, Kumari K, Sharma P, Jit S, Verma M, Holliger C, Lal R: Proposal of biostimulation for hexachlorocyclohexane (HCH)-decontamination and characterization of culturable bacterial community from high-dose point HCH-contaminated soils. J Appl Microbiol. 2009, 106: 381-392. 10.1111/j.1365-2672.2008.03982.x.
- Sangwan N, Lata P, Dwivedi V, Singh A, Niharika N, Kaur J, Anand S, Malhotra J, Jindal S, Nigam A, Lal D, Dua A, Saxena A, Garg N, Verma M, Kaur J, Mukherjee U, Gilbert JA, Dowd SE, Raman R, Khurana P, Khurana JP, Lal R: Comparative metagenomic analysis of soil microbial communities across three hexachlorocyclohexane contamination levels. PLoS ONE. 2012, 7: e46219. 10.1371/journal.pone.0046219.
- Sangwan N, Verma H, Kumar R, Negi V, Lax S, Khurana P, Khurana JP, Gilbert JA, Lal R: Reconstructing an ancestral genotype of two hexachlorocyclohexane degrading Sphingobium species using metagenomic sequence data. ISME J. 2013, doi: 10.1038/ismej.2013.153.
- Nagata Y, Miyauchi K, Takagi M: Complete analysis of genes and enzymes for gamma-hexachlorocyclohexane degradation in Sphingomonas paucimobilis UT26. J Ind Microbiol Biotechnol. 1999, 23: 380-390. 10.1038/sj.jim.2900736.
- Nagata Y, Endo R, Ito M, Ohtsubo Y, Tsuda M: Aerobic degradation of lindane (γ-hexachlorocyclohexane) in bacteria and its biochemical and molecular basis. Appl Microbiol Biotechnol. 2007, 76: 741-752. 10.1007/s00253-007-1066-x.
- Sharma P, Pandey R, Kumari K, Pandey G, Jackson CJ, Russell RJ, Oakeshott JG, Lal R: Kinetic and sequence-structure-function analysis of known LinA variants with different hexachlorocyclohexane isomers. PLoS ONE. 2011, 6: e25128. 10.1371/journal.pone.0025128.
- Geueke B, Garg N, Ghosh S, Fleischmann T, Holliger C, Lal R, Kohler HPE: Metabolomics of hexachlorocyclohexane (HCH) transformation: ratio of LinA to LinB determines metabolic fate of HCH isomers. Environ Microbiol. 2013, 15: 1040-1049. 10.1111/1462-2920.12009.
- Aylward FO, McDonald BR, Adams SM, Valenzuela A, Schmidt RA, Goodwin LA, Woyke TA, Currie CA, Suen G, Poulsen M: Comparison of 26 sphingomonad genomes reveals diverse environmental adaptations and biodegradative capabilities. Appl Environ Microbiol. 2013, 79: 3724-3733. 10.1128/AEM.00518-13.
- Sorek R, Lawrence CM, Wiedenheft B: CRISPR-mediated adaptive immune system in bacteria and archaea. Annu Rev Biochem. 2013, 82: 237-266. 10.1146/annurev-biochem-072911-172315.
- Bhaya D, Davison M, Barrangou R: CRISPR-Cas systems in bacteria and archaea: versatile small RNAs for adaptive defense and regulation. Annu Rev Genet. 2011, 45: 273-297. 10.1146/annurev-genet-110410-132430.
- Konstantinidis K, Tiedje J: Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci U S A. 2005, 102: 2567-2572.
- Nagata Y, Natsui S, Endo R, Ohtsubo Y, Ichikawa N, Ankai A, Oguchi A, Fukui S, Fujita N, Tsuda M: Genomic organization and genomic structural rearrangements of Sphingobium japonicum UT26S, an archetypal γ-hexachlorocyclohexane-degrading bacterium. Enzyme Microb Technol. 2011, 49: 499-508. 10.1016/j.enzmictec.2011.10.005.
- Jit S, Dadhwal M, Kumari H, Jindal S, Kaur J, Lata P, Niharika N, Lal D, Garg N, Gupta SK, Sharma P, Bala K, Singh A, Vijgen J, Weber R, Lal R: Evaluation of hexachlorocyclohexane contamination from the last lindane production plant operating in India. Environ Sci Pollut Res Int. 2011, 18: 586-597. 10.1007/s11356-010-0401-4.
- Endo R, Ohtsubo Y, Tsuda N, Nagata Y: Identification and characterization of genes encoding a putative ABC-type transporter essential for utilization of gamma-hexachlorocyclohexane in Sphingobium japonicum UT26. J Bacteriol. 2007, 189: 3712-3720. 10.1128/JB.01883-06.
- Ninfa AJ, Magasanik B: Covalent modification of the glnG product, NRI, by the glnL product, NRII, regulates the transcription of the glnALG operon in Escherichia coli. Proc Natl Acad Sci U S A. 1986, 83: 5909-5913. 10.1073/pnas.83.16.5909.
- Scott N, Hess M, Bouskill NJ, Mason OU, Jansson JK, Gilbert JA: The microbial nitrogen cycling potential is impacted by polyaromatic hydrocarbon pollution of marine sediments. Front Microbiol. 2014, 5: doi:10.3389/fmicb.2014.00108.
- Ninfa AJ, Ninfa EG, Lupas AN, Stock A, Magasanik B, Stock J: Crosstalk between bacterial chemotaxis signal transduction proteins and regulators of transcription of the Ntr regulon: evidence that nitrogen assimilation and chemotaxis are controlled by a common phosphotransfer mechanism. Proc Natl Acad Sci U S A. 1988, 85: 5492-5496. 10.1073/pnas.85.15.5492.
- Pal R, Bala S, Dhingra G, Prakash O, Dadhwal M, Kumar M, Prabagaran SR, Shivaji S, Cullum J, Holliger C, Lal R: The hexachlorocyclohexane-degrading bacterial strains Sphingomonas paucimobilis B90A, UT26S and Sp+ having similar lin genes are three distinct species, Sphingobium indicum sp. nov; S. japonicum sp. nov; and S. francense sp. nov. and reclassification of [Sphingomonas] chungbukensis as Sphingobium chungbukense comb. nov. Int J Syst Evol Microbiol. 2005, 55: 1965-1972. 10.1099/ijs.0.63201-0.
- Kumari H, Gupta SK, Jindal S, Katoch P, Lal R: Description of Sphingobium lactosutens sp. nov., isolated from a hexachlorocyclohexane dump site and Sphingobium abikonense sp. nov. isolated from oil contaminated soil. Int J Syst Evol Microbiol. 2009, 59: 2291-2296. 10.1099/ijs.0.004739-0.
- Dadhwal M, Jit S, Kumari H, Lal R: Sphingobium chinhatense sp. nov., a hexachlorocyclohexane (HCH) degrading bacterium isolated from an HCH dump site. Int J Syst Evol Microbiol. 2009, 59: 3140-3144. 10.1099/ijs.0.005553-0.
- Singh A, Lal R: A novel hexachlorocyclohexane degrading bacterium Sphingobium ummariense sp. nov. isolated from HCH contaminated soil. Int J Syst Evol Microbiol. 2009, 59: 162-166. 10.1099/ijs.0.65712-0.
- Bala K, Sharma P, Lal R: Sphingobium quisquiliarum sp. nov., P25T a hexachlorocyclohexane (HCH) degrading bacterium isolated from HCH contaminated soil. Int J Syst Evol Microbiol. 2010, 60: 429-433. 10.1099/ijs.0.010868-0.
- Kaur J, Moskalikova H, Niharika N, Sedlackova M, Hampl A, Damborsky J, Prokop Z, Lal R: Sphingobium baderi sp. nov., isolated from a hexachlorocyclohexane (HCH) dumpsite in Spolana. Int J Syst Evol Microbiol. 2012, 63: 673-678.
- Dogra C, Raina V, Pal R, Suar M, Lal S, Gartemann KH, Holliger C, van der Meer JR, Lal R: Organization of lin genes and IS6100 among different strains of hexachlorocyclohexane-degrading Sphingomonas paucimobilis: evidence for horizontal gene transfer. J Bacteriol. 2004, 186: 2225-2235. 10.1128/JB.186.8.2225-2235.2004.
- Mohn WW, Mertens B, Neufeld J, De Lorenzo V: Distribution and phylogeny of hexachlorocyclohexane-degrading bacteria in soils from Spain. Environ Microbiol. 2006, 8: 60-68. 10.1111/j.1462-2920.2005.00865.x.
- Malhotra S, Sharma P, Kumari H, Singh A, Lal R: Localization of HCH catabolic genes (lin genes) in Sphingobium indicum B90A. Indian J Microbiol. 2007, 47: 271-275. 10.1007/s12088-007-0050-6.
- Gai Z, Wang X, Tang H, Tai C, Tao F, Wu G, Xu P: Genome sequence of Sphingobium yanoikuyae XLDN2-5, an efficient carbazole-degrading strain. J Bacteriol. 2011, 193: 6404-6405. 10.1128/JB.06050-11.
- Tabata M, Ohtsubo Y, Ohhata S, Tsuda M, Nagata Y: Complete genome sequence of the gamma-hexachlorocyclohexane-degrading bacterium Sphingomonas sp. Strain MM-1. Genome Announc. 2013, 1: e00247-13.
- Kumari R, Subudhi S, Suar M, Dhingra G, Raina V, Dogra C, Lal S, Holliger C, van der Meer JR, Lal R: Cloning and characterization of lin genes responsible for the degradation of hexachlorocyclohexane isomers in Sphingomonas paucimobilis strain B90. Appl Environ Microbiol. 2002, 68: 6021-6028. 10.1128/AEM.68.12.6021-6028.2002.
- Suar M, van der Meer JR, Lawlor K, Holliger C, Lal R: Dynamics of multiple lin gene expression in Sphingomonas paucimobilis B90A in response to different hexachlorocyclohexane isomers. Appl Environ Microbiol. 2004, 70: 6650-6656. 10.1128/AEM.70.11.6650-6656.2004.
- Nagata Y, Mori K, Takagi M, Murzin AG, Damborsky J: Identification of protein fold and catalytic residues of γ-hexachlorocyclohexane dehydrochlorinase LinA. Proteins. 2001, 45: 471-477. 10.1002/prot.10007.
- Sharma P, Jindal S, Bala K, Kumari K, Niharika N, Kaur J, Pandey G, Pandey R, Russell RJ, Oakeshott JG, Lal R: Functional screening of enzymes and bacteria for the dechlorination of hexachlorocyclohexane by a high-throughput colorimetric assay. Biodegradation. 2013, 25: 179-187.
- Nagata Y, Hatta T, Imai R, Kimbara K, Fukuda M, Yano K, Takagi M: Purification and characterization of γ-hexachlorocyclohexane (γ-HCH) dehydrochlorinase (LinA) from Pseudomonas paucimobilis. Biosci Biotechnol Biochem. 1993, 59: 1582-1583.
- Ceremonie H, Boubakri H, Mavingui P, Simonet P, Vogel TM: Plasmid-encoded γ-hexachlorocyclohexane degradation genes and insertion sequences in Sphingobium francense (ex-Sphingomonas paucimobilis Sp+). FEMS Microbiol Lett. 2006, 257: 243-252. 10.1111/j.1574-6968.2006.00188.x.
- Wu J, Hong Q, Han P, He J, Li S: A gene linB2 responsible for the conversion of β-HCH and 2,3,4,5,6-pentachlorocyclohexanol in Sphingomonas sp. BHC-A. Appl Microbiol Biotechnol. 2007, 73: 1097-1105.
- Ito M, Prokop Z, Klvana M, Otsubo Y, Tsuda M, Damborsky J, Nagata Y: Degradation of beta-hexachlorocyclohexane by haloalkane dehalogenase LinB from gamma-hexachlorocyclohexane-utilizing bacterium Sphingobium sp. MI1205. Arch Microbiol. 2007, 188: 313-325. 10.1007/s00203-007-0251-8.
- Glavinas H, Krajcsi P, Cserepes J, Sarkadi B: The role of ABC transporters in drug resistance, metabolism and toxicity. Curr Drug Deliv. 2004, 1: 27-42. 10.2174/1567201043480036.
- Anand S, Sangwan N, Lata P, Kaur J, Dua A, Singh AK, Verma M, Kaur J, Khurana JP, Khurana P, Mathur S, Lal R: Genome sequence of Sphingobium indicum B90A, a hexachlorocyclohexane-degrading bacterium. J Bacteriol. 2012, 194: 4471-4472. 10.1128/JB.00901-12.
- Nagata Y, Ohtsubo Y, Endo R, Ichikawa N, Ankai A, Oguchi A, Fukui S, Fujita N, Tsuda M: Complete genome sequence of the representative γ-hexachlorocyclohexane-degrading bacterium Sphingobium japonicum UT26S. J Bacteriol. 2010, 192: 5852-5853. 10.1128/JB.00961-10.
- Masai E, Kamimura N, Kasai D, Oguchi A, Ankai A, Fukui S, Takahashi M, Yashiro I, Sasaki H, Harada T, Nakamura S, Katano Y, Narita-Yamada S, Nakazawa H, Hara H, Katayama Y, Fukuda M, Yamazaki S, Fujita N: Complete genome sequence of Sphingobium sp. strain SYK-6, a degrader of lignin-derived biaryls and monoaryls. J Bacteriol. 2012, 194: 534-535. 10.1128/JB.06254-11.
- Molecular Cloning: A Laboratory Manual. Volume 2. Edited by: Sambrook J, Fritsch EF, Maniatis T. 1989, New York: Cold Spring Harbor Laboratory Press, 2.
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19: 1117-1123. 10.1101/gr.089532.108.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
- Kaur J, Verma H, Tripathi C, Khurana JP, Lal R: Draft genome sequence of a hexachlorocyclohexane-degrading bacterium, Sphingobium baderi strain LL03T. Genome Announc. 2013, 1: e00751-13.
- Kumar R, Dwivedi V, Negi V, Khurana JP, Lal R: Draft genome sequence of Sphingobium lactosutens strain DS20 isolated from an hexachlorocyclohexane (HCH) dumpsite. Genome Announc. 2013, 1: 00753-13.
- Niharika N, Sangwan N, Ahmad S, Singh P, Khurana JP, Lal R: Draft genome sequence of Sphingobium chinhatense strain IP26T isolated from the hexachlorocyclohexane dumpsite. Genome Announc. 2013, 1: 00680-13.
- Singh AK, Sangwan N, Sharma A, Gupta V, Khurana JP, Lal R: Draft genome sequence of Sphingobium quisquiliarum P25T, a novel hexachlorocyclohexane (HCH)-degrading bacterium isolated from the HCH dumpsite. Genome Announc. 2013, 1: 00717-13.
- Kohli P, Dua A, Sangwan N, Oldach P, Khurana JP, Lal R: Draft genome sequence of Sphingobium ummariense strain RL-3, a hexachlorocyclohexane-degrading bacterium. Genome Announc. 2013, 1: 00956-13.
- Mukherjee U, Kumar R, Mahato NK, Khurana JP, Lal R: Draft genome sequence of Sphingobium sp. HDIPO4, an avid degrader of hexachlorocyclohexane. Genome Announc. 2013, 1: 00749-13.
- Delcher AL, Bratke KA, Powers EC, Salzberg SL: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007, 23: 673-679. 10.1093/bioinformatics/btm009.
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O: The RAST server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1016/S0022-2836(05)80360-2.
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.
- Vallenet D, Belda E, Calteau A, Cruveiller S, Engelen S, Lajus A, Fèvre FL, Longin C, Mornico D, Roche D, Rouy Z, Salvignol G, Scarpelli C, Smith AAT, Weiman M, Médigue C: MicroScope: an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res. 2013, 41: D636-D647. 10.1093/nar/gks1194.
- Moriya Y, Itoh M, Okuda S, Yoshizawa A, Kanehisa M: KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007, 35: W182-W185. 10.1093/nar/gkm321.
- Ye Y, Doak TG: A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol. 2009, 5: e1000465. 10.1371/journal.pcbi.1000465.
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003, 34: 374-378.
- Langille MGI, Brinkman FSL: IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics. 2009, 25: 664-665. 10.1093/bioinformatics/btp030.
- Grissa I, Vergnaud G, Pourcel C: CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007, 35: W52-W57. 10.1093/nar/gkm360.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.
- Van de Peer Y, De Wachter R: TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput Applic Biosci. 1994, 10: 569-570.
- Jukes TH, Cantor CR: Evolution of protein molecules. Mammalian Protein Metabolism. Volume 3. Edited by: Munro HN. 1969, New York: Academic Press, 21-132.
- Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner FO: TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics. 2004, 5: 163. 10.1186/1471-2105-5-163.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
- Pond SLK, Frost SDW, Muse SV: HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005, 21: 676-679. 10.1093/bioinformatics/bti079.
- Carver TJ, Rutherford KM, Berriman M, Rajandream MA, Barrell BG, Parkhill J: ACT: the Artemis Comparison Tool. Bioinformatics. 2005, 16: 3422-3433.
- Wickham H: ggplot2: Elegant Graphics for Data Analysis. 2009, New York: Springer, 213.
- Le S, Josse J, Husson F: FactoMineR: an R package for multivariate analysis. J Stat Softw. 2008, 25: 1-18.
- Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol. 2004, 5: R12.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.