Metagenomic islands of hyperhalophiles: the case of Salinibacter ruber
© Pašić et al. 2009
Received: 12 June 2009
Accepted: 1 December 2009
Published: 1 December 2009
Saturated brines are extreme environments of low diversity. Salinibacter ruber is the only bacterium that inhabits this environment in significant numbers. In order to establish the extent of genetic diversity in natural populations of this microbe, the genomic sequence of reference strain DSM 13855 was compared to metagenomic fragments recovered from climax saltern crystallizers and obtained with 454 sequencing technology. This kind of analysis reveals the presence of metagenomic islands, i.e. highly variable regions among the different lineages in the population.
Three regions of the sequenced isolate were scarcely represented in the metagenome thus appearing to vary among co-occurring S. ruber cells. These metagenomic islands showed evidence of extensive genomic corruption with atypically low GC content, low coding density, high numbers of pseudogenes and short hypothetical proteins. A detailed analysis of island gene content showed that the genes in metagenomic island 1 code for cell surface polysaccharides. The strain-specific genes of metagenomic island 2 were found to be involved in biosynthesis of cell wall polysaccharide components. Finally, metagenomic island 3 was rich in DNA related enzymes.
The genomic organisation of S. ruber variable genomic regions showed a number of convergences with genomic islands of marine microbes studied, being largely involved in variable cell surface traits. This variation at the level of cell envelopes in an environment devoid of grazing pressure probably reflects a global strategy of bacteria to escape phage predation.
Prokaryotic genomes are extraordinarily plastic entities and vary widely within the limits of a well defined species. In order to describe these large genetic reservoirs the pan-genome concept was introduced. According to this concept, the species genome is composed of a core genome, containing genes present in all (or most) strains, and a variable genome, containing genes present only in some strains.
In some cases, this variation is concentrated in hypervariable sets of genes, known as genomic islands [2–4]. Genomic island genes are often involved in specific lifestyles [5, 6], e.g. symbiosis or pathogenesis [7, 8] and frequently have the hallmarks of horizontally transferred genetic material such as different GC content or codon usage [9, 10]. However, very little is known about the dynamic processes that originate and maintain the large genomic variability found in closely related prokaryotic genomes.
Metagenomics provides a new way to look at the dynamics and flexibility of prokaryotic genomes in nature [3, 11–13]. When a microbe is well represented in an environment, and a metagenomic database from the same or a similar environment is available, it is possible to analyze genome recruitment - the preservation of genomic sequences in the natural environment. Using this approach, several authors working in different kinds of aquatic environments have found that some regions of sequenced genomes are poorly or not at all represented in the environment even when large stretches of the genome are nearly 100% similar to fragments from the metagenome [3, 11–14]. In accordance with previous studies mentioned above, these genome stretches have been identified as genomic islands. However, this nomenclature is somewhat misleading. Although these islands often do correspond to classical genomic islands, identified through comparison of closely related prokaryotic genomes, there is not always a complete overlap [3, 15]. Thus, although the latter often lack representation in the metagenome this is not always the case and vice-versa. In order to distinguish between these subtypes, we propose the term metagenomic island (MGI) to describe genome stretches identified by tiling of metagenomic reads against a reference strain genome.
To understand the mechanisms that generate the variability reflected by MGIs and their potential adaptive value [5, 6], the gene content of metagenomic islands in different prokaryotic species needs to be explored. Microbial communities of extreme environments are especially appealing for this type of analysis. As a rule of thumb, these systems support low microbial diversity to the point of being dominated by a few types of organisms with tightly defined population structure. A typical example of extremely simplified microbial communities can be found in the terminal pans of solar salterns, where microorganisms endure saturated concentrations of NaCl. Known as crystallizers, these pans support very specialized hyperhalophilic archaea and bacteria [16, 17]. The latter have been shown to be represented almost exclusively by S. ruber [17, 18]. This hyperhalophilic member of the CFB group is repeatedly reported in significant numbers from distinct hypersaline habitats around the world. Comparative analysis of available 16S rRNA gene sequences indicated that S. ruber strains differ genetically and can be classified into at least two distinct phylotypes. Here, we report for the first time the delimitation and a detailed description of S. ruber MGIs as seen by comparing the genome of type strain DSM 13855 with the metagenome of a solar saltern crystallizer. The results of this study display similarities with previously described metagenomic islands of another crystallizer species - the archaeon Haloquadratum walsbyi DSM 16790 - as well as with MGIs of marine bacteria.
The metagenomic library used in this study was generated from environmental DNA obtained from crystallizer ponds of Chula Vista salterns, near San Diego, California, and sequenced on a GS20 sequencing platform. In total, 618127 reads were analyzed. The average read length was 100 bp. Using a BLASTX E-value threshold of E<1e-5 against the nr database we were able to phylogenetically assign approximately 10% of obtained reads. Several haloarchaeal species were found to be abundant and represented over 80% of the assigned dataset. These were H. walsbyi (23% of reads), Haloarcula marismortui (23% of reads), Natronomonas pharaonis (20% of reads) and Halorubrum lacusprofundi (22% of reads). Bacteria were represented almost exclusively by S. ruber (12% of reads). The second set of metagenomic sequences (2947 sequences) was available from Legault et al. (2006). These authors used Sanger sequencing to end sequence a 2000 clone fosmid library constructed from samples of crystallizer brine of salterns in Santa Pola, Spain. The simple microbial community encountered in previous studies carried out here was composed of H. walsbyi (>80% of cells) and S. ruber (up to 20% of cells).
Metagenomic reads of both datasets were tiled against available genomes in genome recruitment analysis using MUMmer. As expected, marine organisms and moderate halophiles did not recruit in metagenomes. In comparisons involving the Chula Vista salterns metagenome, significant recruitment was observed with genomic sequences of S. ruber and H. walsbyi. Consistent with the results of the BLASTX analysis, over 10% of the dataset could be mapped to the genome of S. ruber DSM 13855. The latter recruited a total of 90477 fragments (14.6% of the entire dataset), of which 17120 fragments were at 100% sequence identity. The genome of H. walsbyi DSM 16790 recruited a total of 56985 fragments, of which 11764 gave hits at 100% sequence identity. These data confirmed the predominant role of these organisms in such hypersaline environments. The recruitment of the remaining halophilic microorganism genomes in San Diego salterns was mostly moderate. The genomic sequence of Halobacterium salinarum R1, found scarce in BLASTX analysis, recruited 26135 reads with no recruitment observed above 97.5% sequence identity. Recruitment of presumably abundant species was only moderate. The genomic sequence of H. marismortui ATCC 43049 recruited 32416 fragments (1070 at 100% sequence identity), H. lacusprofundi ATCC 49239 recruited 41487 fragments (1646 at 100% sequence identity) and N. pharaonis DSM 2160 recruited 34933 fragments (1334 at 100% sequence identity). Together with BLASTX results these findings indicate that the sequenced members of the above genera are not well represented in this specific environment although some unknown relatives must be present. Not surprisingly, the above genomes originate from hypersaline environments other than salterns, namely the Dead Sea (H. marismortui), Antarctic Deep Lake (H. lacusprofundi), and highly saline soda lakes in Egypt and Kenya (N. pharaonis), while genomic sequences of highly recruiting H. walsbyi DSM 16790 and S. ruber DSM 13855 were determined from strains originally isolated from Spanish Mediterranean salterns [20–23].
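The contrast between strongly and moderately recruiting genomes is easier to see when the counts quoted above are turned into proportions. The following Python sketch does only that arithmetic; the read counts are taken from the text, while the function name and the choice to express 100% identity hits as a share of recruited fragments are illustrative assumptions, not part of the original analysis.

```python
# Read counts quoted in the text for the Chula Vista (San Diego) metagenome.
TOTAL_READS = 618127

# genome -> (fragments recruited, fragments recruited at 100% identity)
RECRUITED = {
    "S. ruber DSM 13855": (90477, 17120),
    "H. walsbyi DSM 16790": (56985, 11764),
    "H. lacusprofundi ATCC 49239": (41487, 1646),
    "N. pharaonis DSM 2160": (34933, 1334),
    "H. marismortui ATCC 43049": (32416, 1070),
}


def recruitment_summary(recruited, total_reads):
    """Return {genome: (% of dataset recruited, % of recruited at 100% identity)}."""
    summary = {}
    for genome, (n_recruited, n_identical) in recruited.items():
        pct_dataset = 100.0 * n_recruited / total_reads
        pct_identical = 100.0 * n_identical / n_recruited
        summary[genome] = (round(pct_dataset, 1), round(pct_identical, 1))
    return summary


if __name__ == "__main__":
    for genome, (pct, pct100) in recruitment_summary(RECRUITED, TOTAL_READS).items():
        print(f"{genome}: {pct}% of reads recruited, "
              f"{pct100}% of those at 100% identity")
```

Under this view, roughly a fifth of the fragments recruited by S. ruber and H. walsbyi match at 100% identity, against only a few percent for the other three genomes, which is in line with the interpretation given in the text.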
Next, the same set of genomes was compared to Santa Pola dataset. No recruitment was observed with S. ruber DSM 13855 since the biomass collection procedure applied (filtration onto 2 μm pore size filters, see Methods) prevented collection of significant amounts of this microbe. In fact, genomic recruitment was observed only with H. walsbyi DSM 16790 as described and discussed before [11, 12]. It is worth mentioning that the observed island pattern was very similar with both datasets (Additional file 1). These results indicate that metagenomic islands are a feature conserved within species regardless of geographic origin of the genomic sequence or metagenomic dataset. Furthermore, the phenomenon seems to be unaffected by the sequencing effort (within the ranges described here) or sequencing technique used.
It is worth mentioning that the MGI 2 genes are preceded by rfbBACD - the genes involved in biosynthesis of dTDP-L-rhamnose, another component of the O-polysaccharide repeat unit - and further upstream (ca. nucleotide 730000) by large clusters (mur, fts) involved in peptidoglycan synthesis. Due to the hypervariability of the region we hypothesize that the genes constituting MGI 2 are lineage dependent and perhaps unique to DSM 13855. In contrast to hypervariable MGI 2, the upstream peptidoglycan synthesis genes are well preserved in the metagenome and thus perhaps in all lineages of S. ruber. Although we could not find evidence of genes involved in the synthesis of the core lipopolysaccharide, the similarities shared between MGI 2 of S. ruber and O-polysaccharide gene clusters of other Gram negative bacteria indicate that MGI 2 genes could be involved in biosynthesis of an extracellular polysaccharide component of the cell wall. We further hypothesize that this polysaccharide could be exposed on the outer surface of the S. ruber cell wall.
Comparative analysis of hypervariable regions detected in this analysis was performed using genomes available from GenBank ftp://ftp.ncbi.nih.gov/genomes/ and metagenomic datasets available from this study and Camera database http://camera.calit2.net/index.php. In this analysis the metagenomic islands of S. ruber DSM 13855 showed several convergences with metagenomic islands of other microbes studied, in particular high numbers of hypothetical and conserved hypothetical proteins, transposases, integrases and transport-related proteins.
A metagenomic island enriched in products involved in restriction/modification and DNA repair was a feature shared by MGIs of S. ruber, H. walsbyi, Prochlorococcus marinus, Candidatus Accumulibacter phosphatis and Ferroplasma acidarmanus. These MGIs are often associated with phage-type integrase genes and might have developed as a result of prophage insertion.
The presence of metagenomic islands putatively involved in biosynthesis of the polysaccharide component of the cell wall was a feature shared by MGIs of S. ruber and most Gram negative aquatic microbes such as P. marinus, Candidatus Pelagibacter ubique, Synechococcus sp. WH8102 and Synechococcus sp. CC9311. In addition, the presence of variable genes involved in extracellular polysaccharide biosynthesis was reported from Candidatus Accumulibacter phosphatis and Ferroplasma acidarmanus. Interestingly, recruitment studies of H. walsbyi [12, 14], an archaeon with a glycoprotein S-layer based cell wall, showed the presence of at least two MGIs putatively involved in the synthesis of the cell wall.
One of the most effective ways to study genomic plasticity in prokaryotes is to compare metagenomic data to the genomes of strains present in the environment studied [3, 11–13, 32–34]. In this study, this approach was applied to an extreme hypersaline environment, the brine of a solar saltern. Good recruitment properties were only observed when genomic sequences of strains isolated from a similar environment were compared to the metagenome. In this particular case the strains recruiting efficiently were isolated from other, geographically distant solar salterns. In all cases, representative genomes possessed a typical recruiting pattern with metagenomic islands as their most remarkable feature.
It seems to be a general phenomenon of many, if not most, bacteria that a large part of the gene cluster coding for the polysaccharide component of the cell wall is extremely variable. In clinical isolates, this phenomenon has been known for many years: more than 180 lipopolysaccharide serotypes have been described in Escherichia coli and more than 50 in Salmonella enterica. As mentioned above, the presence of genes involved in the synthesis of the polysaccharide component of the cell wall was a feature shared by variable regions of S. ruber, P. marinus, Candidatus Pelagibacter ubique and Candidatus Accumulibacter phosphatis. In Candidatus Accumulibacter phosphatis sludge bioreactors the variation in dominant lineages was noted not only in the exopolysaccharide synthesis cluster genes but also in clustered regularly interspaced short palindromic repeat (CRISPR) elements. These elements, regularly interspaced by foreign DNA sequences, can provide immunity to the phages from which they were derived. However, this strategy appears less widespread in brines since we were not able to identify any CRISPR in the genome of S. ruber, while the H. walsbyi genome contained only one such element. Likewise, these elements were scarce in the metagenomes studied.
The extreme environment of the solar saltern crystallizer supports dense yet simple microbial communities composed of highly related strains of dominant species. Such environments do not host phagotrophic protists, remain free from grazing pressure and are natural targets for phage predation [37, 38]. We hypothesise that cell wall polysaccharide variability supplied by metagenomic islands could play a role in defence against this predation. In the past, phages have been shown to target lipopolysaccharide through their host recognition machineries or strain-specific polysaccharases. In the specific case of S. ruber, several components of MGI 1 and particularly MGI 2 indicate this type of strategy. They include genes involved in biosynthesis of colanic acid, shown to be hydrolysed by phage induced enzymes in Escherichia coli, and sialic acid biosynthesis genes, reported to be a part of phage receptors. In densely populated aquatic habitats such genes will be subject to arms races (also known as Red Queen strategies), and be required to be as plastic as their bacteriophage counterparts to maintain a reasonable population density and avoid catastrophic crashes of the population due to phage lysis. This hypothesis is supported by results showing high expression of metagenomic island genes, suggesting that they encode proteins central to cellular processes in specific genotypes. In order to achieve the desired level of genome plasticity at least two mechanisms could be employed. Metagenomic islands are transposase rich areas in which genes often share homology with multiple phylogenetically diverse microbes and thus might act as lateral gene transfer hot spots. Additional diversification through lateral gene transfer and recombination could be achieved through modular organisation of cell wall polysaccharide biosynthesis genes. This was observed in the genome of S. ruber, where a lineage-specific set of genes, located within the metagenomic island, is preceded by the rfb gene cluster involved in rhamnose biosynthesis and further upstream by the mur and fts clusters involved in peptidoglycan synthesis. This phenomenon has been noted in at least one other species. In Streptococcus thermophilus, a Gram positive species and therefore devoid of lipopolysaccharide, the exocellular polysaccharide biosynthesis cluster is composed of a core gene cluster, represented by deoD-epsABCD, followed by a variable region. Interestingly, similar to crystallizer brine, the natural environment of Streptococcus thermophilus also supports dense microbial communities with low microbial diversity that are devoid of protist grazing.
Tiling the genomic sequence of S. ruber DSM 13855 against reads from the San Diego saltern crystallizer metagenome has shown that the conserved backbone of this genome is well represented in the metagenomic data. This result is quite remarkable because this isolate comes from a Mediterranean solar saltern. However, as with other microbial genomes compared to a metagenome in which they are well represented, the tiling leaves regions of low or no coverage, the metagenomic islands.
Metagenomic islands share several features with classical genomic islands described by comparing genomes of closely related strains, such as atypical GC content, high frequency of phage/IS elements and hypothetical genes. However, their gene content appears largely involved in biosynthesis of cell wall polysaccharides. This phenomenon appears to be general in this and other marine microbes studied and might reflect a global strategy of bacteria to escape phage predation.
The environmental genomic sequences collected from Santa Pola solar salterns (Alicante, Spain) were obtained in a previous study by Legault et al. (2006). The DNA was extracted from biomass retained on a 2 μm pore size filter. A 2000 clone fosmid library was end sequenced resulting in 2947 available sequences.
The environmental genomic sequences collected from Chula Vista solar salterns (Chula Vista, CA), were obtained from biomass retained on a 0.2 μm pore size tangential flow filter and were sequenced by pyrosequencing on a GS20 sequencing platform (454 Life Sciences, CT, USA). A total of 618127 reads of average length of 100 bp were obtained.
The raw metagenomic sequence obtained from Chula Vista solar salterns was screened to remove low quality and short sequences. To this aim the software The Hairdresser was developed (see Availability and requirements section below). Taking a multifasta metagenomic sequence file as input, the software removes sequences of a desired length using the ShortCut function, removes a desired subset of sequences using the ClipOut function, renames sequences using the ReStyle function and calculates a thermostability index for the entries using the HotComb function.
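The length screening step can be illustrated with a few lines of Python. This is a minimal sketch of the idea only, not the Delphi implementation of The Hairdresser: the function names `parse_fasta` and `drop_short` and the cutoff used in the example are hypothetical, and the actual ShortCut function may apply different rules.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


if __name__ == "__main__":
    example = ">read1\nACGTACGT\n>read2\nAC\n>read3\nACGTACGTACGT\n"
    # Hypothetical cutoff of 8 bases, chosen only for this toy input.
    for header, seq in drop_short(parse_fasta(example), min_length=8):
        print(f">{header}\n{seq}")
```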
A total of 2947 sequences available from Santa Pola solar salterns and 618127 reads available from San Diego solar salterns were aligned against reference genomes by using the MUMmer program version 3.19. Specifically, alignments were calculated with the 'PROmer' program using the 'maxmatch' option. The percent identity plots were generated using 'mummerplot'.
For BLAST-based recruitment analysis, the genome was split into fragments of 50 nucleotides in length and compared to the metagenome using the basic local alignment search tool BLASTN (DNA vs. DNA). The plot was generated by counting the number of hits to each fragment versus position on the chromosome.
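The counting step behind such a plot can be sketched as follows. The sketch assumes that the BLASTN results have already been reduced to a list with one genome fragment start coordinate per hit; that input format, the function names, and the `max_hits` and `min_fragments` parameters used to flag poorly covered stretches are illustrative assumptions, not criteria stated in this study.

```python
from collections import Counter

FRAGMENT_SIZE = 50  # fragment length stated in the text


def fragment_starts(genome_length, size=FRAGMENT_SIZE):
    """0-based start coordinates of consecutive, non-overlapping fragments."""
    return list(range(0, genome_length, size))


def hits_per_fragment(hit_fragment_starts, genome_length, size=FRAGMENT_SIZE):
    """Return [(start, hit count)] for every fragment, including zero counts."""
    counts = Counter(hit_fragment_starts)
    return [(start, counts.get(start, 0))
            for start in fragment_starts(genome_length, size)]


def low_coverage_runs(profile, max_hits, min_fragments, size=FRAGMENT_SIZE):
    """Find runs of >= min_fragments consecutive fragments with <= max_hits hits.

    Returns half-open (start, end) intervals; the end of a run that reaches
    the last fragment may overshoot the genome end by less than one fragment.
    """
    runs, run_start, run_len = [], None, 0
    for start, n_hits in profile:
        if n_hits <= max_hits:
            if run_start is None:
                run_start = start
            run_len += 1
        else:
            if run_start is not None and run_len >= min_fragments:
                runs.append((run_start, start))
            run_start, run_len = None, 0
    if run_start is not None and run_len >= min_fragments:
        runs.append((run_start, profile[-1][0] + size))
    return runs


if __name__ == "__main__":
    # Toy 500 nt "genome" with an uncovered stretch in the middle.
    hits = [0, 0, 50, 100, 100, 400, 450, 450]
    profile = hits_per_fragment(hits, genome_length=500)
    print(low_coverage_runs(profile, max_hits=0, min_fragments=3))
```

Plotting the second element of each `profile` entry against the first gives the hits-versus-position curve described above; the flagged intervals are candidate low-recruitment regions that would still need manual inspection.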
Island genes were re-annotated to ensure no open reading frame (ORF) was missed. Protein-coding genes were predicted using the annotation package GLIMMER, and were further manually curated. Spacers were subsequently searched against the non-redundant database using BLAST. ORFs were compared to known proteins in the non-redundant database using the BLASTX program (translated DNA vs. protein). All hits with an E-value greater than 1e-5 were considered non-significant.
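The significance cutoff can be applied mechanically once the search results are available in tabular form. The sketch below assumes the standard 12-column BLAST tabular layout, in which the E-value is the eleventh field; the text does not say which output format was used, so this is an assumption, and the function name is hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # hits with a larger E-value are treated as non-significant


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) for hits with evalue <= cutoff.

    Assumes tab-separated lines in the standard 12-column BLAST tabular
    layout: qseqid, sseqid, pident, length, mismatch, gapopen, qstart,
    qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= cutoff:
            hits.append((fields[0], fields[1], evalue))
    return hits


if __name__ == "__main__":
    # Made-up identifiers and values, for illustration only.
    example = [
        "orf1\tprotA\t85.0\t120\t18\t0\t1\t360\t5\t124\t2e-30\t130",
        "orf2\tprotB\t30.0\t40\t28\t0\t1\t120\t10\t49\t0.5\t25",
    ]
    print(significant_hits(example))
```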
Additional BLASTN, BLASTP and PSI-BLAST searches were performed when needed. All hits with an E-value greater than 1e-5 were considered non-significant. COG classification of S. ruber DSM 13855 genomic sequences was obtained from GenBank. COG classification of metagenomic sequence reads was performed by conducting an rps-blast search against the COG database. Significant sequences were distributed in COG categories. KEGG pathway analysis was available from http://www.genome.jp/kegg/pathway.html. GC content was calculated using the 'geecee' program from the EMBOSS package. GC plots were generated using the 'insilico' web server http://insilico.ehu.es. Protein topology predictions were performed using SOSUI, PredictProtein and HMMTOP available from the Expasy proteomics server http://www.expasy.ch/. Conserved blocks in groups of unaligned protein sequences were identified by using the Block Maker program http://blocks.fhcrc.org/blockmkr/make_blocks.html. CRISPR analysis was performed using CRISPR finder available from http://crispr.u-psud.fr/crispr/CRISPRdatabase.php?page=own. Genes were identified as pseudogenes when they showed similarity to a sequence classified as a gene in another species (E < 1e-20) but had started to accumulate frameshift mutations and substitutions to stop codons.
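Because atypical GC content is one of the features used to characterise the islands, it may help to see how a GC profile along a sequence can be computed. The following Python sketch is a stand-in for the idea, not a reimplementation of 'geecee' or of the 'insilico' plots; the window and step sizes are free parameters chosen by the user, and the decision to ignore ambiguous bases is an assumption of this sketch.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases in this stretch
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """Return [(window start, GC fraction)] for full-length sliding windows."""
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    toy = "GGGGCCCCAAAATTTTGGCCAATT"
    for start, gc in windowed_gc(toy, window=8, step=8):
        print(start, round(gc, 2))
```

Windows whose GC fraction falls well below the genome-wide value are the kind of region that the text describes as having atypically low GC content.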
The sequence of the complete genome of Haloquadratum walsbyi DSM 16790 was deposited as [GenBank:AM180088.1, GenBank:AM180089.1], the sequence of the complete genome of Salinibacter ruber DSM 13855 was deposited as [GenBank:NC_007677, GenBank:NC_007678], the metagenomic sequences of Santa Pola salterns were deposited as [GenBank:DU826964-DU824018] and the metagenomic sequences of San Diego solar salterns were available through http://scums.sdsu.edu/.
The Hairdresser software requires the Microsoft Windows Vista or XP operating systems. The program was written with Borland Delphi 7 Enterprise and the executable file, source code and example files are available as Additional File 3 and at the following open-source repository: http://hairdresser.sourceforge.net/.
LP is supported by EMBO ASTF366-2007 and ARRS (Slovenia) research programme P1-0198. BRB and FR are supported by the US National Science Foundation (DEB-BE04-21955) and Gordon and Betty Moore Foundation. Work at FR-V laboratory is supported by Grant BIO2008-02444 and ABMC is supported by 'Juan de la Cierva' program both from Ministerio de Ciencia e Innovación of Spain.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.