Comparative metagenomics of Daphnia symbionts

  • Weihong Qi (1, 4),
  • Guang Nong (2),
  • James F Preston (2),
  • Frida Ben-Ami (3) and
  • Dieter Ebert (3; corresponding author)

            BMC Genomics 2009, 10:172

            DOI: 10.1186/1471-2164-10-172

            Received: 04 March 2008

            Accepted: 21 April 2009

            Published: 21 April 2009

            Abstract

            Background

            Shotgun sequences of DNA extracts from whole organisms allow a comprehensive assessment of possible symbionts. The current project makes use of four shotgun datasets from three species of the planktonic freshwater crustacean Daphnia: one dataset each from clones of D. pulex and D. pulicaria, and two datasets from one clone of D. magna. We analyzed these datasets with three aims. First, we searched for bacterial symbionts that are present in all three species. Second, we searched for evidence of Cyanobacteria and plastids, which had been suggested to occur as symbionts in a related Daphnia species. Third, we compared the metacommunities revealed by two different 454 pyrosequencing methods (GS 20 and GS FLX).

            Results

            In all datasets we found evidence for a large number of bacteria belonging to diverse taxa. The vast majority of these were Proteobacteria. Of those, most sequences were assigned to different genera of the Betaproteobacteria family Comamonadaceae. Other taxa represented in all datasets included the genera Flavobacterium, Rhodobacter, Chromobacterium, Methylibium, Bordetella, Burkholderia and Cupriavidus. A few taxa matched sequences only from the D. pulex and the D. pulicaria datasets: Aeromonas, Pseudomonas and Delftia. Taxa with many hits specific to a single dataset were rare. For most of the identified taxa, earlier studies reported related taxa in aquatic environmental samples. We found no clear evidence for the presence of symbiotic Cyanobacteria or plastids. The apparent similarity of the symbiont communities of the three Daphnia species breaks down at the species and strain level: communities have a similar composition at higher taxonomic levels, but the actual sequences found are divergent. The two Daphnia magna datasets obtained from two different pyrosequencing platforms gave rather similar results.

            Conclusion

            Three clones from three species of the genus Daphnia were found to harbor a rich community of symbionts. These communities are similar at the genus and higher taxonomic levels, but are composed of different species. The similarity of these three symbiont communities hints that some of these associations may be stable in the long term.

            Background

            Metagenomics is the field that infers the properties of a habitat through the analysis of genomic sequence information obtained from a sample, usually collected from a single habitat. The sequences are usually compared to databases, with the aim of characterizing the biological community of this habitat. Among the advantages of this explorative method are the free and uncomplicated sampling of the material, the possibility of obtaining sequences from unknown and unculturable organisms, the absence of any taxonomic restrictions and the relative ease of conducting such studies [1-4]. Metagenomic studies have been done in various habitats, including sea water [5], ice cores [6] and deep mine communities [7]. Of particular recent interest has been the application of metagenomic approaches to samples obtained from organisms that harbor various symbionts, such as unknown and unculturable bacteria, protozoa or viruses. For example, studies of the symbiont communities of honey bees [8], the guts of mice [9] and humans [10], marine sponges [11], oligochaetes [12] and plant-rhizobacteria [13] revealed many new symbiont taxa. However, previously unknown organisms are revealed not only by samples collected with the aim of finding symbionts: datasets from genome projects in which a single genome was targeted may also contain sequences of other species, presumably symbionts [14]. Here we report on the bacterial communities associated with three clones, each from one species of crustaceans of the genus Daphnia, which had been used in genome projects and revealed, besides sequences of the targeted species, a rich body of sequences from other species. We use the term symbiont to include organisms that were found to be associated with the samples of these Daphnia, regardless of whether they are parasites, commensals or mutualists. We cannot rule out that some of these organisms are independent of the Daphnia, e.g. free-living bacteria in the water, parts of the ingested food or contaminants from handling the samples. For simplicity we use the term symbiont throughout this article.

            Daphnia is a genus of small freshwater plankton living in standing water bodies. Their body size ranges from 0.3 to 5 mm. They are primary consumers in the aquatic food chain, and their ecology and evolution have been intensively studied [15]. Numerous ecto- and endoparasites have been described [16,17], but the non-parasitic bacterial symbionts of Daphnia are very poorly known. Electron micrographs typically reveal large numbers of bacteria associated with Daphnia, as illustrated by the examples in Figure 1. The entire body of Daphnia can be coated in thick bacterial mats [16,17]. Thus, Daphnia are likely to carry a community of prokaryotes with them. Only one case of a possible mutualist has been reported so far. Chang and Jenkins [18] reported the presence of photosynthetically active gut endosymbionts in Daphnia obtusa. They speculate that the Daphnia take up plastids via phagocytosis, after the lysis of the mother cell in the gut. Variations in ultrastructure led them to assume that plastids from different sources are taken up, including Cyanobacteria. These findings have not been confirmed for any other Daphnia species, although the ecological niches of Daphnia species often overlap strongly.
            Figure 1

            Four examples of scanning electron microscopic (SEM) images of parts of D. magna showing numerous bacteria attached to different surface structures. A. Head of D. magna. The white filamentous structures on the surface are bacteria. B and C. Surface of the carapace with bacteria attached. The thin lines on the carapace denote epidermis cell boundaries. D. Parts of the filter apparatus of D. magna. The oval objects are bacteria. None of the bacteria have yet been identified. Scale bar 200 μm in A and 10 μm in B, C and D.

            Here we take advantage of shotgun sequences obtained from three laboratory clones (= iso-female lines), each from one Daphnia species, to search for indications of bacterial and plastid symbionts. For this we compared the sequences against the NCBI-nt nucleotide sequence database using BLASTN [19] and analyzed and ordered the results using the metagenomics software MEGAN [20]. This software allows the exploration of the taxonomic content of a community sample based on the NCBI taxonomy. Community shotgun datasets represent sequences independently sampled from random regions of genomes randomly selected from a given community. These sequences can have very different levels of conservation. Without any assumptions about the functions of the sequences used, MEGAN assigns each sequence to the lowest common ancestor of the set of taxa it hits. Thus, species-specific sequences are assigned to low-order taxa such as species or strains, while widely conserved sequences are assigned to high-order taxa. In other words, the taxonomic level of the assigned taxon reflects the level of conservation of the sequence. The strength of this statistical approach is that it makes use of all kinds of sequences for taxon identification. Therefore, when using random sequences, MEGAN will usually show better taxonomic resolution than an analysis using only a small set of phylogenetic markers [20]. This type of analysis is particularly useful when, as is the case here, the datasets were obtained by random shotgun sequencing rather than targeted sequencing (see also [21]) and the sequence reads are short [20,22].
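The lowest-common-ancestor assignment described above can be illustrated with a minimal Python sketch. It is not MEGAN's implementation: the toy taxonomy is a hand-picked, simplified subset of lineages, and the names `PARENT`, `lineage` and `lowest_common_ancestor` are hypothetical. It assumes that the BLASTN hits of a sequence have already been filtered and reduced to a list of taxon names.

```python
from typing import Dict, List, Optional

# Toy child -> parent map (assumption: simplified lineages, not the full NCBI taxonomy).
PARENT: Dict[str, Optional[str]] = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Burkholderiales": "Betaproteobacteria",
    "Comamonadaceae": "Burkholderiales",
    "Acidovorax": "Comamonadaceae",
    "Polaromonas": "Comamonadaceae",
    "Pseudomonas": "Gammaproteobacteria",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root of the toy taxonomy down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(hit_taxa: List[str]) -> str:
    """Return the deepest taxon shared by the lineages of all hit taxa."""
    lineages = [lineage(t) for t in hit_taxa]
    lca = lineages[0][0]  # every lineage starts at the common root
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break  # lineages diverge below the current LCA
        lca = level[0]
    return lca


if __name__ == "__main__":
    for hits in (["Acidovorax"],
                 ["Acidovorax", "Polaromonas"],
                 ["Acidovorax", "Pseudomonas"]):
        print(hits, "->", lowest_common_ancestor(hits))
```

In this toy example a sequence that hits only Acidovorax stays at the genus level, one that hits Acidovorax and Polaromonas moves up to Comamonadaceae, and one that also hits Pseudomonas is placed at Proteobacteria, mirroring how conserved sequences end up at higher taxonomic levels.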

            Our choice to use the software MEGAN for the analysis of the datasets from the Daphnia projects is based on several aspects that help to reduce known problems in comparative metagenomics. A known shortcoming of the assignment of sequences to taxonomic groups is its inability to deal with horizontally transferred genes and to map sequences to internal nodes of the tree [23]. However, these problems are mainly of concern when using "best-BLAST-hit" mapping. The software MEGAN was developed to avoid this problem (see previous paragraph). A further problem of assigning sequences to taxonomic groups is the well-known bias in the taxon representation in our databases [24,25]. This problem cannot be fully solved, but the ability of MEGAN to assign sequences to the lowest common ancestor ameliorates the consequences of a database bias. Sequences will be assigned to the common ancestor of the true species in question and those represented in the database. Novel sequences will not be assigned at all [20].

            Our analysis had three aims: first, to compare the shotgun sequences of the prokaryote communities from three Daphnia species; second, to test whether the shotgun sequences give evidence for a plastid symbiont in Daphnia, as had been suggested [18]; and third, to estimate the repeatability of a metagenomics approach using two different sequencing platforms, the pyrosequencers GS 20 and GS FLX [26], for one of the three Daphnia species.

            Results and discussion

            In the four datasets, the proportion of sequences assigned to known cellular organisms varied from 9% to 18% (Table 1). The vast majority of the assigned sequences were assigned to Eukaryota and to Bacteria. Few sequences were assigned to the NCBI Taxonomy categories Archaea, Viroids, Other and Unclassified. Hits to viruses (a total of 4) were found only among the D. pulicaria sequences. However, the low bit scores suggest that these may have other origins. As the scaffolds of D. pulex included in this study had been presorted to include only bacteria, there might otherwise have been more hits to taxa other than Bacteria and Eukaryota.
            Table 1

            Number of sequences assigned and unassigned in the MEGAN analysis.

            | Daphnia species/dataset | Assigned to cellular organisms | Assigned to Bacteria without Firmicutes (1) | Not assigned (2) | Sequences without hits |
            |---|---|---|---|---|
            | D. pulex | 38,249 | 25,868 | 97,852 | 120,355 |
            | D. pulicaria | 99,178 | 25,604 | 966,027 | 23,469 |
            | D. magna GS 20 | 3,028 | 2,560 | 16,007 | 26 |
            | D. magna GS FLX | 4,781 | 4,285 | 21,535 | 12 |

            (1) The Firmicutes were excluded because the D. magna datasets contained a bacterial parasite belonging to this taxon. For each dataset, the sum of columns 2, 4, and 5 is less than the total number of sequences analyzed (Table 2) because a few sequences were assigned to other NCBI taxonomy categories such as "Other" and "Unclassified".

            (2) The unassigned sequences are sequences without hits above the defined thresholds (see Materials and Methods). They may be A) sequences that do not have homologs in the current NCBI-nt database, B) sequences that have diverged so strongly that their homologs are disguised by bit scores below our threshold or C) sequences that are assigned to species to which no other sequence is assigned (min-support threshold = 2).
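The 9% to 18% range quoted in the text can be recomputed from Table 1 together with the fragment counts given in Table 2. The short sketch below assumes that the percentages are taken relative to the number of fragments subjected to BLASTN; the variable and function names are illustrative.

```python
# Sequences assigned to cellular organisms (Table 1).
ASSIGNED = {
    "D. pulex": 38_249,
    "D. pulicaria": 99_178,
    "D. magna GS 20": 3_028,
    "D. magna GS FLX": 4_781,
}

# Fragments subjected to BLASTN (Table 2); assumed to be the denominator.
FRAGMENTS = {
    "D. pulex": 256_498,
    "D. pulicaria": 1_088_697,
    "D. magna GS 20": 19_163,
    "D. magna GS FLX": 26_430,
}


def percent_assigned(dataset: str) -> float:
    """Share of analyzed fragments assigned to cellular organisms, in percent."""
    return 100.0 * ASSIGNED[dataset] / FRAGMENTS[dataset]


if __name__ == "__main__":
    for name in ASSIGNED:
        print(f"{name}: {percent_assigned(name):.1f}%")
```

Under this assumption the shares come out at roughly 15%, 9%, 16% and 18% for the four datasets, consistent with the range reported above.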

            The numbers of bacterial genera (excluding the Firmicutes) with at least two reads assigned were 90, 123, 37 and 51 for the D. pulex, D. pulicaria, D. magna GS 20 and D. magna GS FLX datasets, respectively. The lower number of genera revealed by the D. magna datasets corresponds with the smaller size of these datasets (Table 1, Table 2). This large number of genera indicates a rich community of bacteria in and on Daphnia. In all datasets the majority of the sequences were assigned to the Gamma- and Betaproteobacteria (Fig. 2), which together accounted for more than 87% of the sequences assigned to bacteria. Outside the Proteobacteria, sequences were assigned to the Bacteroidetes and, to a lesser degree, to the Actinobacteria, the latter mainly in the D. pulicaria dataset. Except for the Actinobacteria, all taxa with a substantial number of assigned sequences were found in datasets from all three Daphnia species.
            Table 2

            Summary of the four datasets included in this analysis.

            | | D. pulex | D. pulicaria | D. magna GS 20 | D. magna GS FLX |
            |---|---|---|---|---|
            | Original input data: | | | | |
            | Data type | Possible bacterial scaffolds | Contigs and raw reads longer than 500 bps | Contigs longer than 100 bps | Contigs longer than 100 bps |
            | No. of original sequences | 21,646 | 327,632 | 4,388 | 6,696 |
            | Total length (bps) | 59,379,440 | 323,393,910 | 4,335,734 | 6,154,579 |
            | Average length (mean ± stdev bps) | 2,743 ± 7,205 | 987 ± 255 | 988 ± 2,830 | 919 ± 2,507 |
            | Median length (bps) | 975 | 993 | 218 | 280 |
            | Minimum length (bps) | 10 | 500 | 100 | 100 |
            | Maximum length (bps) | 216,125 | 9,681 | 40,374 | 40,088 |
            | Sequence fragments subjected to BLASTN: | | | | |
            | No. fragments | 256,498 | 1,088,697 | 19,163 | 26,430 |
            | Total length (bps) | 133,734,869 | 570,776,073 | 8,809,340 | 12,259,583 |
            | Average length (mean ± stdev bps) | 521 ± 100 | 524 ± 195 | 459 ± 149 | 463 ± 131 |
            | Median length (bps) | 500 | 500 | 500 | 500 |


            Figure 2

            The comparative taxonomic tree of the bacterial orders found in the three Daphnia datasets. The data of the two D. magna datasets were combined for this figure. Only bacterial orders with at least 2 sequences assigned are included. The Firmicutes were excluded (see text for explanation). The numbers next to the taxon names are the cumulative number of sequences assigned to this taxon. The size of the circles is proportional to the number of sequences assigned to this node. The color scheme of each pie chart is as follows: dark dull magenta for D. pulex sequences, pale dull blue for D. pulicaria sequences, vanilla for D. magna sequences.

            Assignment of sequences to the bacteria, without Firmicutes and Cyanobacteria

            The majority of the assigned sequences fall into two phyla, the Bacteroidetes and the Proteobacteria. Among the Bacteroidetes, most sequences were assigned to the Flavobacteriales (between 187 and 463 sequences per dataset, or 1.3 to 7.7% of the sequences) and a very large proportion of those to the genus Flavobacterium (Fig. 3). Within this genus, no single species stood out as giving a better match than other species. Flavobacteria are a group of opportunistic pathogens (e.g. of salmon), commensals (e.g. in infusoria, cnidaria) [27] and intracellular symbionts of insects [28-30]. They are widely distributed in freshwater habitats, but also occur in association with terrestrial hosts. Some members of Flavobacteria are known to play a significant role in the degradation of proteins, polysaccharides, and diatom debris in natural environments [31,32]. Cultured representatives of Flavobacteria with the ability to degrade various biopolymers such as cellulose, chitin and pectin have been described [33]. Their commonness in all datasets here indicates that they may indeed be symbionts of Daphnia. One may speculate that Flavobacterium plays a role in food digestion in Daphnia, which mainly feed on unicellular planktonic algae [34]. This hypothesis has to be tested with a targeted approach.
            Figure 3

            Taxonomic diversity of the three Daphnia datasets within the Bacteroidetes/Chlorobi group. For more explanation see legend to Fig. 2.

            Another genus of the Bacteroidetes that was consistently found in all datasets is Cytophaga (Fig. 3). These are gliding bacteria found in freshwater and marine habitats, in soil and in decomposing organic matter. However, hits to this genus were never frequent (between 10 and 25 hits).

            The phylum Proteobacteria attracted 98, 94, 84 and 88% of the sequences assigned to bacteria in the D. pulex, D. pulicaria, D. magna GS 20 and D. magna GS FLX datasets, respectively. Table 3 shows the distribution of all Proteobacteria genera for which at least one dataset attracted more than 1% of the sequences assigned to Bacteria.
            Table 3

            Taxa within the Proteobacteria that attracted at least 1% of the sequences within at least one of the four datasets.

            | Taxon level | Taxon | D. pulex | D. pulicaria | D. magna GS 20 | D. magna GS FLX | Average |
            |---|---|---|---|---|---|---|
            | Class | Alphaproteobacteria | 3.9 | 8.0 | 4.5 | 6.6 | 5.7 |
            | Genus | Rhodobacter | 0.4 | 1.4 | 2.0 | 2.8 | 1.6 |
            | Class | Betaproteobacteria | 41.9 | 72.7 | 63.0 | 63.5 | 60.3 |
            | Family | Neisseriaceae | 0.1 | 1.2 | 0.0 | 0.3 | 0.4 |
            | Genus | Chromobacterium | 0.1 | 1.2 | 0.0 | 0.2 | 0.4 |
            | Order | Burkholderiales | 41.0 | 69.5 | 61.8 | 61.8 | 58.5 |
            | Genus | Methylibium | 2.8 | 3.9 | 1.0 | 1.2 | 2.2 |
            | Family | Alcaligenaceae | 0.3 | 0.6 | 1.2 | 1.1 | 0.8 |
            | Genus | Bordetella | 0.3 | 0.4 | 1.2 | 1.1 | 0.7 |
            | Family | Burkholderiaceae | 1.3 | 3.0 | 2.2 | 1.9 | 2.1 |
            | Genus | Burkholderia | 0.3 | 1.1 | 0.3 | 0.3 | 0.5 |
            | Genus | Cupriavidus | 0.5 | 1.0 | 1.4 | 1.1 | 1.0 |
            | Family | Comamonadaceae | 32.0 | 56.5 | 53.0 | 53.1 | 48.7 |
            | Genus | Acidovorax | 9.9 | 10.5 | 16.0 | 16.3 | 13.2 |
            | Genus | Rhodoverax | 0.9 | 4.1 | 3.2 | 2.8 | 2.8 |
            | Genus | Polaromonas | 3.9 | 12.8 | 14.6 | 14.7 | 11.5 |
            | Genus | Delftia | 2.5 | 5.5 | 0.2 | 0.2 | 2.1 |
            | Genus | Verminephrobacter | 6.8 | 4.8 | 4.4 | 4.6 | 5.2 |
            | Class | Gammaproteobacteria | 53.0 | 16.8 | 29.6 | 27.2 | 31.6 |
            | Genus | Pseudomonas | 43.3 | 11.5 | 0.8 | 1.5 | 14.3 |
            | Genus | Serratia | 8.6 | 0.0 | 0.0 | 0.0 | 2.2 |
            | Genus | Aeromonas | 0.1 | 3.8 | 0.0 | 0.0 | 1.0 |
            | Genus | Escherichia | 0.1 | 0.0 | 4.6 | 4.4 | 2.3 |

            Cell entries are percentages of the number of sequences assigned to the Proteobacteria.
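The table does not define its Average column. A quick sketch, assuming it is the unweighted mean of the four dataset percentages, shows that this reading reproduces the listed values to within rounding for a few rows copied from Table 3. The names `ROWS` and `row_mean` are illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

# taxon -> ([D. pulex, D. pulicaria, D. magna GS 20, D. magna GS FLX], listed average)
ROWS: Dict[str, Tuple[List[float], float]] = {
    "Alphaproteobacteria": ([3.9, 8.0, 4.5, 6.6], 5.7),
    "Betaproteobacteria": ([41.9, 72.7, 63.0, 63.5], 60.3),
    "Gammaproteobacteria": ([53.0, 16.8, 29.6, 27.2], 31.6),
    "Comamonadaceae": ([32.0, 56.5, 53.0, 53.1], 48.7),
    "Pseudomonas": ([43.3, 11.5, 0.8, 1.5], 14.3),
}


def row_mean(taxon: str) -> float:
    """Unweighted mean of the four dataset percentages for one taxon."""
    values, _listed = ROWS[taxon]
    return mean(values)


if __name__ == "__main__":
    for taxon, (_values, listed) in ROWS.items():
        print(f"{taxon}: computed {row_mean(taxon):.2f}, listed {listed}")
```

Because the mean is unweighted, the large D. pulicaria dataset counts no more than the much smaller D. magna datasets, which is worth keeping in mind when reading the Average column.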

            The Alphaproteobacteria attracted a larger number of hits (3.9 to 8% of sequences), with the genus Rhodobacter being the most common in all three Daphnia species (0.4 to 2.8% of reads) (Fig. 4). Other genera of the Alphaproteobacteria were only found in the D. pulex or the D. pulicaria datasets (Fig. 4). Alphaproteobacteria are commonly found in freshwater environments, including sewage. They are known for a wide range of metabolic capabilities. Rhodobacter have been isolated from sea water and freshwater.
            Figure 4

            Taxonomic diversity of the three Daphnia datasets within Alphaproteobacteria. For more explanation see legend to Fig. 2.

            The majority of the sequences assigned to the Proteobacteria (overall about 50% of sequences) were assigned to the Burkholderiales within the Betaproteobacteria (Fig. 2, Table 3). Within the Burkholderiales, one family, the Comamonadaceae, accounted for most of these hits (Fig. 5). The Comamonadaceae is a family of gram-negative aerobic bacteria, encompassing the acidovorans rRNA complex. Some species are pathogenic for plants. Within this family four genera (Acidovorax, Rhodoverax, Polaromonas and Verminephrobacter) showed up repeatedly and in high numbers in all datasets (Table 3, Fig. 5). The genera Acidovorax and Polaromonas were particularly common. A further genus, Delftia, was common only in the D. pulex and D. pulicaria sequences (Fig. 5).
            Figure 5

            Taxonomic diversity of the three Daphnia datasets within Betaproteobacteria. For more explanation see legend to Fig. 2.

            A few other genera within the Betaproteobacteria attracted relatively high numbers of sequences across all or most of the datasets: Chromobacterium, Methylibium, Bordetella, Burkholderia and Cupriavidus (Table 3, Fig. 5). Of those, Methylibium petroleiphilum was highly represented. However, a closer inspection of the sequence alignments indicates that the species in our datasets is not exactly this species, but a related one.

            Four genera within the Gammaproteobacteria attracted larger numbers of sequences, but in contrast to the genera in the other classes of the Proteobacteria, here the distribution was not even across the datasets (Table 3, Fig. 6).
            Figure 6

            Taxonomic diversity of the three Daphnia datasets within Gammaproteobacteria. For more explanation see legend to Fig. 2.

            Hits to species of the genus Aeromonas were found in large numbers in the D. pulicaria dataset, but hardly at all in the other sets (Table 3, Fig. 6). Hits were mainly to A. hydrophila and A. salmonicida, but similarities were below 100%. Both can live under aerobic or anaerobic conditions and are found in water. A. hydrophila is an opportunistic pathogen of humans; A. salmonicida causes the fish disease furunculosis.

            The single most often assigned genus in the entire analysis was Pseudomonas in the D. pulex dataset (10,994 assigned reads, 43.3%). These hits were mainly to the species P. fluorescens (7,067 reads), and in particular to the strain PfO-1. The same bacterium was present, although less extremely, in the D. pulicaria sequences (Table 3, Fig. 6). The P. fluorescens PfO-1 genome project was run in the same genome center (the DOE Joint Genome Institute, JGI, http://www.jgi.doe.gov/) where the D. pulex and the D. pulicaria sequences were obtained, and it seemed possible that these hits reflect a contamination in the D. pulex scaffolds rather than a symbiont of D. pulex. However, inspection of bit scores and sequence identity values in the BLASTN outputs indicated that the Daphnia symbiont is clearly not P. fluorescens PfO-1. The P. fluorescens group includes diverse bacteria that are found in soil, but also in aquatic environments.
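The kind of inspection described above could be sketched as follows. This is not the authors' procedure: the sketch assumes BLASTN results in the standard 12-column tabular format (`-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the subject ID prefix `PfO1`, the example rows and the function name `identity_summary` are made up for illustration. If a reference genome were a direct contaminant, most hits to it would be expected to be perfect matches.

```python
import csv
from typing import Iterable, Tuple


def identity_summary(rows: Iterable[str], subject_prefix: str) -> Tuple[int, int]:
    """Count hits to subjects starting with `subject_prefix` and how many of
    those are perfect matches (100% identity, no mismatches, no gap opens)."""
    total = perfect = 0
    for fields in csv.reader(rows, delimiter="\t"):
        if len(fields) < 12 or not fields[1].startswith(subject_prefix):
            continue  # skip malformed lines and hits to other subjects
        pident = float(fields[2])
        mismatches, gap_opens = int(fields[4]), int(fields[5])
        total += 1
        if pident == 100.0 and mismatches == 0 and gap_opens == 0:
            perfect += 1
    return total, perfect


# Hypothetical tabular BLASTN lines, for illustration only.
EXAMPLE_ROWS = [
    "frag1\tPfO1_chr\t100.00\t500\t0\t0\t1\t500\t1000\t1499\t0.0\t924",
    "frag2\tPfO1_chr\t93.40\t500\t31\t2\t1\t500\t7000\t7498\t0.0\t730",
    "frag3\tPfO1_chr\t91.20\t480\t40\t2\t1\t480\t9000\t9478\t0.0\t650",
    "frag4\tother_genome\t99.00\t500\t5\t0\t1\t500\t10\t509\t0.0\t900",
]

if __name__ == "__main__":
    total, perfect = identity_summary(EXAMPLE_ROWS, "PfO1")
    print(f"{perfect} of {total} hits are perfect matches")
```

A low share of perfect matches among many high-scoring hits points to a related organism rather than to carry-over of the reference strain itself.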

            A further contamination candidate is the gammaproteobacterium Serratia, to which we found 2,184 matched sequences in the D. pulex genome. However, it is hardly seen among the D. pulicaria sequences, and not seen at all among the D. magna sequences (Table 3, Fig. 6). The species to which most sequences were assigned is Serratia proteamaculans 568, whose genome was also sequenced by the DOE Joint Genome Institute. Here too, inspection of the BLASTN results indicated high similarity but few perfect matches, excluding contamination at the JGI. Serratia are often associated with the human gut, but are not pathogenic.

            Another genus with many hits to the D. pulex and the D. pulicaria sequences, but not to the D. magna sequences (Table 3), is the already mentioned betaproteobacterium Delftia (Fig. 5). The DOE Joint Genome Institute sequenced Delftia acidovorans strain SPH-1, which is the strain most of the sequences were assigned to. However, inspection of the BLASTN results again showed that the Daphnia symbiont is clearly not D. acidovorans strain SPH-1.

            About 200 sequences matched Deltaproteobacteria (Fig. 2). Within this class various taxa matched sequences from the datasets. However, there was no consistent picture across the three Daphnia species (Fig. 7).
            Figure 7

            Taxonomic diversity of the three Daphnia datasets within Delta- and Epsilonproteobacteria. For more explanation see legend to Fig. 2.

            Searching for Cyanobacteria and plastid sequences

            Following the suggestion of Chang and Jenkins [18] that Daphnia may carry symbiotic plastids or cyanobacteria with them, we looked more closely into these two groups. The D. magna sequences revealed no hit to any Cyanobacteria taxon. Of the D. pulex sequences, 44 (= 0.17% of the assigned sequences) were assigned to the Nostocales, a taxon of the Cyanobacteria. Of these hits, 19 (= 0.074%) were to the genus Nostoc. In the D. pulicaria dataset we found 22 sequences assigned to the Cyanobacteria, half of which were assigned to the Nostocales (Fig. 8).
            Figure 8

            Taxonomic diversity of the three Daphnia datasets within Cyanobacteria and Actinobacteria. For more explanation see legend to Fig. 2.

            The D. pulicaria dataset revealed 23 sequences assigned to plastids. One of them was a short sequence (100 bps) matching the chloroplasts of the green alga Chlamydomonas, the others matched the chloroplasts of flowering plants. Hits to the latter came mostly from one scaffold and had high bit scores (> 500) and similarities of more than 90%. The D. pulex sequences revealed no hits to plastids, but this is not surprising, as the dataset had been sorted to contain predominantly prokaryote sequences. The D. magna GS 20 dataset did not reveal any hits to plastids. The D. magna GS FLX sequences contained a short sequence (104 bps) that matched a plastid, the chloroplast of the green alga Stigeoclonium helveticum.

            The presence of plastid sequences in Daphnia shotgun datasets must, however, be interpreted with care, as unicellular green algae are the main food of Daphnia, both in the field and in the laboratory [34,35]. However, the few sequences assigned to plastids here do not seem to correspond closely with the algae that were used to feed the Daphnia in the cultures before they were used for DNA extraction. The D. magna and the D. pulex clones had been kept on an exclusive diet of the green alga Scenedesmus sp. and the D. pulicaria clone on a diet of the green alga Ankistrodesmus falcatus.

            All in all, we consider this rather weak evidence for plastid symbionts in these Daphnia samples. The original finding was made in D. obtusa [18], which was not included in our study. The authors had observed variation in the type and frequency of plastid occurrence in this species, so it may not be surprising that things are different in other species. Furthermore, the long maintenance of the Daphnia clones in laboratory cultures may have contributed to a loss of plastids. Therefore, the absence of evidence from our metagenomics analysis is certainly not evidence for the absence of possible plastid symbionts in Daphnia.

            Searching for 16S rDNA sequences

            All four datasets were also analyzed with a more conventional approach, which was to identify contigs/scaffolds similar to known 16S rDNA sequences. We compared our data with a collection of 471,792 16S rDNA sequences collected by the Ribosomal Database Project (RDP release 9 update 57) [36]. In total, 27 16S rDNA fragments were identified in the D. pulicaria dataset, 13 in the D. pulex, 14 in the D. magna GS 20, and 11 in the D. magna GS FLX dataset. Of those, 17, 11, 9, and 10 bacterial species could be inferred in the D. pulicaria, D. pulex, D. magna GS 20, and D. magna GS FLX datasets, respectively. Other partial 16S rDNA sequences were identical or almost identical to regions conserved across species and thus could not be used to infer the species. Table 4 lists the close to full-length 16S rDNA sequences found in the four datasets. The nucleotide sequence identity between these sequences and their corresponding best matches ranged from 91% to 100%. Most of the 16S rDNAs that best matched our sequences were from uncultured bacteria. Bacterial species that could be inferred using 97% sequence identity as the cutoff value included Pseudomonas sp., E. coli/Shigella and the already discussed (see above) Flavobacterium sp. (Table 4). In both the D. pulex and D. pulicaria datasets, sequences highly similar to the 16S rDNA of unclassified aquatic bacterium R1-B19, an undescribed betaproteobacterium, were found (Table 4).
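The 97% identity cutoff mentioned above amounts to a simple filter on the best match of each 16S rDNA fragment. The sketch below applies it to a few best matches listed in Table 4; treating the cutoff as inclusive and the function name `inferred_taxon` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

SPECIES_CUTOFF = 97.0  # percent identity cutoff quoted in the text


def inferred_taxon(best_match: str, identity: float,
                   cutoff: float = SPECIES_CUTOFF) -> Optional[str]:
    """Return the best-matching taxon if identity reaches the cutoff, else None.
    Assumption: the cutoff is inclusive (identity >= cutoff)."""
    return best_match if identity >= cutoff else None


# (sequence ID, best matched 16S description, identity %) taken from Table 4.
BEST_MATCHES: List[Tuple[str, str, float]] = [
    ("contig00041", "Shigella dysenteriae", 99),
    ("contig06506", "uncultured Cytophagales bacterium", 96),
    ("contig06300", "uncultured bacterium", 93),
    ("ANIT159445.g1", "Flavobacterium sp. MH45", 99),
]

if __name__ == "__main__":
    for seq_id, description, identity in BEST_MATCHES:
        print(seq_id, "->", inferred_taxon(description, identity))
```

A best match above the cutoff names a species only when the reference itself is classified; a fragment matching an "uncultured bacterium" remains unnamed however high the identity.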
            Table 4

            16S rDNA sequences close to full length identified in the four datasets.

            | Dataset | Sequence ID | Best matched 16S: ID (1) | Best matched 16S: description | Best matched 16S: bit score (3) | Best matched 16S: identity (%) | Description of the next three matches (4) |
            |---|---|---|---|---|---|---|
            | D. magna GS 20 | contig04123 | S000437499 | Daphnia endosymbiotic bacterium (2) | 1970 | 99 | uncultured Pasteuria sp., P. nishizawae, P. penetrans |
            | | contig03555 | S000446092 | aquatic bacterium R1-C1 | 1374 | 98 | uncultured Cytophagales bacterium, aquatic bacterium R1-C5, uncultured bacterium |
            | D. magna GS FLX | contig00041 | S000893806 | Shigella dysenteriae | 2627 | 99 | Escherichia coli W3110, E. coli K12, E. coli |
            | | contig06506 | S000343002 | uncultured Cytophagales bacterium | 2468 | 96 | uncultured bacterium, Flavobacterium sp. Nj-26, uncultured Flavobacteriales bacterium |
            | | contig06300 | S000372741 | uncultured bacterium | 1947 | 93 | Myxococcales str. NOSO-1, Chondromyces pediculatus, Polyangium thaxteri |
            | | contig06583 | S000437499 | Daphnia endosymbiotic bacterium (2) | 1943 | 99 | uncultured Pasteuria sp., P. nishizawae, P. penetrans |
            | D. pulicaria | ANIT159445.g1 | S000966592 | Flavobacterium sp. MH45 | 1905 | 99 | Arctic sea ice bacterium ARK10164, uncultured bacterium, Flavobacterium succinicans |
            | | ANIT198306.b1 | S000799101 | uncultured bacterium | 1857 | 98 | Comamonadaceae bacterium BP-1b, |
            | | ANIT159586.b1 | S000639702 | uncultured bacterium | 1853 | 98 | uncultured Burkholderiales bacterium, Comamonadaceae bacterium BP-1b, uncultured proteobacterium |
            | | ANIT82605.b1 | S000634984 | uncultured Burkholderiales bacterium | 1846 | 99 | uncultured bacterium, Comamonadaceae bacterium BP-1b, Comamonadaceae bacterium BP-1 |

             

            ANIU5178.g2

            S000429300

            Flavobacteriumsp. GOBB3-209

            1653

            98

            uncultured bacterium, uncultured Cytophagales bacterium, uncultured Sphingobacteriales bacterium

             

            ANIT142825.b1

            S000634984

            uncultured Burkholderiales bacterium

            1570

            98

            uncultured beta proteobacterium, uncultured organism,Rhodoferax ferrireducensT118

             

            ANIS174043.g1

            S000446066

            aquatic bacterium R1-B19

            1485

            99

            uncultured beta proteobacterium, aquatic bacterium R1-B6, uncultured Burkholderiales bacterium

             

            ANIT169338.b1

            S000005772

            Aeromonas eucrenophila

            1465

            99

            Aeromonassp. 'CDC 859-83',A. molluscorum, uncultured bacterium

             

            ANIS242375.b1

            S000658887

            uncultured actinobacterium

            1439

            97

            uncultured bacterium,Modestobacter multiseptatus,Sporichthya polymorpha

             

            ANIS247631.y1

            S000607919

            Pseudomonassp. R-25061

            1419

            99

            Pseudomonassp. R-25209, uncultured bacterium,P. pseudoalcaligenes

             

            ANIU876.b3

            S000948974

            uncultured bacterium

            1386

            98

            uncultured gamma proteobacterium, unculturedPseudomonassp.,Pseudomonassp. G2

             

            ANIT143068.b1

            S000550675

            Pseudomonassp. GD100

            1318

            96

            Pseudomonassp. Pb1(2006),P. poae,P. lurida

             

            ANIT82605.g2

            S000634984

            uncultured Burkholderiales bacterium

            1312

            100

            uncultured bacterium,Variovorax paradoxus, uncultured bacterium SJA-62

             

            ANIT131207.y2

            S000018838

            uncultured Cytophagales bacterium

            1304

            91

            uncultured bacterium, uncultured Bacteroidetes bacterium, rhizosphere soil bacterium RSC-II-81

             

            ANIT102921.y2

            S000895013

            uncultured bacterium

            1170

            93

            uncultured Cytophagales bacterium, uncultured Bacteroidetes bacterium, uncultured bacterium

             

            ANIU1607.g2

            S000799546

            uncultured bacterium

            1092

            96

            Hydrogenophagasp. AH-24,Hydrogenophagasp. CL3,Hydrogenophagasp. YED1-18

            D. pulex

            scaffold_278

            S000541019

            Pseudomonas argentinensis

            2785

            98

            P. argentinensis,P. fluorescensPfO-1,

             

            scaffold_567

            S000402041

            uncultured bacterium

            2680

            97

            uncultured soil bacterium, uncultured Comamonadaceae bacterium, uncultured beta proteobacterium

             

            scaffold_1523

            S000926010

            Serratia proteamaculans568

            2615

            96

            Serratia proteamaculans568, uncultured bacterium, uncultured proteobacterium

             

            scaffold_6081

            S000730527

            Deefgea rivuli

            1792

            97

            uncultured bacterium,Chitinibacter tainanensis, uncultured proteobacterium

             

            scaffold_16248

            S000736150

            gamma proteobacterium GPTSA100-21

            1711

            98

            gamma proteobacterium GPTSA100-22, uncultured bacterium, gamma proteobacterium GPTSA100-26

             

            scaffold_10095

            S000404820

            Pseudomonassp. Hsa.28

            1378

            99

            uncultured bacterium, unculturedPseudomonassp.,P. anguilliseptica

             

            scaffold_1408

            S000446066

            aquatic bacterium R1-B19

            1326

            99

            uncultured beta proteobacterium, aquatic bacterium R1-B6, aquatic bacterium R1-B7

             

            scaffold_21984

            S000656075

            unculturedPseudomonassp.

            1023

            100

            gamma proteobacterium LC-G-2,Pseudomonassp. 7-1,P. fluorescens

            1Given as the ID in RDP.

            2Pasteuria ramosa, the parasite which was present in theD. magnadatasets.

            3The BLAST bit scores obtained from a comparison of the contigs/scaffolds to annotated 16S rDNA sequences present in RDP are shown. A higher number indicates a more significant match.

            4The next top three unique matched species, if they were not the same as the best match
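
            The 97% identity cutoff used above to decide whether a species can be inferred from a 16S rDNA best match can be illustrated with a minimal sketch. The function name and the record type are hypothetical, the example rows are taken from Table 4, and treating the cutoff as inclusive (identity of 97% or more) is an assumption; the text does not say whether the boundary value itself counts.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    sequence_id: str
    best_match: str
    identity: float  # percent nucleotide identity to the best RDP match


def species_level_hits(hits, cutoff=97.0):
    """Keep hits whose identity to the best match is at or above the cutoff.

    Assumption: the cutoff is inclusive (>=); the paper only names 97% as
    the cutoff value.
    """
    return [h for h in hits if h.identity >= cutoff]


# A few rows from Table 4 (sequence ID, best match, identity in percent).
hits = [
    Hit("contig00041", "Shigella dysenteriae", 99.0),
    Hit("contig06506", "uncultured Cytophagales bacterium", 96.0),
    Hit("contig06300", "uncultured bacterium", 93.0),
    Hit("ANIS247631.y1", "Pseudomonas sp. R-25061", 99.0),
]

kept = species_level_hits(hits)
for h in kept:
    print(h.sequence_id, h.best_match, h.identity)
```

            Hits below the cutoff are not discarded from the table; they simply do not support a species-level inference in this sketch.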

            The 16S rDNA sequences identified only a small subset of the species and genera found in our main analysis, which was based on comparison to the NCBI-nt database. One likely explanation for this discrepancy is the low sequencing coverage of the 16S rDNA regions in the shotgun datasets. Another explanation could be that some of the earlier predictions were false positives. However, MEGAN assigns a sequence to the lowest common ancestor of the set of taxa defined by all matches above the defined thresholds. The number of false predictions is therefore expected to be low, because the algorithm instead makes a larger number of unspecific assignments at higher taxonomic levels [20]. Because taxa were inferred regardless of whether the matched sequence was a suitable phylogenetic marker, we cannot exclude that some of the predictions result from horizontal gene transfer events. However, if this were the case, MEGAN would assign the hit to the lowest common ancestor of the species involved in the horizontal gene transfer, unless neither these species nor related species are in the NCBI database. It was predicted that computing taxonomic content by sequence comparison to the NCBI-nt database would show better resolution at all levels of the taxonomy than an analysis based on a small set of phylogenetic markers or on 16S rDNA sequences alone [20,21]. Our results are consistent with this prediction.

            Despite the under-prediction and the differences between the NCBI-nt and the 16S rDNA databases, the two approaches correlated fairly well quantitatively at higher taxonomic levels (Fig. 9).
            Figure 9

            Correlation of taxonomic content computed by comparison to the NCBI-nt database and by comparison to the 16S rDNA database. The numbers of sequences assigned to the following taxonomic nodes were plotted: Bacteria, Proteobacteria, Bacteroidetes, Gammaproteobacteria, Deltaproteobacteria, Betaproteobacteria, Flavobacteria, Sphingobacteria, Actinobacteria.

            Searching for identical and similar sequences common in four datasets

            Although sequences in all datasets were assigned to similar bacterial taxa, it is not clear how similar the sequences themselves are across datasets. To identify common sequences, we compared the D. magna GS 20 sequences with sequences from D. magna GS FLX, D. pulex, and D. pulicaria using BLASTN. Sequences were considered identical or nearly identical when a stretch longer than 80% of the query sequence could be aligned to a hit sequence with over 98% nucleotide sequence identity. With this criterion, five D. magna GS 20 contigs (corresponding to six D. pulex scaffolds and 12 D. pulicaria reads) were identified. Hits identical to these sequences were all found in the complete genome sequences of Escherichia coli W3110 (AP009048.1) and E. coli K12 MG1655 (U00096.2), which suggests that the commensal E. coli strains carried by the three Daphnia species are highly similar.

            With a less stringent criterion (a stretch longer than 50% of the query sequence aligned to a hit sequence with over 90% nucleotide sequence identity), sequences similar to about 80 GS 20 contigs were also identified across the datasets. These sequences mainly fall into taxa within the Proteobacteria, with a few sequences assigned to Flavobacterium.
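
            The two criteria above can be expressed as a small classifier, shown here as a minimal sketch rather than the procedure actually used. The function name and category labels are hypothetical, and computing coverage as alignment length divided by query length is an assumption about how "a stretch longer than 80% of a query sequence" was measured.

```python
def classify_hit(query_length, alignment_length, percent_identity):
    """Classify a BLASTN hit using the two thresholds described in the text.

    Assumption: coverage is the aligned stretch divided by the query length.
    """
    coverage = alignment_length / query_length
    # Stringent criterion: > 80% of the query aligned at > 98% identity.
    if coverage > 0.80 and percent_identity > 98.0:
        return "identical_or_nearly_identical"
    # Relaxed criterion: > 50% of the query aligned at > 90% identity.
    if coverage > 0.50 and percent_identity > 90.0:
        return "similar"
    return "not_shared"


# Illustrative values only, not data from the study.
print(classify_hit(query_length=100, alignment_length=85, percent_identity=99.0))
print(classify_hit(query_length=100, alignment_length=60, percent_identity=92.0))
print(classify_hit(query_length=100, alignment_length=40, percent_identity=99.0))
```

            Both thresholds are strict inequalities here, matching the wording "longer than" and "over"; a hit sitting exactly on a boundary falls into the less stringent class.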

            The small number of similar sequences shared across the datasets suggests that the bacterial communities carried by the three Daphnia clones from which our datasets originated may be diverse at the species and strain level, despite the very high homogeneity observed at higher taxonomic nodes. It should be noted, however, that our datasets do not originate directly from field samples, but from three clones that had been kept in three different laboratories for several generations before the DNA was isolated. This may influence our results in two ways. First, we cannot truly make statements about three Daphnia species, but only about three clones, each from a different Daphnia species. Including more clones might reveal more bacterial symbionts. Second, while these clones were cultured in the laboratory, the symbiont community may have changed both qualitatively and quantitatively. New bacterial species may have arrived with the food or the culture conditions, while other bacteria may have been lost because the laboratory conditions were unsuitable for them. For the current analysis, no attempt was made to vary the culture conditions for any of the three clones, and the bacteria associated with the food alga were not analyzed.

            Repeatability of the metagenomics approach

            For D. magna we obtained two shotgun datasets, with sequences produced on two different sequencing platforms, the pyrosequencers GS 20 and GS FLX. Figure 10 shows the number of sequences assigned to all prokaryote genera (excluding the Firmicutes) in the two datasets. The two datasets gave very congruent results, with a correlation coefficient of r = 0.98 (P < 0.001, n = 55). The plot shows clearly that stochastic differences occur for genera with very few hits. As expected, for genera with fewer than 10 assigned sequences the two datasets give quite divergent results.
            Figure 10

            Comparison of the number of assigned sequences (log10(x+1)) to prokaryote genera (excluding Firmicutes) of the combined two D. magna datasets.
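
            The comparison in Figure 10 can be sketched as a Pearson correlation between per-genus counts from the two platforms. This is a minimal illustration under stated assumptions: the counts below are invented for the example, the helper names are hypothetical, and applying the correlation to log10(x+1)-transformed counts is an assumption, since the text reports r without stating whether raw or transformed counts were used.

```python
import math


def log_transform(counts):
    """Apply the log10(x + 1) transform used on the axes of Figure 10."""
    return [math.log10(c + 1) for c in counts]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numbers of sequences assigned to five genera per platform.
gs20_counts = [0, 3, 12, 150, 900]
gs_flx_counts = [1, 2, 20, 130, 1100]

r = pearson_r(log_transform(gs20_counts), log_transform(gs_flx_counts))
print(f"r = {r:.2f}")
```

            The +1 inside the logarithm keeps genera with zero hits in one dataset on the plot, which is where the stochastic differences mentioned above are most visible.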

            Using contigs instead of reads

            For the D. pulicaria dataset, both contigs and singleton raw reads were included in our analysis. For the other three datasets, we used only sequences that had previously been assembled into contigs or scaffolds. This reduced the number of sequences, and thus the number of BLASTN searches, considerably. Analyzing large numbers of raw reads within a reasonable time would have been beyond our computing power and the abilities of the MEGAN software. Using contigs and scaffolds influences the results in various ways. First, it strongly reduces redundancy in the dataset and therefore makes the analysis much quicker. Second, it somewhat compromises the usefulness of the number of assigned sequences as a measure of the abundance of the different taxa. The number of assigned sequences is still a relative measure of the frequency of a given taxon, but the larger the real number of hits would have been, the more strongly the value is reduced. Third, rare members of the symbiont community are likely to remain undetected, because the few reads sequenced for rare species are unlikely to have been assembled into contigs. Thus, our estimates are likely to underestimate the true number of taxa in the community. This conclusion is also supported by the observation that the D. pulicaria dataset contained the highest number of identified taxa.

            Conclusion

            Our analysis of shotgun sequences of three clones, each from one Daphnia species, revealed a rich bacterial community associated with these clones. The particular data structure of our analysis allows certain conclusions to be drawn. First, the majority of the common bacterial taxa identified are found in all Daphnia datasets. While the D. pulex and D. pulicaria clone cultures from which DNA was isolated originated from laboratories in North America, the D. magna cultures originated from a laboratory in Switzerland. To the best of our knowledge, there had been no transatlantic exchange of cultures between these laboratories by the time the samples were taken. Thus, we speculate that the similarity of the symbiont communities in European and North American Daphnia samples indicates a long-lasting stability of these associations.

            Second, the symbiont communities across the three Daphnia species are remarkably similar, yet they are not identical. At the sequence level, the similarity breaks down, indicating that each Daphnia species harbors different species or strains of bacterial symbionts.

            Third, some bacterial taxa were found to be specific to the two datasets produced at the DOE Joint Genome Institute (JGI). Coincidentally, some of the published genomes in these taxa had originally been sequenced by JGI, leading to speculation as to whether JGI may have contaminated the Daphnia samples. Our analysis allows us to clearly reject this hypothesis. Whether the bacterial taxa found to be associated with specific Daphnia samples are contaminants from the laboratory where the animals were cultured before sequencing, or natural symbionts of the Daphnia, cannot be resolved here.

            Fourth, there is no clear evidence for a stable cyanobacterial or plastid symbiont in the Daphnia species. The few scattered hits to some plastids and Cyanobacteria may result from contamination with the algal food of the Daphnia. Plastid symbionts had been observed in D. obtusa [37]. However, the long laboratory culture of the clones used in the genome study may have influenced the presence of such a photoactive symbiont.

            Methods

            The D. pulex dataset

            The sequences of D. pulex are from the DGC whole genome sequencing project. The chosen D. pulex clone, called The Chosen One, was cultured at Indiana University, Bloomington, USA, on a diet of the green alga Scenedesmus sp. The animals used to isolate the DNA for the genome project were treated with tetracycline (250 mg/L overnight) before DNA isolation to reduce their bacterial load. Sequencing was done at the DOE Joint Genome Institute (JGI) using the Sanger method. These sequences were obtained from the wFleaBase website http://wfleabase.org:7182/genome/Daphnia_pulex/current/genome-assembly-full-jazz_20060901/scaffolds/sequences/. The scaffolds included in this study were the excluded scaffolds, the prokaryotic scaffolds, and the possible bacterial scaffolds of the current D. pulex genome assembly http://wfleabase.org:7182/genome/Daphnia_pulex/current/bacteria/dpulex_jgi060905_possible_bacterial.txt.

            The D. pulicaria dataset

            Daphnia pulicaria is closely related to D. pulex, and forms with intermediate characters are frequently encountered, suggesting hybridization between these two species. Indeed, allozyme tests for allelic variation at the lactate dehydrogenase loci show both fast and slow electromorphic alleles, indicating that the chosen D. pulicaria strain is a pulicaria/pulex hybrid. This D. pulicaria clone was cultured at the Hubbard Center for Genome Studies at the University of New Hampshire, USA, on a diet of the green alga Ankistrodesmus falcatus. Before being cultured at the University of New Hampshire, it was maintained in a laboratory at Utah State University. The animals used to isolate the DNA for the genome project were treated with tetracycline (250 mg/L overnight) before DNA isolation to reduce their bacterial load. Sequencing of D. pulicaria was also done at the DOE Joint Genome Institute (JGI) using the Sanger method. A low coverage genome assembly of a D. pulicaria clone is available to DGC members, and others may request access to these data. As the DGC and JGI data agreements allow, the assembly will be released for public access on the wFleaBase database: http://wfleabase.org/genome/Daphnia_pulicaria/. For more information on the D. pulex and D. pulicaria genome data see http://wfleabase.org/.

            The D. magna datasets

            The sequences of D. magna originated from a shotgun sequencing project that aimed at sequencing the endoparasitic bacterium P. ramosa. During the analysis of the data, a large number of sequences clearly unrelated to the Firmicutes (the group to which P. ramosa belongs) showed up. Only these sequences are included in this paper. As these data are not yet published elsewhere, we describe the DNA isolation, library construction and sequencing in detail here.

            Daphnia magna cultures were raised at the University of Fribourg, Switzerland, on a diet of the green alga Scenedesmus sp. The Daphnia had been exposed to the gram-positive bacterium Pasteuria ramosa, an endoparasite of Daphnia [17], when they were 3–5 days old. Most animals became infected and were shipped to the University of Florida, USA, for further processing. One thousand P. ramosa-infected D. magna were suspended in 5 ml of Buffer A (1.0 M NaCl, 50 mM Tris-HCl pH 8.0) and homogenized gently with a glass pestle and mortar. The homogenate was passed through a 50–100 micron metal mesh and a 21 micron nylon mesh to remove Daphnia debris. About 5,000,000 P. ramosa cells were obtained and resuspended in 450 μl of Buffer A. These were added to an equal volume (450 μl) of 2% agarose to prepare gel plugs embedding the vegetative cells, and 10 gel plugs were produced. To disrupt the cells gently, the gel plugs were transferred into Buffer B (0.2% sodium deoxycholate, 0.5% Brij 58, 0.5% sarcosine, 50 mM Tris-HCl pH 8.0, 100 mM EDTA pH 8.0, 0.40 M NaCl) and incubated at 37°C overnight. They were then transferred into 10 ml of Buffer C (100 mM NaCl, 50 mM Tris-HCl pH 8.0, 100 mM EDTA pH 8.0, 0.5% sarcosine, 0.2 mg/ml protease K) at room temperature. The gel plugs were transferred to 40 ml of Wash Buffer (10 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0) and washed three times in a shaker at low speed, for 1 hour each, to remove detergents. The gel plugs were then transferred to 40 ml of PMSF Buffer (1.0 mM phenylmethylsulfonyl fluoride (PMSF), 10 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0) and incubated at room temperature for 1 hour with gentle shaking; this process was repeated with fresh PMSF Buffer. The plugs were then washed twice in 40 ml of Wash Buffer following incubation at 50°C for 20 minutes. The gel plugs were then transferred to 40 ml of 50 mM EDTA (pH 8.0) and stored at 4°C overnight. The DNA in the gel plugs was digested with 10 U of HindIII per plug at 37°C for 30 minutes.

            The gel plugs with the partially digested DNA were cut into a slurry, loaded onto a 1% agarose gel (Sigma, Type VII, low gelling temperature), and sealed on the top with agarose. Electrophoresis was carried out in 0.7 × TAE Buffer using a FIGE apparatus under Program 4 (BioRad, Hercules, CA 94547). Products ranging in size from 18 to 33 Kb were extracted from the gel (an estimated 60 ng of DNA in total) following the protocol of the GELase Agarose Gel-Digesting Preparation kit (Epicentre, Madison, WI 53713) and used to prepare the cosmid library.

            The preparation of the cosmid library followed the procedures described by Bell et al. [38], with additional information described by Chow et al. [39]. In brief, to construct the cosmid library, an estimated 60 ng of 18–33 Kb fragments recovered from the gel were cloned into the vector pCC1, which had been digested with HindIII and then dephosphorylated with shrimp alkaline phosphatase following the manufacturer's protocol (Roche, Indianapolis, IN 46250). The ligation products were packaged into bacteriophage particles using MaxPlax Lambda DNA packaging extracts (Epicentre, Madison, WI 53713) according to the protocol of the kit. An estimated 5 × 10³ bacteriophage particles in 50 μl were used to infect 200 μl of EPI300 cells grown to exponential phase in LB liquid medium (Luria-Bertani medium) containing 10 mM MgSO4 and 0.2% maltose, which had been inoculated from an overnight culture grown in LB containing 10 mM MgSO4. After adsorption during incubation at 37°C for 20 minutes, 1 ml of fresh LB medium was added and the cells were incubated for an additional 45 minutes. The infected cells were spread on LB 1% agar plates containing 12.5 μg/ml of chloramphenicol, 1 mM IPTG and 40 μg/ml of X-gal for selection.

            The cosmid library was used in two runs of 454 pyrosequencing [26]. The first run was carried out on a GS 20 454 pyrosequencer, which gave read lengths of around 90 base pairs (bp). The second run was done on a GS FLX 454 pyrosequencer, which gave read lengths of around 250 bp. Both pyrosequencing projects were done at the Interdisciplinary Center for Biotechnology Research at the University of Florida, Gainesville, USA. The reads obtained from the GS 20 and the GS FLX shotgun sequencing were assembled into contigs separately. These contigs were used in the analyses presented here.

            Scanning electron microscopy

            For scanning electron microscopy (SEM), D. magna was fixed in 3% glutaraldehyde in 0.1 M PB for 2 hours at 20°C. The sample was washed twice in distilled water for 5 to 10 seconds, dehydrated in a graded ethanol series, and critical point dried (CPD) overnight (16 hours). The specimens were coated with gold (20 nm) and viewed using a Philips XL 30 ESEM under high vacuum conditions at 5 to 15 kV.

            Data analysis

            Sequences from the D. pulex, D. pulicaria and the two D. magna datasets included in this study are described in Table 2. Sequences were compared against the NCBI-nt database of nucleotide sequences using BLASTN [19] with the default settings in December 2007. Sequences longer than 1000 bp were divided into overlapping fragments of around 500 bp. Sequences were thereby homogenized to fragments of similar length, so that BLAST scores were comparable across different searches. Sequence comparison is computationally challenging and was performed on an Opteron Linux high performance computer cluster established and maintained by the [BC]² Basel Computational Biology Center at the Biozentrum, University of Basel, http://www.bc2.ch/center/index.htm. For the graphical presentation of the results we combined the two D. magna datasets.
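
            The fragmentation step described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the step size of 250 bp (that is, a 50% overlap between neighbouring fragments) is an assumption, because the text states only that fragments were overlapping and around 500 bp long.

```python
def fragment_sequence(seq, max_length=1000, fragment_length=500, step=250):
    """Split a sequence longer than max_length into overlapping fragments.

    Assumption: consecutive fragments start `step` bases apart; the amount
    of overlap is not given in the text. Shorter sequences are returned
    unchanged as a single fragment.
    """
    if len(seq) <= max_length:
        return [seq]
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + fragment_length])
        # Stop once the current fragment reaches the end of the sequence.
        if start + fragment_length >= len(seq):
            break
        start += step
    return fragments


# A made-up 1200 bp sequence and a made-up 800 bp sequence.
long_fragments = fragment_sequence("ACGT" * 300)
short_fragments = fragment_sequence("ACGT" * 200)
print([len(f) for f in long_fragments])
print([len(f) for f in short_fragments])
```

            With these settings the final fragment can be shorter than 500 bp; whether such a tail was kept, merged or dropped in the original analysis is not stated.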

            For the analysis of the BLASTN results we used the metagenomics software MEGAN [20]. This software allows the taxonomic content of a sample to be explored based on the NCBI taxonomy. The BLAST files were imported into MEGAN using the import option BLASTN. The program then uses several thresholds to generate sequence-taxon matches. The "min-score" filter sets a bit-score cutoff value. The "top-percent" filter is used to retain hits whose scores lie within a given percentage of the highest bit score. The "min-support" filter is used to set a threshold for the minimum number of sequences that must be assigned to a taxon. We used the default parameter settings of the software (top-percent = 10, min-support = 2), except for the minimal threshold for the bit score of hits, which was set to 100, following the recommendation of the authors [20]. This reduces the number of reads assigned to a taxon, but avoids assignments based on weak homology. This analysis was done for all datasets between 8 and 11 January 2008.
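
            How the three filters interact with the lowest-common-ancestor assignment can be illustrated with a toy example. This is a simplified sketch and not MEGAN's implementation: the taxonomy, the hit lists and all function names are hypothetical, and the handling of min-support (reads on a taxon with too few directly assigned reads are counted as unassigned) is an assumption made for brevity.

```python
from collections import Counter

# Toy taxonomy (child -> parent); a hypothetical stand-in for the NCBI taxonomy.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Betaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Comamonadaceae": "Betaproteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Aeromonas": "Gammaproteobacteria",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon))
    for node in lineage(taxa[0]):  # walk upwards, deepest node first
        if node in common:
            return node


def assign_read(hits, min_score=100.0, top_percent=10.0):
    """Assign one read from its (taxon, bit_score) hits, or return None."""
    kept = [(t, s) for t, s in hits if s >= min_score]  # min-score filter
    if not kept:
        return None
    best = max(s for _, s in kept)
    threshold = best * (1.0 - top_percent / 100.0)  # top-percent filter
    return lowest_common_ancestor(t for t, s in kept if s >= threshold)


def assign_reads(reads, min_support=2, **filters):
    """Count reads per taxon; taxa below min_support become 'Unassigned'."""
    counts = Counter()
    for hits in reads.values():
        taxon = assign_read(hits, **filters)
        counts[taxon if taxon is not None else "Unassigned"] += 1
    result = Counter()
    for taxon, n in counts.items():
        if taxon != "Unassigned" and n < min_support:
            result["Unassigned"] += n  # simplified min-support handling
        else:
            result[taxon] += n
    return result


# Hypothetical reads with (taxon, bit score) hits.
reads = {
    "read1": [("Pseudomonas", 200), ("Aeromonas", 190)],
    "read2": [("Pseudomonas", 200), ("Aeromonas", 150)],
    "read3": [("Comamonadaceae", 90)],
    "read4": [("Pseudomonas", 300), ("Comamonadaceae", 120)],
    "read5": [("Aeromonas", 150), ("Pseudomonas", 140)],
    "read6": [("Comamonadaceae", 250)],
}
summary = assign_reads(reads)
print(dict(summary))
```

            In this toy example, a read with near-equal hits to two genera is placed on their shared class rather than on either genus, which is the behaviour that makes assignments conservative at the cost of resolution.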

            While inspecting the data we ignored reads assigned to taxa other than plants and bacteria. Within the bacteria, we ignored the taxon Firmicutes (mostly gram-positive bacteria, many of which are endospore formers), because the two D. magna datasets came from animals infected with the endospore-forming pathogen P. ramosa. The two other datasets (D. pulex and D. pulicaria) had only a few sequences assigned to the Firmicutes (less than 0.2%). Thus, excluding the Firmicutes did not influence the overall analysis.

            In a separate analysis we manually inspected all four datasets for hits assigned to plant taxa (every taxon within and including the Viridiplantae), searching for hits to plastids (chloroplasts). For this analysis we set the MEGAN "min-support" parameter to one.

            Declarations

            Acknowledgements

            Support for the preparation and characterization of cosmid DNA libraries for D. magna was provided by USDA/CSREES Project 50554, USDA/CSREES Multi-State Project NE1019, and the University of Florida IFAS Agricultural Experiment Station (CRIS Projects FLA-MCS-04353 and FLA-MCS-04080). The sequencing and portions of the analyses of the D. magna data were done at the Interdisciplinary Center for Biotechnology Research at the University of Florida, Gainesville, USA. We thank Li Liu for support and for the assembly of the contigs of the two D. magna datasets. The sequencing and portions of the analyses of the D. pulex and the D. pulicaria data were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by The Center for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. We thank the [BC]² Basel Computational Biology Center at the Biozentrum, University of Basel, for hardware and software support. Our work benefits from, and contributes to, the Daphnia Genomics Consortium. We are grateful to Daniel Mathys from the Zentrum für Mikroskopie, Universität Basel, for technical support with the SEM.

            Authors’ Affiliations

            (1) Swiss Tropical Institute
            (2) Department of Microbiology and Cell Sciences, University of Florida
            (3) Zoological Institute, Basel University
            (4) Functional Genomics Center Zurich, UNI/ETH Zurich

            References

            1. Delwart EL:Viral metagenomics. Reviews in Medical Virology2007,17(2):115–131.View ArticlePubMed
            2. Beardsley TM:Metagenomics reveals microbial diversity. Bioscience2006,56(3):192–196.View Article
            3. Allen EE, Banfield JF:Community genomics in microbial ecology and evolution. Nature Reviews Microbiology2005,3(6):489–498.View ArticlePubMed
            4. Streit WR, Schmitz RA:Metagenomics - the key to the uncultured microbes. Curr Opin Mircobiol2004,7(5):492–498.View Article
            5. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu DY, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y, Smith HO:Environmental genome shotgun sequencing of the Sargasso Sea. Science2004,304(5667):66–74.View ArticlePubMed
            6. Bidle KD, Lee S, Marchant DR, Falkowski PG:Fossil genes and microbes in the oldest ice on Earth. Proc Natl Acad Sci USA2007,104(33):13455–13460.View ArticlePubMed
            7. Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, Saar MO, Alexander S, Alexander EC, Rohwer F:Using pyrosequencing to shed light on deep mine microbial ecology. Bmc Genomics2006,7:57.View ArticlePubMed
            8. Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA, Quan PL, Briese T, Hornig M, Geiser DM, Martinson V, vanEngelsdorp D, Kalkstein AL, Drysdale A, Hui J, Zhai J, Cui L, Hutchison SK, Simons JF, Egholm M, Pettis JS, Lipkin WI:A metagenomic survey of microbes in honey bee colony collapse disorder. Science2007,318(5848):283–287.View ArticlePubMed
            9. Turnbaugh PJ, Baeckhed F, Fulton L, Gordon JI:Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host & Microbe2008,3(4):213–223.View Article
            10. Booijink C, Zoetendal EG, Kleerebezem M, de Vos WM:Microbial communities in the human small intestine: coupling diversity to metagenomics. Future Microbiology2007,2(3):285–295.View ArticlePubMed
            11. Schmitt S, Wehrl M, Bayer K, Siegl A, Hentschel U:Marine sponges as models for commensal microbe-host interactions. Symbiosis2007,44(1–3):43–50.
            12. Woyke T, Teeling H, Ivanova NN, Huntemann M, Richter M, Gloeckner FO, Boffelli D, Anderson IJ, Barry KW, Shapiro HJ, Szeto E, Kyrpides NC, Mussmann M, Amann R, Bergin C, Ruehland C, Rubin EM, Dubilier N:Symbiosis insights through metagenomic analysis of a microbial consortium. Nature2006,443(7114):950–955.View ArticlePubMed
            13. Leveau JHJ:The magic and menace of metagenomics: prospects for the study of plant growth-promoting rhizobacteria. European Journal of Plant Pathology2007,119(3):279–300.View Article
            14. Poinar HN, Schwarz C, Qi J, Shapiro B, MacPhee RDE, Buigues B, Tikhonov A, Huson DH, Tomsho LP, Auch A, Rampp M, Miller W, Schuster SC:Metagenomics to paleogenomics: Large-scale sequencing of mammoth DNA. Science2006,311(5759):392–394.View ArticlePubMed
            15. Peters RH, Bernardi DR, eds: Daphnia. Verbania Pallanza: Consiglio Nazionale delle Ricerche Istituto Italiano di Idrobiologia 1987.
            16. Green J: Parasites and epibionts of Cladocera. Trans Zool Soc Lond 1974, 32:417–515.
            17. Ebert D: Ecology, epidemiology and evolution of parasitism in Daphnia. [http://www.ncbi.nlm.nih.gov/books/bookres.fcgi/daph/screenA4.pdf] Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information 2005.
            18. Chang N, Jenkins DG: Plastid endosymbionts in the freshwater crustacean Daphnia obtusa. J Crustac Biol 2000, 20(2):231–238.
            19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403–410.
            20. Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data. Genome Research 2007, 17(3):377–386.
            21. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J: Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Research 2008, 36(7):2230–2239.
            22. Pop M, Salzberg SL: Bioinformatics challenges of new sequencing technology. Trends Genet 2008, 24(3):142–149.
            23. Raes J, Foerstner KU, Bork P: Get the most out of your metagenome: computational analysis of environmental sequence data. Curr Opin Microbiol 2007, 10(5):490–498.
            24. McHardy A, Rigoutsos I: What's in the mix: phylogenetic classification of metagenome sequence samples. Curr Opin Microbiol 2007, 10(5):499–503.
            25. Schloss PD, Handelsman J: A statistical toolbox for metagenomics: assessing functional diversity in microbial communities. BMC Bioinformatics 2008, 9:
            26. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437(7057):376–380.
            27. Fraune S, Bosch TCG: Long-term maintenance of species-specific bacterial microbiota in the basal metazoan Hydra. Proc Natl Acad Sci USA 2007, 104:13146–13151.
            28. Bandi C, Damiani G, Magrassi L, Grigolo A, Fani R, Sacchi L: Flavobacteria as intracellular symbionts in cockroaches. Proc R Soc Lond B 1994, 257:43–48.
            29. Hurst GDD, Hammarton TC, Bandi C, Majerus TMO, Bertrand D, Majerus MEN: The diversity of inherited parasites of insects: the male-killing agent of the ladybird beetle Coleomegilla maculata is a member of the Flavobacteria. Genet Res Camb 1997, 70:1–6.
            30. Hurst GDD, Bandi C, Sacchi L, Cochrane AG, Bertrand D, Bernardet JF, Nakagawa Y, Holmes B, Karaca I, Majerus MEN: Adonia variegata (Coleoptera: Coccinellidae) bears maternally inherited Flavobacteria that kill males only. Parasitology 1999, 118:125–134.
            31. Pinhassi J, Azam F, Hemphala J, Long R, Martinez J, Zweifel U, Hagström A: Coupling between bacterioplankton species composition, population dynamics, and organic matter degradation. Aquat Microb Ecol 1999, 17:13–26.
            32. Cottrell M, Kirchman D: Natural assemblages of marine proteobacteria and members of the Cytophaga-Flavobacter cluster consuming low- and high-molecular-weight dissolved organic matter. Appl Environ Microbiol 2000, 66:1692–1697.
            33. Bernardet J, Segers P, Vancanneyt M, Berthe F, Kersters K, Vandamme P: Cutting a gordian knot: emended classification and description of the genus Flavobacterium, emended description of the family Flavobacteriaceae, and proposal of Flavobacterium hydatis nom. nov. (Basonym, Cytophaga aquatilis Strohl and Tait 1978). Int J Syst Bacteriol 1996, 46:128–148.
            34. Lampert W: Feeding and Nutrition in Daphnia. Mem Ist Ital Idrobiol 1987, 45:143–192.
            35. Wetzel RG: Limnology. Philadelphia, USA: Saunders College Publishing 1975.
            36. Cole J, Chai B, Farris R, Wang Q, Kulam-Syed-Mohideen A, McGarrell D, Bandela A, Cardenas E, Garrity G, Tiedje J: The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res 2007, 35:D169–D172.
            37. Chang HH, Shyu HF, Wang YM, Sun DS, Shyu RH, Tang SS, Huang YS: Facilitation of cell adhesion by immobilized dengue viral nonstructural protein 1 (NS1): Arginine-glycine-aspartic acid structural mimicry within the dengue viral NS1 antigen. J Infect Dis 2002, 186(6):743–751.
            38. Bell KS, Avrova AO, Holeva MC, Cardle L, Morris W, DeJong W, Toth IK, Waugh R, Bryan GJ, Birch PRJ: Sample sequencing of a selected region of the genome of Erwinia carotovora subsp. atroseptica reveals candidate phytopathogenicity genes and allows comparison with Escherichia coli. Microbiology 2002, 148:1367–1378.
            39. Chow V, Nong G, Preston JF: Structure, Function, and Regulation of the Aldouronate Utilization Gene Cluster from Paenibacillus sp. Strain JDR-2. J Bacteriol 2007, 189:8863–8870.

            Copyright

            © Qi et al. 2009

            This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.