DNA methylation patterns provide insight into epigenetic regulation in the Pacific oyster (Crassostrea gigas)

Background DNA methylation is an epigenetic mechanism with important regulatory functions in animals. While the mechanism itself is evolutionarily ancient, the distribution and function of DNA methylation is diverse both within and among phylogenetic groups. Although DNA methylation has been well studied in mammals, there are limited data on invertebrates, particularly molluscs. Here we characterize the distribution and investigate potential functions of DNA methylation in the Pacific oyster (Crassostrea gigas). Results Methylation sensitive PCR and bisulfite sequencing PCR approaches were used to identify CpG methylation in C. gigas genes and demonstrated that this species possesses intragenic methylation. In silico analysis of CpGo/e ratios in publicly available sequence data suggests that DNA methylation is a common feature of the C. gigas genome, and that specific functional categories of genes have significantly different levels of methylation. Conclusions The Pacific oyster genome displays intragenic DNA methylation and contains genes necessary for DNA methylation in animals. Results of this investigation suggest that DNA methylation has regulatory functions in Crassostrea gigas, particularly in gene families that have inducible expression, including those involved in stress and environmental responses.


Background
Epigenetic mechanisms induce changes in gene activity without alteration to the underlying DNA sequence [1]. Common epigenetic mechanisms include DNA methylation, histone modifications and non-coding RNA activity. The most well-studied of these is DNA methylation, which refers to the addition of a methyl group to position 5 of cytosines. In animals, this reaction is catalyzed by a family of enzymes called DNA (cytosine-5) methyltransferases (DNMTs) and occurs almost exclusively in CpG dinucleotides. DNA methylation is typically associated with transcriptional repression, and is primarily achieved by methylation in gene promoters [2][3][4]. The functional significance of DNA methylation in vertebrates includes providing genomic stability [5], regulation of imprinted genes [6] and X-chromosome inactivation [7]. In mammals, DNA methylation is essential for development and cell differentiation [8] and defects or unintended changes in DNA methylation can have deleterious consequences such as embryonic lethality [9] and tumorgenesis [10]. DNA methylation, like many epigenetic marks, may be heritable, therefore unintended changes as a result of environmental exposures or other processes can be passed on for multiple generations [11].
The extent of cytosine methylation varies considerably among eukaryotes. In vertebrates, approximately 70-80% of cytosines in CpG dinucleotides are methylated [12], a pattern referred to as global methylation. Invertebrates display a wide range of DNA methylation, from very limited methylation in Drosophilia melanogaster [13] and Caenorhabditis elegans [14] to a mosaic pattern of methylation in the sea urchin (Strongylocentrotus purpuratus) [15] and Ciona intestinalis [16,17]. Bird and Taggart [12] concluded that there were three general types of methylation patterns: the 'insect-type' which shows little to no methylation, the 'echinoderm-type', the genomes of which contain both methylated and non-methylated fractions, and the heavily methylated 'vertebrate-type'. Recent studies in the honey bee (Apis mellifera) indicate these patterns may be more complex [18,19]. In contrast to D. melanogaster, which lacks most of the classical DNMTs [20] and shows limited cytosine methylation [21], A. mellifera has a fully functional set of DNA methylation enzymes and shows substantial methylation across its genome [18].
In vertebrates, regulation of transcription by DNA methylation is accomplished by differential patterns of methylation in intergenic regions, namely gene promoters [2][3][4]. In contrast, there are no significant differences reported in the methylation status of gene promoters in invertebrates, where methylation appears to be targeted specifically to transcription units [17,22]. Computational analyses of the methylation status of A. mellifera genes have provided some of the first evidence supporting a regulatory role of intragenic DNA methylation in invertebrates [19,23]. In these studies, genes associated with general metabolic or 'housekeeping' functions were predicted to be hyper-methylated, whereas caste-specific genes were preferentially hypomethylated. This functional clustering suggests DNA methylation serves to regulate gene transcription in A. mellifera, however, it is uncertain if this function is conserved across invertebrate taxa. Furthermore, it is unclear exactly how intragenic cytosine methylation directly affects transcription.
Studies in A. mellifera and others illustrate the diversity of DNA methylation patterns in invertebrate taxa and highlight gaps in our understanding of the evolutionary and functional significance of DNA methylation. One taxonomic group that has been notably absent from these investigations is the phylum Mollusca. Molluscs were first categorized as having 'echinoderm-type' DNA methylation patterns based on experimental evidence using the common mussel (Mytilus edulis) [12]. Since then, there has been little investigation of DNA methylation in molluscs with the exception of evidence suggesting the presence CpG methylation in the clam, Donux truculus [24]. In addition to increasing our understanding of the evolution of DNA methylation in invertebrate taxa, this study provides an opportunity to evaluate the Pacific oyster (Crassostrea gigas) as a model organism for analyzing DNA methylation in an aquatic species. Bivalve molluscs are important bioindicators [25] and elucidating the functional significance of DNA methylation in these organisms may prove valuable for understanding the effects of environmental stress in aquatic organisms. Here, we report the first investigation into DNA methylation profiles in the genome of the Pacific oyster. We confirm the presence of intragenic CpG methylation in C. gigas. We also demonstrate a relationship between predicted methylation status and gene function, suggesting that DNA methylation performs important regulatory functions in C. gigas.
Implications of these findings are discussed in both an evolutionary and ecological context.

Methylation Sensitive PCR
A Methylation Sensitive PCR (MSP) approach was used to identify specific methylated sites. Five genes associated with immune function were analyzed and methylation status determined (Table 1). Methylation status can be concluded based on the presence or absence of a PCR product in the methylation sensitive HpaII digest. Of the five genes analyzed, CpG methylation was confirmed for heat shock protein 70 (hsp70), whereas no methylation was detectable at restriction site(s) for the other sequences examined. The CpG observed to expected ratios (see Methods for calculation) are included in Table 1 for each gene. It should be noted that hsp70 has the lowest ratio of all the genes analyzed (0.57). This low ratio is predictive of a hyper-methylated status, which is confirmed here by MSP.

Bisulfite Sequencing PCR
In order to describe methylated cytosines outside of CCGG sites, Bisulfite Sequencing PCR (BSP) was used. Five genes predicted to be hyper-methylated, and five predicted to be hypo-methylated (based on CpG observed to expected ratio) were randomly selected for analysis. Valid PCR products were produced for two of the genes. This is a typical result as the conversion of unmethylated cytosines results in challenges for primer specificity. Four individual clones were sequenced for each of the two products. There was a 100% conversion rate for non-CpG cytosines for each of the clones sequenced. In the first fragment, a 136 bp fragment with homology to the amino terminal fragment of the human neuromedin-u receptor [Swiss-Prot: Q9GZQ4"], one of seven CpGs sites displayed methylation in 25% of the clones sequenced (Figure 1(a)). In a second fragment, one of two CpGs sites was determined to be methylated in 50% of the clones sequenced in a 93 bp region (Figure 1(b)). The latter sequence has significant homology to human bromodomain adjacent to zinc finger domain, 1A [Swiss-Prot: Q9NRL2].

In Silico Analysis of C. gigas Transcriptome
The ratio of observed to expected CpG dinucleotides (CpGo/e) was used to predict methylation status in the C. gigas transcriptome. This approach is based on the known hyper-mutability of methylated cytosines, which readily deaminate to thymine residues [26]. This CpG mutation is not easily corrected by DNA repair machinery, and as a result consistently methylated regions of DNA are depleted of CpG dinucleotides over evolutionary time [27]. Consequently, regions of DNA with a low CpGo/e are predicted to be methylated, whereas regions with a high CpGo/e (approaching 1.0) are predicted to be unmethylated. This approach has been used to reliably predict methylation status across many taxonomic groups [17,19,22,28].
A non-redundant C. gigas contig database, 'GigasDatabase' version 6 [29] was utilized for this analysis. To ensure only CpG (and not GpC) dinucleotides were being evaluated, analysis was limited to annotated sequences. The probability density function of the CpGo/e for 12,210 annotated C. gigas expressed sequence tag (EST) contigs is illustrated in Figure 2. We find that the data fit a bimodal mixture model (blue curve) significantly better than a unimodal distribution. The red curves represent the scaled, normal mixture components, which have means of 0.40 (± 0.12 SD) and 0.70 (± 0.21 SD) respectively ( Figure 2). A majority of the contigs have a CpGo/e less than 1.0.
The ratio of observed to expected GpC dinucleotide frequencies (GpCo/e) was calculated in order to be assured that the bimodal distribution of CpGo/e was not biased toward G+C content of specific genes as there are no known mechanisms for preferential depletion of the GpC dinucleotide. As predicted, the ratio of observed to expected GpC's approaches 1.0 following a unimodal Gaussian distribution (Figure 2 inset). In addition, there is a significant negative correlation between CpGo/e and TpGo/e (p = 0.00) indicating that the depletion of CpG dinucleotides is associated with the conversion of methylated CpG sites to TpG dinucleotides.
In order to determine any functional difference that may exist among those genes with lower than expected CpGo/e ratios, data were analyzed in the context of each gene's biological process GO Slim term ( Figure 3). Several biological processes have CpGo/e ratios that are significantly different from each other (see Additional file 1: Matrix of p-values for comparisons between GO Slim categories based on CpGo/e). Specifically, genes with lower CpGo/e ratios (predicted to be hyper-methylated) were associated with DNA metabolism, RNA metabolism, and cell cycle and proliferation. Biological processes with higher CpGo/e ratios (predicted to be hypo-methylated) include cell adhesion, cell-to-cell signalling and signal transduction. This analysis confirms that the normal mixture components described previously in Figure 2 are enriched with genes from particular functional categories.

Discussion
Results of methylation specific PCR and bisulfite sequencing PCR indicate that the Pacific oyster (Crassostrea gigas) genome is methylated. Further evidence supporting the presence and importance of methylation in C. gigas is the identification of genes that encode DNA methyltransferases (DNMT), the family of proteins responsible for the Results of methylation status of five genes associated with immune response by MSP. PCR was carried out on undigested, HpaII digested, and MspI digested DNA. Presence (+) or absence (-) of PCR product is indicated. Number of CCGG restriction sites in the indicated sequence and CpGo/e ratios are also provided. enzymatic conversion of cytosine to 5-methylcytosine. Animals that lack DNA methylation such as C. elegans also lack essential DNMTs, while invertebrates with an intermediate level of DNA methylation such as honey bees, sea urchins and urochordates have the full set of DNMT genes [30]. Sequences with high homology to DNMT3 (responsible for de novo methylation), DNMT1 (associated with maintenance methylation), and methyl-CpG-binding domain protein 2 (mediation of the effects of DNA methylation) are present in a publicly available C. gigas contig database, GigasDatabase version 6 [29]. These annotated sequences can be found in GigasDatabase with accession numbers CU684371.p.cg.6 (e-value 1e-61), CU994437.p.cg.6 (e-value 2e-26), and AM861084.p.cg.6 (e-value 1e-11), respectively. While a DNMT2 homolog has not been identified, it may not be required for DNA methylation in C. gigas as it functions primarily as a tRNA methyltransferase and shows only weak DNA methyltransferase activity in vitro [31]. DNMTs are an evolutionarily conserved group of proteins, but show structural diversity both within and among taxa [32]. The evolutionary diversity of DNA methylation within and among phylogenetic groups provides justification for further evaluation of the functions of this epigenetic mark. The presence of intragenic methylation in C. gigas is similar to that of other invertebrates that primarily exhibit intragenic DNA methylation patterns [33,17], the roles of which have been largely unexplored. Studies of DNA methylation in mammals have generally focused on promoter regions, where hyper-methylation of promoters inhibits initiation of transcription [2]. In contrast, invertebrate genomes do not show differentially Figure 2 Distribution of predicted methylation status of 12,210 annotated C. gigas transcripts measured computationally by CpGo/e ratio. Probability density function of CpGo/e for 12,210 C. gigas contigs. Blue curve is fitted mixture model; red curves are scaled, normal mixture components with means of 0.40 and 0.70 respectively. For contrast, a control dinucleotide (GpCo/e) is also shown with the blue curve representing a normal, unimodal distribution (inset). methylated gene promoters [22]. One of the long-standing hypotheses is that intragenic DNA methylation prevents inappropriate initiation of transcription outside of promoter regions [34]; however new studies have begun to investigate a more active role for intragenic DNA methylation, namely in regulation of expression. For example, exonic DNA methylation has been shown to regulate transcription of the phytochrome A gene in Arabidopsis thaliana [35]. In humans, investigation of intragenic CpG islands (≥ 200 bp regions with G+C content of at least 50% and CpGo/e close to expected) has revealed that CpG islands in terminal exons may regulate transcription of non-coding RNAs [36]. Here, using BSP, we observed methylation variability in two CpG sites that may indicate cell-specific methylation. The function of intragenic DNA methylation in C. gigas cannot be conclusively determined from this study, but results of studies in other organisms suggest that it could be involved in either repression of transcription outside of transcription start sites and/or regulation of expression.
Within the transcriptome of the Pacific oyster, a significant difference in methylation pattern was observed across gene families. A majority of C. gigas genes analyzed were depleted in CpG dinucleotides (i.e. CpGo/e < 1.0) and show a significantly bimodal distribution, suggesting that DNA methylation is a common feature of the C. gigas transcriptome, and that certain groups of genes have significantly different levels of methylation. The bimodal distribution of CpGo/e is similar to the pattern observed in the honey bee A. mellifera, where authors reported a hyper-methylated fraction that was enriched in genes involved with general metabolic functions and a hypo-methylated fraction enriched with genes that are associated with caste-specific functions [19]. Similarly when C. gigas transcripts were clustered according to their functional annotations using GO Slim terms, we see that the two distributions are comprised of functionally distinct classes of genes with varying regulatory requirements. Specifically, genes predicted to be hyper-methylated are associated with housekeeping functions and those predicted to be hypo-methylated are associated with general immune functions. Hypermethylation of intragenic regions of housekeeping genes is consistent between C. gigas and A. mellifera [19], but stands in contrast to observations in vertebrates, where distinct hypo-methylation of housekeeping gene promoters is associated with global expression [37]. Constitutive DNA methylation in housekeeping genes in C. gigas could be important for repressing transcription outside of promoter regions as previously discussed. It has also been proposed that hyper-methylation of housekeeping genes in A. mellifera indicates epigenetic control of gene activity in housekeeping genes [23]. Further experiments will be required to determine whether hyper-methylation of housekeeping genes plays a passive role in preventing inappropriate transcription or a more active role in maintaining expression in C. gigas. Highest CpGo/e ratios were observed in genes involved in the oyster's innate immune response, including categories of cell adhesion, cell-cell signaling, and signal transduction. Our experimental data using MSP supports the predicted hypo-methylation of this class of genes as only 1 of the 5 immune related genes were methylated. Our results do not indicate that DNA methylation is entirely absent from genes in the hypomethylated group as CpG depletion is still observed (CpGo/e 0.7) which stands in contrast to the hypomethylated genes in A. mellifera (CpGo/e >1.0). One explanation as to why it would be advantageous for this class of genes to be hypo-methylated is that it allows for greater epigenetic flexibility and higher regulatory control. Oysters have been shown to have high phenotypic plasticity in response to environmental changes and stress [38,39] and it could be postulated that an epigenetic mark, such as DNA methylation, could provide this level of control. DNA methylation has been generally considered to be a less dynamic epigenetic mark, however, it has been reported in plants that DNA methylation levels are involved in regulating gene expression in response to stress and show active methylation and demethylation in response to various stressors [40][41][42]. It has been hypothesized from these studies that DNA methylation is a possible mechanism to impart protection against local stresses in future generations [43]. The identification of genes involved in demethylation in C. gigas would be an important step toward uncovering the nature of these epigenetic marks. DNA methylation patterns have been shown to be heritable in mammalian taxa [44], and changes in DNA methylation patterns can persist for multiple generations [45]. Little work has been done to investigate heritability of DNA methylation in invertebrates, although a recent study of the crustacean, Daphnia magna, has shown transgenerational heritability of DNA methylation patterns after exposures to 5-azacytidine [46]. If DNA methylation does play a role in regulation of transcription in C. gigas it may provide a mechanism not only for regulating responses to stress, but also for adapting to local stressors through heritability of DNA methylation patterns. Investigating the potential of epigenetic control in mechanisms of local adaptation may prove useful in understanding impacts of anthropogenic inputs in aquatic ecosystems and populations. Likewise, it is possible that epigenetic mechanisms may provide an explanation for other phenomena associated with heritability such as inbreeding depression and hybrid vigour.
Elucidating functional significance of DNA methylation in aquatic invertebrates may change the way we study impacts of environmental change in aquatic organisms. A range of factors such as diet [47,48], xenobiotic chemicals [49], and endocrine disruptors [11] have been shown to disrupt DNA methylation patterns. These epigenetic disruptions are increasingly associated with disease susceptibility, which in some cases can be passed on for multiple generations [50]. Although these investigations have been performed almost exclusively in mammalian species, recent studies have reported a dose dependent relationship between concentration of mercury and cadmium and total DNA methylation in D. magna [46,51]. Understanding which environmental factors can affect DNA methylation and elucidating the functional significance of DNA methylation in these important bioindicator species will be major steps toward clarifying the complex interactions between the environment, gene expression, and organismal responses.

Conclusions
The Pacific oyster genome displays methylation. In silico analysis reveals intragenic regions are targeted for methylation consistent with reports of methylation in other invertebrate species. Results of this investigation suggest that DNA methylation has regulatory functions in Crassostrea gigas, particularly in gene families involved in stress and environmental response. Experiments are underway in our lab to investigate relationships between the environment, DNA methylation, and control of gene expression to better characterize this process. In-depth analysis of methylation patterns in Crassostrea gigas, will help to advance the field of evolutionary epigenetics and will serve to illuminate functions of DNA methylation in invertebrates.

Animal collection & DNA isolation
Oysters used in this study were collected from naturalized C. gigas populations in Puget Sound, Washington. To isolate genomic DNA, 25 mg of gill tissue was processed according to the manufacturer's protocol using the DNeasy Blood & Tissue Kit (Qiagen, CA).

Methylation Sensitive PCR
Oyster genomic DNA was enzyme digested with either HpaII or MspI. Five immune related genes containing one or more CCGG recognition sites and covering a broad range of predicted methylation status (based on CpGo/e) were selected from a set of ESTs generated from a cDNA library of plated hemocytes [52]. PCR primers were designed to flank one or more restriction sites. Primer sequences are provided in Additional file 2: Primer Sequences. Quantitative PCR was performed using digested (HpaII or MspI) and undigested samples using 1× Immomix Master Mix (Bioline USA Inc., Boston, MA), 2 uM SYTO-13 (Invitrogen, Carlsbad, CA) and 0.2 uM forward and reverse primers in an Opticon 2 System (Bio-Rad, Hercules, CA) with the following cycling conditions: 10 min at 95C, followed by 37 cycles of 15 sec at 95C, 30 sec at 55C, and 1 min at 72C and a final extension at 72C for 10 min. Results were scored qualitatively based on the presence or absence of amplification as determined by fluorescence.

Bisulfite Conversion and Sequencing
Genomic DNA was bisulfite converted using the Epitect Bisulfite conversion kit (Qiagen, Carlsbad, CA). Briefly, 1.75 ug of DNA was subjected to treatment with sodium bisulfite at increased temperature to deaminate unmethylated cytosine residues to uracil. Following treatment, the solution was desulfonated on a column, washed and eluted.
To identify methylated cytosines in expressed regions of the oyster genome, Meth Primer [53] was used to design primers that flank multiple CpG sites, but do not contain CpGs. Primer sequences are provided in Additional file 2: Primer Sequences. The mean expected amplicon length for bisulfite primers was~180 bp. PCR of bisulfite treated samples (54 ng/PCR reaction) was carried out using Taq DNA Polymerase Master Mix (Apex BioResearch Products, Research Triangle Park, NC) for 10 min at 95 C, followed by 40 cycles of 15 sec at 95 C, 30 sec at 55 C, and 1 min at 72C and a final extension at 72C for 10 min.
PCR products were separated using gel electrophoresis. Single bands were excised from the gel, purified using Ultra-DA purification columns (Ambion, Foster City, CA) and cloned using TOPO TA Cloning kit (Invitrogen). Four clones were sequenced for each primer pair. Methylation status was determined by comparing the sequence of bisulfite treated DNA to sequence of untreated DNA using Geneious 4.5.4 software (Biomatters Ltd., Aukland, NZ) and annotated using BLAST [54].

In Silico Analysis: Predicted DNA Methylation Status
For in silico analysis, the non-redundant C. gigas expressed sequence tag (EST) contig database, 'GigasDatabase' version 6 (http://public-contigbrowser. sigenae.org:9090/Crassostrea_gigas/index.html, [29]), was utilized. Analysis was limited to annotated sequences (n = 12,210) in order to be confident that sequences were reported in the 5' to 3'direction. It should be noted that this transcriptomic dataset is appropriate for predicting methylation status of the C. gigas genome as investigation into other invertebrate species shows that DNA methylation is specifically targeted to transcribed regions of the genome [17,12].
CpG observed/expected ratio (CpGo/e) was calculated using the following equation where l is the number of nucleotides in the contig: To evaluate the distribution of Pacific oyster contigs, a mixture model was fit to the CpGo/e ratios using the mixtools package [55] in R [56] yielding a two component mixture where p 1 + p 2 = 1. Hence the data C i , are distributed as: The log likelihood statistic of the bimodal mixture model was compared to the normal null model to test for a significant improvement in fit.
In order to evaluate the variation of CpGo/e within and among functional classes of genes, contigs from the Giga-sDatabase annotated with a biological process GO term (n = 10,699 contigs) were assigned a functional group based on the MGI GO Slim database http://www.informatics.jax. org [57]. Since each contig may have multiple GO terms, contigs were allowed to fall into multiple GO Slim bins. However, to avoid weighting within a single category, an individual contig was not allowed to be included more than once in a single GO Slim bin. The mean CpGo/e and standard errors were calculated for each GO Slim term. A oneway ANOVA followed by Tukey's test for multiple comparisons was performed using SPSS software (SPSS Inc., Chicago, IL). A significance level of p < 0.05 was accepted.

Additional material
Additional file 1: Matrix of p-values for comparisons between GO Slim categories based on CpGo/e. CpGo/e for GO Slim categories were compared with Tukey's multiple comparison test. This file contains the pvalues of each comparison. Significant differences (p < 0.05) are highlighted.
Additional file 2: Primer Sequences. This file contains primer sequences used for methylation sensitive PCR and bisulfite sequencing PCR analysis.