Sample origin
The 18 females used in this study were part of a larger study. For a detailed description of the experimental setup and sampling of that larger study see [39]. In short, 36 great tit pairs (18 early selection line pairs and 18 late selection line pairs in their second calendar year) that constitute the F2-generation of lines artificially selected for early and late timing of breeding [42, 43], were housed in 36 climate-controlled aviaries (2 m × 2 m × 2.25 m) at the Netherlands Institute of Ecology (NIOO-KNAW). Every climate-controlled aviary contained three nest boxes, a perch, fake tree, a food and water tray and bedding of wood chips. All great tits in this study decent from a wild long-term study population at the Hoge Veluwe National Park, The Netherlands (52°02′07″ N, 5°51′32″ E). Per selection line, pairs were formed randomly, but avoiding sibling pairings. Birds were subjected to a photoperiod mimicking the natural photoperiod and two contrasting temperature environments mimicking a cold spring (2013) and a warm spring (2014) in the Netherlands. Temperatures changed every hour to follow as closely as possible the observed hourly temperatures in these years. The combination of selection line and temperature environment resulted in four groups of n = 9 pairs within each group: ‘early-warm’, ‘early-cold’, ‘late-warm’ and ‘late-cold’. Within selection line, birds were randomly assigned to an aviary, but temperature environment (warm or cold) would alternate every aviary and selection line (early or late) every two aviaries, resulting in the order ‘early-warm’, ‘early-cold’, ‘late-warm’, and ‘late-cold’ throughout the 36 climate-controlled aviaries. Birds were fed ad libitum with food sources reported elsewhere [44] and had water available for drinking and bathing.
Although great tits normally only have one reproductive season per year, the pairs included in this larger study were induced to undergo two reproductive seasons [for details see 32]. In short, in the first breeding event from January until July, individuals were blood sampled bi-weekly [6, 16], and laying dates were obtained. Then, birds went through a period of short-day length (L:D 10:14) and low temperatures (10 °C) to induce gonadal regression and to make them photoreceptive and temperature sensitive again. Subsequently, birds received the same temperatures and photoperiods as in the first breeding event to induce a second breeding event that was initiated in autumn (September until November). The 36 pairs were divided into three groups taking into account the females’ laying dates in the first breeding season [39], and sacrificed at three time points throughout this second breeding event (see below).
Tissue collection and preparation
Based on the reproductive behavior during the first breeding event, three sampling time points throughout the second breeding season were chosen: (1) October 7 (resembling March 7) when photoperiod exceeded 11 h, which is necessary to initiate gonadal maturation [45], (2) October 28 (resembling March 30) when nest building occurred in the first breeding season, but prior to laying and (3) November 18 (resembling April 20) when 25% of the females had initiated egg laying in the first breeding event. Per time point both sexes of one group (n = 12 pairs) were sacrificed, although we focus on females only in the current study [39]. To sample birds, they were caught per pair from their aviary between 9:00 AM and 13:15 PM, taken to the dissection room and weighed (body mass (g) ± s.e.m. (range), males: 18.3 ± 0.2 (16.3–20.5) and females: 17.03 ± 0.17 (15.0–20.0)). Subsequently, taking into account the least amount of stress and highest sample quality, birds were anaesthetized deeply through inhalation of Isoflurane (vaporizer setting 2.5–3.0%), during which a blood sample (300 μl) was taken. RBCs were separated from plasma by centrifuging (10 min at 14,000 rpm) and stored in Queens buffer at room temperature before further analysis (see ‘Reduced representation bisulfite sequencing (RRBS)’ below). Following decapitation, tissues, including brain, ovary and liver were dissected and stored in − 80 °C until further processing. At a later stage, the hypothalamus, being the center for integration, transduction and translation of environmental cues, was isolated from the rest of the brain and, until further processing, stored in − 80 °C.
The samples that we used in this study are from the early selection line females in the second (autumn) breeding season (n = 18, with 6 females per sampling time point) because during this sampling event blood, hypothalamus, ovary and liver where collected as opposed to the first breeding event where only blood was sampled.
Reduced Representation Bisulfite Sequencing (RRBS)
We extracted DNA from RBCs stored in 250 μl Queens buffer (with approximately 10–20 μl of RBCs per 1 ml) using the DNeasy kit (Qiagen) and from 25 mg liver with the MagAttract kit (Qiagen) according to manufacturer’s protocol. To produce Reduced Representation Bisulfite Sequencing libraries, the preparation protocol according to manufacturer’s protocol (Illumina) was used with some changes [46]. Briefly, samples were digested using the restriction enzyme MspI and the resulting DNA fragments of various size were subsequently bisulfite treated, which converts un-methylated cytosine bases into uracil bases, whereas methylated cytosine bases are resistant to the treatment. Fragmented and bi-sulfite treated DNA was then end-repaired with DNA polymerase I and A-overhangs were added to the 3′ ends of each fragment for adapter ligation. Individual sample libraries were barcoded using standard Illumina adapters. Libraries were purified, size selected with Ampure XP beads (Beckman Coulter) and concentrations were determined by quantitative polymerase chain reaction (qPCR). This selection yielded a fragment size range of approximately 30–180 base pairs, with a mean of 85. Six libraries were pooled into the same sequencing lane (Additional file 41; Table S22). Each pool was sequenced 100 bp single end (Additional file 41; Table S22) on a HiSeq2500 sequencer with a HiSeq SBS sequencing kit version 4 (Illumina). Sequencing was conducted in two separate HiSeq runs to yield enough coverage per sample. An internal positive control (PhiX) was used to obtain reliable sequence generation in the sequencing processing and the PhiX reads and adapters were removed before data analysis. Library preparation and sequencing were performed at the SciLife Lab, Uppsala University, Sweden.
Sequence read quality and alignment
Sequencing read quality was investigated with the FastQC 0.11.7 quality control tool [47]. Low quality bases as well as Illumina adapter contamination resulting from read-through of short fragments were trimmed using Trim Galore! v0.4.4 [48] with default parameters under the –rrbs mode. This mode disregards the first five base pairs in the 5′ to reduce calling of false positive methylation as a result of bisulfite treatment. Each sample’s reads from both of the sequencing runs were combined together for alignment. Trimmed sequencing reads were aligned against a bisulfite converted version of the Parus major reference genome v1.1 (https://www.ncbi.nlm.nih.gov/assembly/GCF_001522545.2) using Bismark 0.19.1 (Bioinformatics Group. Babraham Institute) aligner in rrbs mode. The reference genome contains all assembled chromosomes as well as all scaffolds. After alignment and CpG site calling we selected the sites with a minimum coverage of 10x across all samples within a tissue (RBCs and liver) for further analyses. We calculated the methylation proportion for a site in the respective sample as the proportion of methylated counts relative to the total read counts. As we were interested in sites that change over time, we excluded all sites that showed a methylation proportion of either zero or one across all samples from downstream analyses.
Gene annotation
CpG sites were annotated, using R packages ‘GenomicFeatures’ [49] and ‘rtracklayer’ [50], to different genomic locations: TSS region (300 bp upstream - 50 bp downstream of the annotated transcription start site), promoter region (2000 bp upstream - 200 bp downstream of the annotated transcription start site), gene body (exons and introns), and 10 kb up- and downstream regions (10 kb regions adjacent to the gene body, respectively). Each identified CpG site was assigned to one of the above specified genomic regions (and the gene annotated to that region) with BEDtools v.2.26.0 [51]. See Additional file 42; Table S23 for an overview on how many CpG sites were covered per genomic location in the RBCs and liver data and how many genes were associated to the CpG sites within a respective genomic region and tissue. Earlier studies in great tits have shown that methylation levels surrounding the TSS and within promoter regions best associate with RNA expression [21, 24]. Hence, only CpG sites in the TSS or promoter region of annotated genes were used for exploring (i) tissue-general and tissue-specific changes in DNA methylation between RBCs and liver and (ii) the correlation between change in methylation and candidate gene expression in liver (qPCR, see below) correlation. CpG sites within the TSS regions, promoter regions, gene body, and 10 kb up−/downstream regions were used for exploring (iii) genome-wide associations between changes in methylation and changes in gene expression in liver, hypothalamus, or ovary (Fig. 4).
RNA extraction, real-time quantitative polymerase chain reaction and sequencing
From the same females, for which we acquired DNA methylation patterns, we used already available qPCR and RNA-seq data generated by two other studies [22, 39]. In short, RNA was isolated from hypothalamus, ovary and liver by Trizol extraction and reverse transcribed into cDNA [39].
qPCR
In a previous study [39] primer pairs were built based on the Parus major reference genome v1.1. and Parus major annotation release 101 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Parus_major/101/) for a list of candidate genes with known and unknown functions in avian reproduction and checked for specificity using a BLAST search. Efficiency of each primer pair was determined by a 5-point standard curve of cDNA samples. Relative transcript levels were measured within hypothalamus, ovary and liver by real-time qPCR using the SYBR Green method followed by fluorescence measurements and analyses to obtain cycle thresholds. Expression levels of the candidate genes were normalized against reference genes. The combination and number of reference genes differ per organ and can be found elsewhere [39].
RNA sequencing
In a previous study, genome-wide expression patterns were measured in pools (n = 12) of three female great tits from both the early and late selection line [22]. This resulted in four pools per time point, of which every pool represented a selection line × treatment combination. Since the 18 females from the current study originate from the early selection line, we used the RNA-seq data from the pools with early selection line birds (n = 2 pools per time point with n = 3 females per pool). Briefly, libraries were prepared using the Illumina TruSeq strand-specific mRNA method (Illumina, San Diego, CA, USA) and one lane of Illumina HiSeq 2500 (single-end 50 bp) for 12 pools. Reads were filtered for low quality. Subsequently, trimmed reads were mapped to the Parus major reference genome v1.1, after which transcripts were assembled based on the Parus major annotation release 101. Unique reads that mapped to transcripts were counted.
Statistical analysis
All statistics and plotting were performed using R version 3.5.2 [52]. An overview of how the different data sets and tissues are linked is provided in Fig. 4.
Tissue-general and tissue-specific changes in DNA methylation between red blood cells and liver
Prior to correlating the change in methylation over time between tissues, we tested whether methylation levels required standardization in order to meet the requirements to conduct such a correlation. For each sample we calculated the mean and variance in methylation proportion across all CpG sites within a sample and tested for a difference in mean and variance between samples with a Kruskal-Wallis test (p < 0.05, n = 18 for liver and RBCs, respectively) and Fligner Killeen test (p < 0.05, n = 18 for liver and RBCs, respectively), respectively (Additional file 43; Table S24). Because samples significantly differed in mean methylation and the variance of methylation proportion, we corrected the methylation proportion of individual CpG site for both differences. This was done by calculating z-scores; i.e. subtract the mean methylation proportion over all samples within the respective tissue from the methylation proportion of each individual CpG-site and divide this by the standard deviation of the methylation proportion over all samples within the respective tissue. We used the z-scores to calculated the change in CpG site methylation between time points within liver and RBCs.
We conducted a differential methylation analysis, using the ‘methylKit’ package [52], on the raw count data of 302,647 CpG sites that were common for both the blood and liver data, in order to find DMS between time point 1 and 2 (Δ1,2), and time point 2 and 3 (Δ2,3) in either blood or liver (tissue-specific change) or in both tissues (tissue-general change) (Additional file 9; Table S2, Additional file 10; Table S3). We considered a site significantly differentially methylated when the difference in methylation between time points ≥ 15% and a q-value ≤0.01.
We used the Pearson’s correlations coefficient (r) to evaluate the relationship between DNA methylation in blood and liver, as it measures linear trends. We repeated this for sites that were situated in the promoter and TSS regions of genes in both RBCs and liver.
Additionally, a gene ontology (GO) analysis was performed using the genes that carried the tissue-specific and tissue-general changing DMS to explore which functional groups (GO terms) are over-represented [53, 54] and possibly linked to timing of breeding. We divided the gene sets from the two time point comparisons into three groups for DMS located in the gene body, TSS region and/or promoter region. GO analysis was performed using Cytoscape plugin ClueGo 2.5.7 [55]. Using kappa statistics, ClueGo constructs and compares networks of GO terms. A two-sided hypergeometric test [56] was applied, Kappa score and network specificity were kept at default values. The GO term/pathway selection was at 5% and false discovery correction was performed using the Benjamini-Hochberg step-down method [57]. We used human (11.05.2020) gene ontologies and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway [58] with three background gene set lists, which were all the genes covered by filtered CpG sites (15,103 genes), TSS regions covered by CpG sites (5731 genes) and promoter regions covered by CpG sites (9816 genes).
Correlation between change in methylation and candidate gene expression in liver
We selected those sites both in the TSS and promoter regions within genes in liver that were either key to reproductive functioning (i.e. in relation timing of breeding) or within the reference genes (i.e. to normalize qPCR expression data) (Additional files 44 and 45; Tables S25 and S26), and for which there was also qPCR gene expression data available [33, Additional file 46; Table S27]. In order to evaluate the association between DNA methylation changes and RNA expression changes in the TSS region, we found CpG sites with 10x coverage across all samples for five from the total candidate gene set analysed in liver [39]: beta-2-microglobulin (B2M), glucocorticoid receptor (GR), heat shock protein family B (small) member 1 (HSPB1), mineralocorticoid receptor (MR) and protein 2 kinase C alpha (PRKCA). In addition to these genes, two more genes, ribosomal protein 19 (RPL19) and succinate dehydrogenase complex flavoprotein subunit A (SDHA), could be evaluated for promoter regions. Per gene, we calculated Δ1,2 and Δ2,3 for both expression and methylation levels. For example, from the methylation level of an individual female in time point 2, methylation levels of all females in time point 1 (n = 6) were subtracted. Subsequently, these six values were used to calculate the average change in methylation per female in time point 1 across all females from time point 2, and vice versa (see Additional file 47; Fig. S20 for a visualization). The same process was repeated for the expression levels. Pearson’s correlations were used to evaluate relations between the average change in expression and average change in methylation levels. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure [57].
Genome-wide associations between change in methylation and gene expression
Here, we used RRBS data of individual females (n = 6 females per time point) and RNA-seq data of pools of the same females (n = 2 pools per time point with n = 3 females per pool) to relate changes in CpG site methylation to changes in expression of the associated gene. We examined how (a) the change in liver methylation related to the change in liver gene expression and how the change in RBC methylation related to gene expression change in (b) liver, (c) ovary, and (d) hypothalamus (Fig. 4). Four this, we used (a) 529,717 CpG sites in the liver that were located within 14,982 genes in the liver, (b) 297,916 CpG sites in RBCs that were located within 13,893 genes in liver, (c) 14,708 genes in the ovary, and/or (d) 14,570 genes in the hypothalamus. To associate DNA methylation changes with gene-expression changes, CpG sites that showed a time-point effect were identified using a differential methylation analysis with time-point (levels 1,2 and 3) as a fixed factor, and genes with a significant time effect were identified using a differential gene expression analysis performed in [22]. This in contrast to the analysis of i), where we assessed changes for both time point 1 and 2 (Δ1,2), and time point 2 and 3 (Δ2,3) separately. We therefore end up with different numbers of significant sites for these two analyses.
Differential methylation analyses were performed for 529,717 CpG sites in liver and 297,916 CpG sites in RBCs using the ‘methylKit’ package [52]. To test the significance of a time-point effect, we used a model comparison approach to test whether the full model, including time point and temperature environment as fixed factors, explained the methylation profile of a site better than the null model only including the temperature environment as fixed effect. We considered a time effect to be significant for sites with q-value ≤0.01 for both tissues. This likely leads to a more stringent correction for multiple testing in liver as the number of sites tested was much higher than for RBCs. Differential expression analysis is described in detail in [22]. In short, main effect models for time point and selection line were tested using the standard DeSeq2 protocol [59] and a likelihood ratio test such that the main effect models were compared to a model excluding the main effect. Models were performed separately for each tissue. Genes present in the trimmed RNA-seq data sets for liver, hypothalamus, and ovary with adjusted p < 0.05 when testing the main effect model for a time point effect and with adjusted p > 0.05 when testing the main effect model for a selection line effect (as data from both selection lines were included when testing the main effect models) based on [22], were classified as genes that significantly changed in expression over time. Thereafter, to examine the association between DNA methylation change and change in gene expression in tissue comparisons a-d, we quantified the change between time point 1 and 2 (Δ1,2) and time point 2 and 3 (Δ2,3) for both the methylation level of CpG sites and gene expression levels. We quantified the change in methylation level, by first calculating the average methylation levels (i.e. methylation proportion × 100) per CpG site across females for all three time points and then calculated the difference between the respective time points per CpG site. We quantified the change in gene expression between time point 1 and 2 (Δ1,2), and between time point 2 and 3 (Δ2,3) separately, by calculating the log2Foldchange contrast using DeSeq2 [59]. We furthermore trimmed the data sets by excluding CpG sites with a change in methylation level < 5% methylation (since absolute methylation levels are lower in TSS regions) and genes with a change in expression (as log2Foldchange) < 0.5 for any of the two time-point contrasts. To better understand the effect of the genomic location on the relationship between changes in DNA methylation and gene expression, we differentiated between genomic locations (i.e. TSS region, promoter region, gene body and 10 kb up−/downstream region, see section ‘Gene annotation’, above). For each combination of comparison (a-d), time contrast (first and second) and genomic location, we plotted the gene expression as log2Foldchange against the change in methylation level. There are four possible quadrants of association between change in gene expression and change in methylation level: hypo-methylation and increased gene expression (Q1), hyper-methylation and increased gene expression (Q2), hyper-methylation and decreased gene expression (Q3), and hypo-methylation and decreased gene expression (Q4). While Q1 and Q3 would relate to changes in the predicted directions (based on the expectation that methylation and expression are negatively correlated), Q2 and Q4 would relate to changes opposite to the predicted directions. For the correlation between change in RBC methylation and change in gene expression in ovary, we tested whether CpG sites that were situated in the TSS region were enriched in quadrants Q1 or Q3 compared to quadrants Q2 or Q4 using a Fisher’s exact test, in which we compared the proportion of associations within quadrants Q1 or Q3 between the TSS and 10 kb downstream region. We used the 10 kb downstream region as a control region for CpG sites randomly distributed across Q1-Q4 as we do not expect any relationship between DNA methylation changes and gene expression changes in this region [21, 24]. We did not use the 10 kb upstream region or gene body as control regions as the 10 kb upstream region overlaps with the promoter region (in which we would rather expect a relationship between methylation and gene expression) and at least parts of the gene body are hypothesized to show a relationship between methylation and gene expression.