Biased gene expression in early honeybee larval development

Background Female larvae of the honeybee (Apis mellifera) develop into either queens or workers depending on nutrition. This nutritional stimulus triggers different developmental trajectories, resulting in adults that differ from each other in physiology, behaviour and life span. Results To understand how these trajectories are established we have generated a comprehensive atlas of gene expression throughout larval development. We found substantial differences in gene expression between worker and queen-destined larvae at 6 hours after hatching. Some of these early changes in gene expression are maintained throughout larval development, indicating that caste-specific developmental trajectories are established much earlier than previously thought. Within our gene expression data we identified processes that potentially underlie caste differentiation. Queen-destined larvae have higher expression of genes involved in transcription, translation and protein folding early in development with a later switch to genes involved in energy generation. Using RNA interference, we were able to demonstrate that one of these genes, hexamerin 70b, has a role in caste differentiation. Both queen and worker developmental trajectories are associated with the expression of genes that have alternative splice variants, although only a single variant of a gene tends to be differentially expressed in a given caste. Conclusions Our data, based on the biases in gene expression early in development together with published data, supports the idea that caste development in the honeybee consists of two phases; an initial biased phase of development, where larvae can still switch to the other caste by differential feeding, followed by commitment to a particular developmental trajectory.


Supplementary results and figures
Quality control analysis for microarray, HTS and RT-qPCR Microarrays are valuable tools for gene expression analysis, generating expression data for thousands of genes in a single experiment. They do however have some pitfalls. High levels of background hybridisation can reduce the range over which microarrays can accurately detect changes in gene expression [1]. Furthermore microarray probes can differ in their hybridisation properties and in some cases cross-hybridisation can occur. Finally errors can also be introduced during data analysis. This is especially true if a multiple testing correction is not used. A multiple testing correction was not performed in this analysis as performing any type of multiple testing correction eliminated all candidate genes. In this case it was essential to use other methods to validate the genes identified by the microarrays.
RT-qPCR is considered to be the gold standard for gene expression analysis [2], so it was used to confirm differential expression of a selection of candidate genes identified by the microarrays. Supplementary figure S1A. indicates the relative expression values for the candidate genes. Genes that were significantly differentially expressed between queens and workers are marked with asterisks. Of a total of 39 genes, 16 were confirmed to be significantly differentially expressed between queens and workers. A further two genes were previously published [3] bringing the total to 18 out of 41 genes (44%). This analysis indicates that less than 50% of the genes identified by the microarrays may actually have a role in caste development.
A second quality control analysis was performed using the microarray, HTS data and RT-qPCR data. A correlation analysis was performed by selecting genes that had a two-fold or greater difference in expression between the two castes. The spearman's correlation coefficient (R) was used to determine the strength of the correlation between each of the data sets. Supplementary figure S1B. shows a correlation coefficient of 0.65 between the microarray and RT-qPCR data. The HTS and RT-qPCR have the strongest correlation coefficient of 0.96 (Supplementary figure S1C.) and the HTS and microarray data have a correlation coefficient of 0.85 (Supplementary figure S1D.) The low R value between the microarray and RT-qPCR data may be a result of the reduced dynamic range of microarrays. Microarray platforms have a tendency to underestimate fold change compared to RT-qPCR [1,4]. Our correlation results are similar to previous findings which obtained R values of 0.71 when comparing microarray and RT-qPCR data [5] and 0.73 when comparing microarray and HTS data [1].
Although only 44% of the candidate genes confirmed with RT-qPCR the correlation between data sets was good. This indicates that while single genes may not confirm the data set as a whole is satisfactory. This justifies the use of data from the microarrays for large scale analyses such as pathway, GO analysis or CpG o/e analysis. Individual genes will require confirmation with RT-qPCR to confirm they are differentially expressed before being investigated in functional studies. Queens Workers candidate genes identified by microarray analysis and tested with RT-qPCR. Genes are identified by GB numbers followed by the time-point where expression was tested in brackets. Genes that showed a significant difference in expression between queens and workers when tested with RT-qPCR are marked by asterisks. * P < 0.05, ** P< 0.01 and *** P < 0.001. (B) Correlation between microarray and RT-qPCR data. (C) Correlation between HTS and RT-qPCR data (D) Correlation between HTS data and microarray data. Normalised expression ratios were used to calculate Spearman's correlation coefficient (R) and associated P-value for each correlation.

RNAi phenotypes
Individuals that emerged after injection with dsRNA against hexamerin 70b or egfp were classified as either queen-like or workers based on the presence of absence of a number of characteristics. Adult queens have claw-like mandibles whereas the mandibles of adult worker bees are smooth (see Supplementary figure S2). Queen-like individuals had varying degrees of curvature in their mandibles. This variation correlates well with developmental time. Individuals that had the shortest developmental time had stronger queen-like features. The photos are arranged with the most queen-like individuals in the top row and individuals with less curvature in the last row.
The legs were also used to classify individuals as queens or workers. The tibia and tarsus on the hind legs are the most phenotypically useful as the tibia of workers contains the corbicula and the tarsus has thick bristles neither of which are present on the legs of queens. Supplementary figure S3 shows the legs of queen-like and worker individuals. Queen-like legs are covered with fine hairs and do not show evidence of the corbicula or bristles. These photos are arranged in the same way as the mandibles with individuals with the strongest queen-like phenotype at the start and the last two photos in the bottom row showing features of both queen and worker legs. These features are highlighted by red circles. The first individual has bristles on one leg but there is no evidence of corbicula formation on the other leg. The last individual shows some evidence of corbicula formation with the area circled in red lacking hair but the rest of the leg has hair on it resembling a queen-like leg.
Another key phenotypic marker of queens is their large ovaries. Supplementary figure S4 indicates examples of queen-like ovaries and worker ovaries. As above photos are arranged to show the individuals with the most queen-like phenotype first. Queen-like individuals have large ovaries with hundreds of ovarioles. Workers tend to have smaller ovaries with less than 10 ovarioles. There is some variation in worker ovary size as indicated by the last two individuals.
The final feature used to classify individuals was the presence or absence of the spermatheca. Supplementary figure S5 shows examples of spermathecas found in queen-like individuals. Workers either did not have spermathecas or they were very small.
To be classified as queen-like individuals were required to resemble queen-like individuals in three of the four characteristics. Figure S2: Examples of mandibles from individuals classified as queen-like or workers after RNAi.

Queen phenotypes -Legs
Worker phenotypes -Legs Figure S3: Examples of hind legs from individuals classified as queen-like or workers after RNAi. The red circles indicate areas that differ from the expected phenotype. The first example shows an individual with worker like bristles on the tarsus but no evidence for the corbicula on the tibia. The second individual shows some evidence of corbicula formation (the area without hair) but the rest of the tibia is queen-like.

Queen phenotypes -Ovaries
Worker phenotypes -Ovaries Figure S4: Examples of ovaries from individuals classified as queen-like or workers after RNAi.
In the genomes of animals with DNA methylation deamination of methylated cytosine residues over evolutionary time leads to a set of genes that are depleted in the CpG content -and were historically methylated and a set of genes that are not depleted (not historically methylated). Plotting a frequency graph for the CpG[o/e] values for every gene in the genome of an animal with DNA methylation produces a bimodal distribution (actually a mixture of overlapping normal distributions), while doing the same analysis for an animal that lacks DNA methylation produces a unimodal distribution [6]. We analyzed differences in the CpG[o/e] frequency distributions of our differentially expressed genes using mclust model based clustering [11] and statistical significance between the distributions was assessed using the Fisher's exact test. A difference plot summarizing the deviations from the expected distribution for queen and worker larvae at each point during development is shown in Supplementary figure S6H. In late development (108-132 hours), genes more highly expressed in queens have lower CpG[o/e] ratios, but genes more highly expressed in workers fall into a unimodal distribution (means 0.42 and 0.96 for queens, 0.84 for workers, Supplementary figure S6E).  Analysis of our high-throughput sequencing data, from 60 hr of development, yields results consistent with our array data. All the genes detected give a bimodal distribution with means at 0.59 and 1.23 (Supplementary figure S6F) and examining differences between worker and queen transcripts shows is consistent with, but more pronounced than, our microarray derived data. 50% of queen transcripts fall into the low (mean 0.58) distribution, and 50% in the high distribution (mean 1.01), while in workers, 17% fall in a low distribution, but with a higher mean that queens (0.66), and the remainder in a high distribution with a mean of 1.27 (Supplementary figure S6G).

GC content and evolution
Genes with high GC content can occur due to biased gene conversion owing to recombination [12] and previous studies have shown that such genes have high rates of molecular evolution and are associated with worker-biased gene expression [13,14]. Analysis of the GC content of genes differentially expressed between castes during larval development (Supplementary figure S8) reveals that at the majority of time points both castes are biased towards differential expression of genes with a low GC content, and lower rates of sequence evolution [13]. At 12 hours and 84 hours of queen larval development, genes that are more highly expressed have a higher than expected GC content (Supplementary figure S8). In contrast genes that more highly expressed at 60 hours (as detected by HTS) have a higher than expected GC content in worker-destined larvae. This may indicate that at these specific time-points, genes that are expressed in a caste specific manner are faster-evolving, and may be important for the evolution of new traits associated with caste development.

RNAi and larval rearing
Larval injections were performed with a needle created from borosilicate capillary glass tubes (Warner Instruments). Needles were placed in a Singer MK1 Micromanipulator (Singer Instruments) and a PLI-100 Pico Injector (Harvard Apparatus) was used to deliver the injections at a pressure of 60 kPa. All larval manipulations were performed in a heated and humidified chamber. L2-L3 larvae were removed from the comb and injected with 25 ng of dsRNA. Larvae were then placed in 24 well cell culture plates (Griener), each well containing 200 µl of larval diet (18% fructose (D(-) Sigma-Aldrich), 18% glucose(D(+) Sigma-Aldrich), 3% granulated yeast (Sigma-Aldrich) solution (60%) and royal jelly (Exportim, Australia) (40%). The cell culture plates were stored in an incubator at 34°C with a saturated solution of potassium sulphate. Larvae were fed daily with 50 -100 µl of larval diet and dead larvae were removed. When larvae stopped feeding they were transferred to 60 x 15 mm Petri dishes lined with nylon mesh. On emergence the phenotype of each individual was characterised as queen-like or worker based on the shape of the mandibles, presence of hair on the legs, number of ovarioles and the presence of a spermetheca.

RT-qPCR
1 µg of RNA from the 12 hr, 60 hr, 84 hr and 108 hr time-points was used to make cDNA using Superscript VILO TM according to manufacturer's instructions. Oligonucleotide primers were designed using Primer3 [15] and Amplify [16] to span introns (where possible). The primer set and amplification efficiency for each gene is supplied in supplementary table 1. Quantitative RT-PCR, normalization and data analysis was performed as described in Cameron et al. 2013 [3].

Statistical analyses
Fisher's exact and χ 2 tests were performed using R, which is a statistical analysis and graphics software environment (R Development Core Team 2008). P values < 0.05 were considered to be significant.
R was also used to determine the Spearman's correlation coefficients for the quality control analysis (Supplementary figure S1B-D). Log expression ratios were obtained for genes that showed a two-fold change in expression between the castes in both the microarray and HTS data sets. Normalised expression ratios were calculated for the RT-qPCR data using the Pfaffl method as mentioned previously.
The expression ratios from each data set were graphed as linear regression plots using Matlab ( [17]). As very few genes were tested with RT-qPCR at the 60 hr time-point, genes from the surrounding time-points (84 hr and 108 hr) were compared to the 60 hr HTS data in the correlation analysis. The technologies were compared in pairwise analyses in R to determine the Spearman's correlation coefficient and P value.

CpG[o/e] and GC content analysis
CpG[o/e] values for the microarray data were calculated using the sequence data file (Official GeneSet Array.txt) [18]. The CpG[o/e] values for the HTS data were calculated using sequence data from the coding regions from A. mellifera genes (Amel 4.5, NCBI). Sequence statistics were generated using the Geomatrix software suite [19]. CpG[o/e] values were calculated from these statistics using the calculation described by Elango et al. (2009) [6]. The distributions of CpG[o/e] within honeybee genes can be described as a mixture of overlapping normal distributions [6]. The number of components in these distributions was estimated in R (www.r-project.org) using mclust [11] model-based clustering. The best fitting model was identified among several non-nested models using Bayesian information criteria (BIC). The Fishers exact test was used to determine whether there were differences between samples in the numbers of genes classified under each of the distributions identified in mclust. GC content was calculated from the nucleotide composition of honeybee coding genes. Genes with a GC content of > 38% were classified as GC rich and genes with a GC content of < 38% were classified as GC poor [13]. The occurrence of GC rich and GC poor genes within our differentially expressed gene lists was compared with the expected occurrence based on GC content of all the genes on the microarray or detected as expressed by high throughput sequencing using a χ 2 test.