In order to investigate the natural variation of RNA levels within a species at stages of embryogenesis controlled by maternal and zygotic genomes, we sequenced embryonic transcriptomes from different D. melanogaster populations. Single embryos were collected at a stage in which all RNA has been maternally provided (Bownes’ stage 2, [14]), and another stage after zygotic genome activation (late stage 5; or end of blastoderm stage). In order to maximize genetic diversity, we chose four lines from Siavonga, Zambia [15] and four Drosophila Genetics Reference Panel (DGRP) [16] lines from Raleigh, North Carolina. Three biological replicates were sequenced per line and stage. An average of 2.83 and 2.89 million high-quality 100 bp paired-end reads were mapped to the same reference D. melanogaster genome from the Zambia and Raleigh lines, respectively. Hierarchical clustering of the transcriptomes resulted in samples clustering initially by stage then by population, with the exception of one Raleigh line whose stage 5 sample fell outside the three other stage 5 Raleigh samples (Fig. 1A). When we included transcriptomes from an outgroup, D. simulans, which share a common ancestor ~ 2.5 MYA with D. melanogaster [17], to the clustering, the D. simulans samples clustered by stage with, but outside of, the D. melanogaster transcriptomes (Fig. 1A). Principal component analysis also separates individual lines by stage with the corresponding principal component (PC1) representing nearly 80% of the variation (Fig. 1B).
Expression variation differs within populations
To explore the patterns of variation in the maternal and zygotic embryonic transcriptomes within and between populations of D. melanogaster, we performed differential expression (DE) analysis on our transcriptomic dataset. First, we asked how many genes are differentially expressed within each population, Zambia and Raleigh, at maternal and zygotic stages of development. To do this, we implemented a likelihood ratio test in DESeq2. We normalized our differential expression results to numbers of genes expressed (see Methods) at each stage in order to compare proportions of genes differentially expressed between stages. We found that overall, there are more differentially expressed genes at stage 5 than at stage 2 within both populations (Fig. 2A). This is consistent with previous findings between species that zygotic gene expression evolves faster than maternal gene expression [6]. Strikingly, there are many more differentially expressed genes at both stage 2 and stage 5 within the Zambia population than between Raleigh lines. Total number of DE genes within these populations were 3174 and 2512 at stage 2, respectively. At stage 5 total number of DE genes was 4723 within the Zambian population and 3648 within the Raleigh population.
We asked if there were similarities in the identity of genes with differential expression within populations at the two stages. A proportion of genes were found to be differentially expressed within both populations at stage 2 and stage 5 (Fig. 2A). Of all the DE genes at stage 2 combined, 43% were only DE within Zambia and 28% within only Raleigh lines, while 29% of genes were DE in both (Fig. 2B). At stage 5 the percent of genes only DE within the Zambia lines stayed relatively similar at 39% whereas the percentage of genes only varying expression within the Raleigh population was lower at 20%, due to a higher proportion of genes in both (at 41%; Fig. 2B). Thus, the percentage of genes varying in expression levels in both populations is higher in stage 5 than stage 2. Therefore, there is a common set of genes that vary in transcript levels within both populations in addition to a unique set of genes that vary only within the respective populations, and these vary by stage, with more shared differences at stage 5.
Differences in the magnitude of expression variation within populations
With more genes differentially expressed within the Zambia population than the Raleigh population, we asked if the magnitude of expression changes were similar between populations. To do this, we used the maximum and minimum expression value for each differentially expressed gene within the populations. From this, we computed the log ratio of the fold change for each DE gene. We then asked if the distribution of the log ratio of fold changes for DE genes were different between the two populations at either stage (Fig. 2C). There is no significant difference between the means of log ratio of fold changes when comparing stage 2 between populations (t-test, p = 0.9109), thus no evidence that the magnitude of transcript abundance changes is different between populations. There is, however, a significant difference between the means of the log ratio of fold changes between the two populations at stage 5 (t-test, p = 7.278e-06) with a higher magnitude of fold changes within the Raleigh population. Therefore, although there are fewer genes differentially expressed within the Raleigh population at stage 5, the magnitude of these differences are on average higher than the genes differentially expressed within the Zambia population at this stage.
More differences within populations than between populations at maternal and zygotic stages
Next, we asked if there were fixed expression differences between the populations. We define fixed expression differences as genes that are on average higher, or lower, in one population than the other (i.e. have similar levels in all lines from a population, that are significantly different than all the lines in the other population; see Fig. 3A for examples). We used the Raleigh lines and the Zambia lines as replicates in DE analysis. Similar to the expression variation within populations, the percentage of genes that were differentially expressed between populations increased from stage 2 to stage 5 (Fig. 2A). We find that there are more genes differentially expressed within populations than fixed expression differences between the populations at both stages (Fig. 2A). We find 700 and 1325 genes differentially expressed between populations at stage 2 and stage 5, respectively.
In addition to finding fixed expression differences, we asked how many genes were differentially expressed between individual lines. Genes differentially expressed between lines from different populations in the pairwise analysis represent differences only between the two lines in the comparison, rather than fixed expression differences between the two populations as in the previous analysis. This resulted in DE analysis between every pair of lines resulting in 28 of total comparisons. 12 DE tests between lines of the same population (RR and ZZ), and 16 DE tests between lines of different populations (RZ) (Fig. 2D). Since there are fewer tests between lines of the same population than between lines of different populations we used bootstrapping in order to compare the average number of DE genes between these categories. Similarly to the previous within population analysis, there are fewer DE genes between individual Raleigh lines (RR) than Zambia lines (ZZ), at both stages (Fig. 2D). We find that the average pairwise differences between lines (RZ) of different populations at stage 2 was not significantly different (p = 0.06972; Wilcoxon rank sum test) than the average pairwise differences between Zambia lines (ZZ) at this stage (Fig. 2D). However, at stage 5, the average number of differences between lines of different populations are higher relative to the number of differences between Zambia lines (p < 2.2e-16; Wilcoxon rank sum test). Therefore, there is as much variation of expression between individual Zambia lines at stage 2 as between individual lines from different populations at this stage. In contrast, variation between individual lines from different populations at stage 5 surpasses the differences between individual Zambia lines at this stage.
More expression variation between than within species
Expanding our analysis, we investigated gene expression variation within and between species of Drosophila at maternal and zygotic stages. In a previous study, we generated RNA-seq data from D. simulans, D. sechellia, D. yakuba, and D. erecta from stage 2 and stage 5 embryos using the same single embryo RNA extraction method implemented here. We chose these two pairs of sister species as they are closely related, but one pair (D. simulans and D. sechellia) diverged more recently, ~ 250,000 years ago, [17] than the other pair (D. yakuba and D. erecta, estimated 8 MYA divergence time [18, 19]. RNA-seq reads from these species were processed identically to the D. melanogaster reads for this analysis (see Methods). Genes considered in this analysis were limited to one-to-one orthologs across the 5 species, a total of 12,110 genes. As we had only one line for each of the other species, we performed the DE analysis pairwise for each of our D. melanogaster lines, as well as between each pair of sister species. The number of DE genes in each population or species was normalized to the number of genes transcribed at each stage to compare the percentage of DE genes at both stages and across species. From these comparisons, within and between species, there are more differentially expressed genes at stage 5 than stage 2 (Fig. 4). For maternal genes, the more closely related species pair between the two species pairs, D. simulans and D. sechellia have the highest proportion of DE genes. While most D. melanogaster lines have fewer differences than either of the species comparisons at this stage, two of the Raleigh vs. Zambia comparisons have as high of a proportion of their maternal genome differentially expressed as the more distantly related species pair, D. yakuba and D. erecta. For stage 5, both species pairs have a larger proportion of their transcripts DE than any of the within-species comparisons of D. melanogaster. Both stages have, on average, fewer genes DE for within-species comparisons than between species, but this pattern is much stronger for stage 5, a stage with more genes DE in all comparisons.
Enrichment of DE genes at maternal stage on the X chromosome
Before the zygotic genome is activated, embryonic development is entirely under control of maternal gene products. Therefore, all stage 2 transcriptomes are supplied entirely by XX genomes and the zygotic genome is transcribed by either XX or XY genomes. Given the possibility of different evolutionary pressures, we asked whether there is a difference of enrichment of DE genes on the autosomes or X chromosome across maternal and zygotic stages. For our stage 5 transcriptomes, these were collected from XY embryos, so they are directly comparable. As onset of Drosophila dosage compensation occurs sometime after stage 5, collecting a single sex is necessary at this stage [20, 21, 22]. We normalized the number of DE genes per chromosome by the number of genes expressed on each chromosome.
We found that DE genes at stage 2 between populations were enriched on the X chromosome compared to the autosomes (Fig. 3B). However, enrichment of DE genes on the X chromosome is absent at stage 5 in our samples. Maternal transcripts are not completely degraded by stage 5, so we also asked if the trend seen for all of stage 5 transcripts were the same for transcripts that are zygotic-only. As expected, fixed expression differences between zygotic-only genes were not enriched on the X chromosome (Fisher’s exact test, p < 0.05) having the same result as all genes at stage 5.
The most differentially expressed genes are genes with known selection signatures
A number of the most differentially deposited transcripts between populations are genes that have been shown previously to have signatures of selection at the level of the genome under different conditions. For example, a previous study found that genes within the chemosensory system have undergone local adaptation following D. melanogaster’s global expansion out of Africa [23]. This study was based on the genomes of five different geographically distinct populations of D. melanogaster including both North American and African populations. Notable within the top ten most DE maternally deposited genes between populations is Gstd9, a glutothione-S-transferase, which belongs to a gene family that was found to have signals of selective sweeps upon global expansion [23]. In total, seven glutothione-S-transferases were found to be differentially deposited between the Raleigh and Zambia populations. In the same study [23] the zinc finger protein family was shown to have strong population differentiation. Zcchc7, a zinc-finger protein, is also among the top ten most differentially deposited transcripts. These two genes both have undergone dramatic qualitative changes in maternal deposition (Fig. 3A, Fig. 5A and Supplemental Table 1).
The second most significantly differentially deposited transcript is the actin binding protein Unc-115a. The paralog of this gene, Unc-115b, was also found to be differentially expressed between populations. Both of these genes have higher expression levels in the Raleigh population. Unc-115b was found in a previous study to be the most highly upregulated gene in a D. melanogaster strain resistant to the insecticide DDT 91-R compared to a DDT compromised strain, 91-C [24]. Unc-11b was one of two genes found in this study to be highly upregulated across all stages of development that were assayed [24]. This gene was found to be in one of six selective sweeps that coincided with constitutive expression differences between DDT resistant and compromised lines.
Variation in heat shock proteins
Modifying maternal RNAs and proteins in the embryo can have effects on development, phenotypes, and ultimately fitness [25, 26]. One gene family that is critical to survival is heat shock proteins [27, 28]. In total, 17 and 19 heat shock proteins were found to be differentially deposited within the Raleigh population and Zambia population, respectively. This is in contrast to after zygotic genome activation, where 6 and 8 zygotic-only heat shock proteins were found to be differentially expressed within the two populations. Previous work by Lockwood et al. has shown evidence that higher levels of maternal deposition of a heat shock protein increases embryo thermal tolerance in D. melanogaster [29]. Hsp23 was found to be differentially deposited in the lines that we examined (Fig. 5A, bottom panel). Specifically, the levels of Hsp23 mRNA in ZI094 is between 4-14X higher than the other three Zambia lines and 11-600X higher levels than the Raleigh lines, all which have variable expression. This overall trend persists at stage 5, with mean levels of Hsp23 increasing in ZI094 and maintaining higher expression levels compared to all other lines. Based on this observation, we performed heat shock experiments on all of the lines to assay differences in embryo survival after heat stress (see Methods). We did not find a significant relationship between line and survival (ANOVA, p < 0.05) at 24 and 36 degrees, however there was a small significance (p = 0.014) at 38 degrees. However, we found that heat shock tolerance does not correspond in a predictive way with levels of heat shock transcripts (Fig. 5B).
A number of the most differentially expressed genes are annotated as pseudogenes
The most differentially maternally deposited gene between the Zambia and Raleigh populations in our analysis is the gene CR40354 which is annotated in the D. melanogaster genome as a pseudogene with unknown function. This prompted us to investigate other genes annotated as pseudogenes in our dataset because previous annotations that identified these genes as pseudogenes were more likely to have been done in non-African populations. We asked how many pseudogenes were maternally deposited and zygotically expressed within and between populations. A total of 69 and 70 genes labeled as pseudogenes were found to be maternally deposited within the Raleigh and Zambia populations, respectively (Fig. 6A). A total of 16 and 8 genes labeled as pseudogenes were found to be expressed from the zygotic genome but not the maternal genome (zygotic-only, see Methods) in the Raleigh and Zambia populations. Between the populations, 18 pseudogenes were found to be differentially maternally deposited and 16 of the zygotic-only pseudogenes were found to be differentially expressed at stage 5 (Fig. 6B). One pseudogene which caught our attention was the swallow Ψ (swaΨ) pseudogene which is differentially expressed within the Zambia population in our analysis. swaΨ is a result of a recent genome duplication of swallow, and is only found in D. melanogaster [30]. swallow is a critical gene to early development, and is required for proper Bicoid positioning in the embryo [31]. Previous studies [32] have suggested that swaΨ not transcribed in D. melanogaster. We found it to be very lowly expressed in the Raleigh lines, but variably expressed within the Zambia lines with one line, ZI160, showing relatively high expression levels (Fig. 6D). To investigate further, we sequenced the swaΨ locus in each of the lines. We discovered a 15 bp population-specific deletion present (Fig. 6C). All Raleigh lines have a 15 bp deletion in the annotated exon 3 of swaΨ, which is not present in all four Zambian lines. This sequence is part of the fully functional exon 3 of the swallow gene.