Genome-wide identification of DAE QTLs in forebrain and kidney of F1 mice by RNA-Seq
Reciprocal crosses were made between C57BL/6J (B6) and three other inbred strains: 129S1/SvlmJ (129S), DBA/2J (DBA), and CAST/EiJ (CAST). PolyA+ mRNA was prepared from forebrain and kidney of F1 mice generated from these crosses and mRNA was pooled from six individual F1 mice for each transcriptome library. In the twelve sequencing analyses between 17.8 and 45.4 million sequence fragments, corresponding to 0.64–1.75 Gb, were mapped (Additional file 1: Table S1). Mapped reads were analyzed for sequence variation in order to determine whether either parental allele was preferentially expressed. When the B6 reference sequence (NCBI37/mm9) was used to map F1 RNA-Seq reads, allelic mapping bias was detected, evidenced by the appearance of an excess of B6 allele specific reads in both sides of any reciprocal cross. We also observed the existence of reference bias by mapping sequence reads to a reference in which other strain specific variants were introduced. Reference bias was eliminated by mapping data to both parental reference genomes and averaging results; the equivalent of mapping to a haplotypically accurate heterozygous reference sequence (Additional file 2: Figure S1). Strain-specific alleles were validated through exome sequencing of the parental mouse strains and by comparison to the Sanger mouse genome resource (The VCF file 20111102-snps-all.annotated.vcf.gz was downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1105) to ensure that only genuine SNVs were interrogated (Additional file 1: Table S2).
DAE was analyzed in forebrain and kidney for the three reciprocal crosses. Only SNVs with read counts ≥8 were included, this being the threshold at which DAE was consistently detected for known maternal and paternally imprinted genes (Additional file 1: Table S3). Differentially allelic expressed transcripts were observed in both tissues for all three reciprocal crosses (Additional file 2: Figure S2). In each instance there was a similar ratio of over-represented strain-specific alleles in all analyses, indicating that reference bias was successfully eliminated. The number of genes assayed for DAE varied depending upon the number of strain-specific SNVs detectable and the tissue. Using these criteria, approximately 3,000–5000 genes were assayable for DAE in forebrain and 2,500 – 4,500 genes in kidney. Using a FDR cutoff of p ≤ 0.05 and requiring the same direction of expression in each F1 cross, 3.2 – 9.7 % of assayed genes in forebrain showed significant allelic imbalance in expression. Although kidney had fewer assayable genes for DAE analysis than forebrain, it had higher proportion of genes with evidence of DAE QTLs (7.6–19.5 %) (Additional file 2: Figure S2).
Identification of DAE QTLs is enhanced by observation of consistency of DAE in reciprocal F1 crosses. When DAE effects are not consistent in reciprocal crosses, this can identify genes imprinted according to parental origin. Using DAE patterns we were able to identify imprinted genes and distinguish these from strain-specific DAE QTLs. The numbers of genes exhibiting DAE varied between the different F1 crosses, and between tissues (Fig. 1): 5.4 % (200/3,722), 1.9 % (51/2,669) and 1.5 % (61/4,171) of expressed genes showed DAE in forebrain in B6/129SF1(B6129SF1 and 129SB6F1), B6/DBAF1(B6DBAF1 and DBAB6F1), and B6/CASTF1(B6CASTF1 and CASTB6F1) reciprocal crosses respectively, and for 11.5 % (322/2,794), 9.1 % (203/2,236) and 5.0 % (165/3,320) of expressed genes in kidney in the B6/129SF1, B6/DBAF1, and B6/CASTF1 reciprocal crosses respectively.
The overlap in genes showing significant DAE in both reciprocal crosses was greater than expected by chance (Additional file 1: Table S4). Surprisingly, after FDR correction the number of DAE QTLs was smallest in B6/CAST F1 mice, despite the fact that there are a larger number of informative SNVs in this cross than for the other crosses. This observation is possibly explained in a comparison of the forebrain data from B6/CASTF1 with B6/129SF1 which showed that most of the potentially informative SNVs for the B6/CAST F1 mice were located in transcripts with low sequence read number even though the B6/CAST F1 mice had higher mapped read counts than the other crosses (Additional file 1: Table S1). Analysis of DAE using a preselected set of transcribed regions (or “genes”) could potentially implicate additional B6/CAST loci as having DAE QTLs. However, our pipeline was designed to test DAE in transcripts regardless of whether the transcript was associated with an annotated gene, and in keeping with the potential, and at times already demonstrated, functional significance of some non-coding RNAs.
Validation for strain-specific and parental allele specific DAE-QTLs in the forebrain
We tested the ability of our RNA-Seq based pipeline to reliably detect DAE QTLs in multiple ways. Because analyses were performed on male F1 mice, only the maternal allele should be detectable in transcripts derived from the X chromosome, and this was confirmed for almost all the DAE assayable X-linked genes in all the F1 crosses (Additional file 2: Figure S3). Next, DAE results for 20 randomly selected genes with SNVs showing significant DAE in forebrain (FDR p-value ≤ 0.05) and 15 genes with SNVs showing no DAE, were validated in the individual F1 mice originally pooled for RNA-Seq by quantitative Sanger sequencing using cDNA. Sequence from genomic DNA of the same F1 mice was used to control and correct for any bias in allelic quantitation. Findings from the quantitative Sanger sequencing showed excellent correlation with RNA-Seq allelic expression imbalance (or the lack thereof) for all 35 genes analyzed (Additional file 2: Figure S4B). The validity of RNA-Seq to identify genuine DAE QTLs was further demonstrated by the detection of multiple previously identified imprinted genes in forebrain [13, 20–23]. These included 16 imprinted genes (Zrsr1, Nap1l5, Calm1, Snrpn, Peg3, Ndn, Dlk1, H13, Zdbf2, Sgce, Peg10, Usp29, Inpp5f, Rasgrf1, Copg2 and Impact) and 7 imprinted non-coding RNAs (Apeg3, Rian, Snord64, Meg3, D7Ertd715e, Peg13 and C230091D08Rik). DAE of several of these imprinted genes and non-coding RNAs was confirmed by quantitative Sanger sequencing (Additional file 2: Figure S5). We uncovered at least one instance of a gene, Zim3, that has been reported as maternally imprinted in mouse testis [20] but which shows strong DAE in forebrain where only the 129S- derived transcript is expressed in either reciprocal F1. Further, in the parental strains, 129S mice express Zim3 at levels 6-fold higher than B6 mice (Additional file 2: Figure S8). The DAE in reciprocal F1 mice and the differing levels of Zim3 expression in the parental strain together suggest that, at least in forebrain, Zim3 expression is not regulated by imprinting, but by cis-acting functional elements that vary between B6 and 129S mice. Direct Sanger sequencing of Zim3 cDNA from the F1 animals also indicated the presence of two alternatively spliced transcripts, however individual F1 mice only expressed one of these two isoforms, the expressed transcript always being derived from the 129S allele suggesting a complex regulation of this locus.
For comparison to the genetic approach for detecting eQTLs of measuring RNA levels in RI strains and to further confirm DAE QTLs, we measured transcript levels in parental strains for 10 genes that exhibited DAE QTLs in the F1 mice. Expression levels were normalized to the expression in B6 mice and the ratios of expression levels compared to the RNA-Seq and quantitative Sanger sequencing allelic imbalance ratios observed in the F1 mice. There was good correlation between the relative parental expression levels and the ratio of allelic expression observed in the F1 mice. These multi-level results, including consistent allelic imbalance in the two reciprocal F1 strains identified by RNA-Seq, replication of the allelic ratios by quantitative Sanger sequencing, and the coherent differential gene expression in the parental strains not only confirm strain-specific DAE QTLs, but also demonstrate the value of using RNA-Seq for genome-wide identification of cis-eQTLs altering gene transcript levels (Additional file 2: Figure S4C).
Tissue-specific and shared DAE QTLs between forebrain and kidney
The ratios of genes in which either the B6 allele or alternative allele was preferentially expressed was similar in all three reciprocal crosses, for both forebrain and kidney. DAE QTLs can be identified based on consistency of DAE in the reciprocal crosses, by extent of differential expression, and by level of statistical significance, or by a combination of these parameters. We have in effect done this by pruning the genes analyzed for DAE by level of representation in the sequencing, prior to applying statistical testing, and by use of reciprocal crosses.
Tissue-specific DAE was common, but in line with previous lower estimates. Only a small proportion of genes showed conservation of DAE pattern between forebrain and kidney even though the same combinations of DNA control elements will be present to regulate gene expression across different forebrain and kidney samples. Tissue-specific cellular function is reflected in part by different patterns of gene expression between tissue types. Analysis of expressed genes with SNVs detected in both forebrain and kidney in both reciprocal crossed F1 mice identified 2,173 genes in B6/129SF1, 1,511 genes in B6/DBAF1, and 1,951 genes in B6/CASTF1. Within this subset of genes 11.9 %, 8.5 %, and 5.0 % showed forebrain DAE QTLs in B6/129S F1, B6/DBA F1 and B6/CAST F1, respectively, but only a small proportion of DAE genes showed DAE in both forebrain and kidney: 2.4 % in B6/129SF1, 0.7 % in B6/DBAF1, and 0.3 % in B6/CASTF1 (Fig. 2a). Also, we observed that for the subset of genes expressed in both kidney and forebrain that more genes showed DAE in kidney than in forebrain (Fig. 2b). Moreover, in forebrain 3 genes, Acadm, Aldh7a1, and Dci, showing preferential BL6 allelic expression showed 129S-allelic preference in kidney in both F1 crosses, suggesting that cis-acting loci at these genes were regulated by tissue specific trans-factors. These tissue-specific DAE findings are unlikely to be significantly confounded due to variation in detection power, which is affected by both tissue-specific expression level and sequence coverage, because our comparison only included genes detected at ≥ 8 reads in both tissues, and which showed similar allelic preference for a tissue in both reciprocal crosses. Additionally no bias was observed towards tissue where more sequencing read counts are obtained as would be expected if expression level was influencing the results. These data show that the majority of co-expressed genes in both kidney and forebrain were expressed with a preference for a specific allele in only a single tissue type illustrating the importance of tissue-specific factors in transcriptional regulation, and RNA processing and stability.
Important genes in neural functions and behavior show strain-specific differential allelic patterns in the forebrain
Several genes known to play critical roles in neural functions and behavior exhibited DAE QTLs in forebrain of B6/129SF1 mice. Gabra2 encodes the gamma-aminobutyric acid (GABA) receptor α-2 subunit. GABA receptors are ligand-gated chloride ion channels that mediate GABA inhibitory neural transmission and that have been shown to influence alcohol and other addictive substance use. A negative correlation between oral morphine consumption and Gabra2 expression was observed in inbred mouse strains [24, 25]. In B6/129S F1 mice, the 129S allele is expressed at levels 4-fold higher than the B6 allele in both reciprocal crosses (Fig. 3). This finding is consistent with the previously observed behavioral differences between the strains, and with the observation that Gabra2 expression differs between C57BL/6J and 129S1/SvlmJ strains. This could be a trans-regulatory effect, but our data suggests that the observed strain-specific differences in Gabra2 expression arise due to the presence of cis-acting variants at or near the Gabra2 gene.
Variation at the Gas5 gene (growth arrest specific 5) has been associated with levels of aggressive behavior in mice. This non-coding RNA was shown to be expressed at levels 8-fold higher in the brains of highly aggressive mouse strains than in less aggressive strains [26]. B6 mice have been reported to exhibit higher levels of aggressive behavior than 129S mice [27]. Our data showed both higher levels of Gas5 expression in forebrain of B6 compared to 129S mice, and that the B6 allele is preferentially expressed in the forebrain of both reciprocal B6/129S F1 crosses (Fig. 3). This suggests that a cis-element in B6 allele promotes stronger transcription of the Gas5 gene. However, an alternative explanation is that a deletion (chr1:162,966,110) in the Gas5 gene of 129S mice may lower the stability of the 129S-derived transcript, which calls attention to the need to consider non-transcriptional mechanisms for eQTLs.
Genetic association with phenotype (QTLs) in crosses between inbred strains of mice and DAE QTLs in F1 mice as evidences of QTLs
Genomic loci modulating several behavioral phenotypes have been mapped using genetic markers in crosses of inbred mouse strains and in recombinant inbred (RI) lines such as the BXD lines and more recently the Diversity Outbred (DO) crosses. Mapping these behavioral quantitative trait loci (QTLs) has identified genomic regions containing multiple candidate genes for a variety of behaviors, including ethanol-related phenotypes [28, 29], cocaine-related phenotypes [30], and seizure susceptibility [31]. A survey of the Mouse Genome Informatics (MGI) database identified 131 QTLs covering 45 different phenotypes (response to alcohol, morphine, cocaine, or methamphetamine, and other neurobehavioral phenotypes including seizures, sleep, and circadian rhythm). The QTL findings distinguishing B6 mice and DBA mice implicate some 415 candidate genes in crosses involving the B6 and DBA strains (Additional file 1: Table S5). Of these 415 candidate genes 91 were assayable for DAE in forebrain, using our stringent criteria, and 61 were assayable in kidney. The remainder either lacked the prerequisite genetic variation for assay, or was not expressed at a sufficient level. Four of the 91 candidate forebrain genes were DAE QTLs in the B6/DBA F1 mouse forebrain: Darc (duffy antigen receptor for chemokines), Kifap3 (kinesin-associated protein 3), Atp1b1 (sodium/potassium-transporting ATPase subunit beta-1 and Ndn (necdin), indicative of the presence of cis-acting regulatory alleles different the parental strains (Additional file 1: Table S6). Atp1b1 showed elevated levels of the DBA derived transcript in both forebrain and kidney (Additional file 2: Figure S6). The fraction of QTL-implicated candidate genes exhibiting DAE was higher than would be expected by chance. In brain, 4.4 % (4/91) of the candidate genes were DAE QTLs as compared to 1.9 % (51/2669) overall (p = 0.0008). Similarly, in kidney, 21.3 % (13/61) of the QTL-implicated candidate gene genes were DAE QTLs versus 9.1 % (203/2236) overall (p = 0.019).
Several of the genes that we find differentially expressed in mouse forebrain have been previously implicated in alcohol preference via alcohol QTLs and shown to differ in expression in B6 mice as compared to other strains via conventional transcript measures. DAE in B6/DBAF1 mice confirms previous findings that Darc, Kifap3, Aldh9a1, Sc5d, Sdhc and Sorl1 are transcribed at different levels in B6 and DBA brains between B6 and DBA mice [28, 29], a difference arising due to the presence of cis-acting loci at these genes (Additional file 2: Figure S6). 129S and DBA mice show lower alcohol preference compared to B6 mice [32–34], and we observe that Darc, Sdhc, and Sc5d, all candidate genes at alcohol related QTLs, have DAE QTLs in the forebrain and kidney of both B6/DBA and B6/129S F1 mice.
Hdc (histidine decarboxylase), a candidate gene within the Pbrgcsf1 QTL (peripheral blood stem cell response to granulocyte colony stimulating factor 1) [35], has been shown to differ in expression between inbred mouse strains. Hdc protein levels and enzymatic activities of the DBA/2J mouse are approximately 10-fold higher than in C57BL/6J but only in the kidney [36]. Consistent with these findings, we observed that the DBA allele of Hdc is almost exclusively expressed in kidney of B6/DBAF1 mice (Additional file 2: Figure S6) suggesting the existence of a DBA specific, cis-acting regulatory sequence variant.
However, most assayable candidate genes within the identified QTLs for complex phenotypes did not show DAE QTLs in B6/DBAF1 mice (Additional file 1: Table S7) even though several of these genes previously were flagged as showing higher or lower transcript levels in C57BL/6J mice compared to DBA/2J mice [28]. This may suggest a difference in methodological precision of DAE versus measures of transcript level, or more intriguingly that the differences in transcript levels were caused by trans-acting factors, and even though the genes were located within behavioral QTLs. The fact that a gene exhibiting altered transcript levels is localized to a QTL does not necessarily mean that the sequence variant altering level of the transcript is local to the gene. For example, Syn3 was identified as a candidate gene associated with a reversal-learning phenotype in the BXD RI strains based on its being one of five genes under a behavioral QTL peak, and because its level of expression correlated with the behavioral phenotype in BxD strains [37]. However, our data reveal no DAE for Syn3 in the B6/DBA F1 mice (Additional file 2: Figure S7), indicating that at least for this candidate gene the difference in RNA expression which correlates with the phenotype may not be due to a cis-acting regulatory variant, but may arise through a trans mechanism or may have been an artifact chance finding. Most likely, many of the gene expression differences observed between strains and implicating genes that map to QTLs are not due to the actions of cis-acting loci at these genes. We note, however, that it is possible that an action of a cis-locus to drive DAE might be observed in other brain regions. DAE based on RNA-Seq of one brain region, or even several, cannot rule out the existence of cis-acting loci operative in some region of the brain or a particular cell. However, where a DAE QTL is observed, it strongly points to the existence of such a cis-locus. Although cis-effects will not be observed for most genes within QTL intervals, using RNA-Seq in F1 animals we were able to observe cis-effects for candidate genes related to alcohol, seizures, angiogenesis, and peripheral blood stem cell response (Additional file 1: Table S6).
Cis-eQTLs in RI and F2 databases vs DAE QTLs in F1 mice
Multiple expression quantitative trait loci (eQTLs) have been identified by correlating genotype data with gene expression in F2 crosses and RI strains. Using RNA-Seq DAE we estimated the number of genes with eQTLs in BXD RI strains that represent genuine cis-eQTLs using the BXD eQTLs listed in GeneNetwork (www.genenetwork.org). In GeneNetwork the BXD brain cis-eQTLs are based on data from 29 BXD RI strains, collected from three RNA-Seq datasets. The kidney cis-eQTLs in this database are derived from 54 BXD RI lines collected from three RNA micro array datasets. Out of the 26,378 autosomal genes listed in GeneNetwork, 17,698 and 16,222 were found as eQTLs in whole brain and kidney, respectively (Fig. 4), and of which 1,959 in whole brain and 2,347 in kidney are listed as cis-eQTLs. Only 1,464 genes in brain (74.7 %) and 1,933 in kidney (82.4 %) were potentially assayable by DAE, and of these only 653 genes in brain and 870 genes in kidney were available to observe in our B6/DBAF1 RNA-Seq. 45 of 51 DAE QTLs in forebrain and 167 of 203 DAE QTLs in kidney were listed as eQTLs. 17/653 assayable whole brain cis-eQTLs and 117/870 assayable kidney cis-eQTLs were confirmed by DAE QTLs. The rate of confirmation was above random expectations (brain: p = 0.0003, kidney: p = 0.001), the randomly expected overlap being only 5.3 and 16.2 genes in brain and kidney respectively, as compared to 17 and 117 genes observed. However, 97 % (636/653) of assayable forebrain cis-eQTLs and 87 % (753/870) of assayable kidney cis-eQTLs could not be confirmed by DAE. Overall, these data suggest that cis-eQTL mapping, using a restricted number of RI strains in which RNA levels are quantitated, has a high false-positive rate. Additionally, 28 genes in forebrain and 50 genes in kidney showing DAE QTLs in B6/DBAF1 mice had not been reported as cis-eQTLs in BxD RI strains, and 6 imprinted genes, C230091D08Rik, D7Ertd715e, Peg10, Peg13, Ragrf1, and Zdbf2 were identified only as eQTLs but not as cis-eQTLs, which suggests that cis-eQTL mapping in restricted numbers of RI strains also has a lower sensitivity to detect cis-acting regulatory elements than DAE performed in a single reciprocal F1 cross. The discrepancy between reported cis-eQTLs and our DAE cis-eQTLs could arise from several confounding factors including lack of LD between cSNV reporters and regulatory loci in DAE analysis, inadequate sequencing in DAE coverage leading to failure to detect more subtle allelic differences, and as is likely important, false positive and false negative cis-eQTLs from studies of small numbers of RI strains.