Systematic investigation of imprinted gene expression and enrichment in the mouse brain explored at single-cell resolution

Background Although a number of imprinted genes are known to be highly expressed in the brain, and in certain brain regions in particular, whether they are truly over-represented in the brain has never been formally tested. Using thirteen single-cell RNA sequencing datasets we systematically investigated imprinted gene over-representation at the organ, brain region, and cell-specific levels. Results We established that imprinted genes are indeed over-represented in the adult brain, and in neurons particularly compared to other brain cell-types. We then examined brain-wide datasets to test enrichment within distinct brain regions and neuron subpopulations and demonstrated over-representation of imprinted genes in the hypothalamus, ventral midbrain, pons and medulla. Finally, using datasets focusing on these regions of enrichment, we identified hypothalamic neuroendocrine populations and the monoaminergic hindbrain neurons as specific hotspots of imprinted gene expression. Conclusions These analyses provide the first robust assessment of the neural systems on which imprinted genes converge. Moreover, the unbiased approach, with each analysis informed by the findings of the previous level, permits highly informed inferences about the functions on which imprinted gene expression converges. Our findings indicate the neuronal regulation of motivated behaviours such as feeding and sleep, alongside the regulation of pituitary function, as functional hotspots for imprinting. This adds statistical rigour to prior assumptions and provides testable predictions for novel neural and behavioural phenotypes associated with specific genes and imprinted gene networks. In turn, this work sheds further light on the potential evolutionary drivers of genomic imprinting in the brain. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08986-8.


Background
Imprinted genes demonstrate a preferential or exclusively monoallelic expression from either the maternal or paternal allele in an epigenetically predetermined manner (a parent-of-origin effect, POE). To date approximately 260 imprinted genes, demonstrating biased allelic expression and/or associated with a parental-specific epigenetic mark, have been identified in the mouse (~ 230 in humans) [1,2]. This epigenetic regulation makes genomic imprinting an evolutionary puzzle as many of these genes are effectively haploid and thereby negate many of the benefits of diploidy [3]. Studying the patterns of expression and function of imprinted genes may shed light on the drivers leading to the evolution of genomic imprinting. For instance, characterisation of a number of imprinted genes points to convergence on placental function [4], in line with the predictions of early theoretical ideas [5]. Outside of the placenta, the brain consistently emerges as an adult tissue with a large number of expressed imprinted genes [6][7][8]. However, given that it is estimated that ~ 80% of all genes in the genome are expressed in the brain [9,10], the question remains, is imprinted gene expression actually enriched in the brain compared to other adult tissues? To date this has never been formally tested.
A role for imprinted genes in the brain was initially suggested by [11], and neurological phenotypes observed in, early imprinted gene mouse models [12]. In addition, behavioural deficits were seen in imprinting disorders such as Prader-Willi and Angelman syndromes [13,14]. Subsequent studies have revealed diverse roles for imprinted genes in the brain. During development, several imprinted genes are involved in the processes of neural differentiation, migration, axonal outgrowth and apoptosis [15]. In the adult brain, studies of mice carrying manipulations of individual imprinted genes have suggested a wide range of behavioural roles including maternal care [16], feeding [17], social behaviour [18,19], learning/memory [20], cognition [21,22], and more recently, sleep and circadian activity [23].
In addition to studies on individual imprinted genes, there are a limited number of studies that take a systems level approach to characterizing the role of genomic imprinting in the brain. Early studies examining developing and adult chimeras of normal and parthenogenetic/ gynogenetic (Pg/Gg-two maternal genomes) or androgenetic (Ag-two paternal genomes) cells indicated distinct regional distribution for maternally (cortex and hippocampus) and paternally (hypothalamus) expressed genes [12,24]. More recently, Gregg, Zhang [8] used the known imprinting status of 45 imprinted genes and the Allen Brain Atlas to track dichotomous expression of imprinted genes across 118 brain regions to identify brain-wide patterns of expression. Most imprinted genes were expressed in every brain region, but detectable expression of the largest number of imprinted genes was found in regions of the hypothalamus (medial preoptic area, arcuate nucleus), central amygdala, basal nuclei of the stria terminalis and the monoaminergic nuclei, suggesting some form of specialisation. Although pioneering, this study, and others identifying novel imprinted genes and/or mapping allelic expression in the brain [6,7,25,26], did not test whether the expression of these genes was especially enriched in given brain regions but simply asked if they were expressed, at any level, or not.
Here we address the question of whether the brain and/ or specific brain circuitry is a foci for genomic imprinting by exploiting the rapidly expanding number of single-cell RNA sequencing (scRNA-seq) datasets and systematically investigating imprinted gene enrichment and over-representation in the murine brain. We performed this by a hierarchical sequence of data analysis, using datasets that allowed a multi-organ (Level 1) comparison first, before proceeding to brain-specific (Level 2) and brain region-specific (Level 3) comparisons with the outcome of each level informing the data selection for the next one, to identify a consistent pattern of enrichment (Fig. 1). We sought to provide a robust assessment of the neural systems on which imprinted genes converge, statistically validating previous assumptions, identifying neuronal domains that have received less emphasis in earlier studies, and providing testable predictions for novel neural and behavioural phenotypes associated with specific genes and imprinted gene networks.

Imprinted gene expression is enriched in the brain in a multi-organ analysis (Level 1 analysis)
The Mouse Cell Atlas (MCA) [27] and the Tabula Muris (TM) [28] are single cell compendiums containing ~ 20 overlapping, but not identical, adult mouse organs. Key overlapping organs include the bladder, brain, kidney, lung, limb muscle, and pancreas while organs included in only one dataset include the ovary, testes, uterus, stomach within the MCA, and the heart, fat, skin, trachea and diaphragm within the TM. These compendiums create a snapshot of gene expression across adult tissues to assess imprinted gene enrichment. Since this study focused on the adult body and brain, fetal tissues (including the placenta) were not assessed.
An over-representation analysis (ORA) was performed on both datasets. All data were processed according to the original published procedure, a list of upregulated genes was produced for each tissue/identity group (vs. all other tissue/identity groups) and a one-sided Fisher's Exact test was performed using a custom list of imprinted genes (Supplemental Table S1) to identify tissues in which imprinted genes were over-represented amongst the upregulated genes for that tissue. Each dataset in this study was analysed independently which allowed us to look for convergent patterns of enrichment between datasets of similar tissues/cell-types. Across only adult tissues, imprinted genes were convergently over-represented in the pancreas, bladder and the brain in both datasets ( Fig. 2A). In addition, in the MCA adult tissue dataset, there was a significant over-representation in the uterus (Table 1), and in the Tabula Muris analysis (Table 2), there was a significant over-representation in the muscle-based tissues-diaphragm, trachea, and limb muscles. In addition to the ORA, to identify situations in which imprinted genes were in fact enriched amongst the stronger markers of a tissue/cell-type, we performed a Gene-Set Enrichment Analysis (GSEA) on tissues meeting minimum criteria (see Methods), which assessed whether imprinted genes were enriched within the top ranked upregulated genes for that tissue (ranked by Log2 Fold Change). No tissue at this level showed a significant GSEA for imprinted genes. Mean normalised expression of imprinted genes across identity groups (Supplemental Table S2) was the highest for Brain in the MCA and highest for Pancreas in the TM (Brain (Non-Myeloid) was the fourth highest).

Imprinted gene expression is enriched in neurons and neuroendocrine cells of the brain (Level 2 analysis)
We next analysed cells from the whole mouse brain (Level 2), firstly using the Ximerakis, Lipnick [29] dataset, in which cells were grouped from the whole mouse brain (minus the hindbrain) into major cell classes according to cell lineage. Imprinted genes were over-represented in neuroendocrine cells and mature neurons (Table 3).
Neuroendocrine cells were defined as a heterogeneous cluster, containing peptidergic neurons and neurosecretory cells expressing neuronal marker genes (e.g., Syt1 and Snap25) alongside neuropeptide genes (e.g., Oxt, Avp, Gal, Agrp and Sst) but distinguished by Ximerakis, Lipnick [29] by the unique expression of Baiap3 which plays an important role in the regulation of exocytosis in neuroendocrine cells [30]. GSEA additionally showed that the imprinted genes were enriched in the genes with the highest fold change values for neuroendocrine cells only (Fig. 3). 26 imprinted genes had their highest expression in the neuroendocrine cells and the mean normalised expression of imprinted genes was almost twice as  Fig. 1 The hierarchical set of datasets in this analysis. The datasets are sorted into Level 1 (Multi-Organ), Level 2 (Whole Brain) and Level 3 (Specific Brain Nuclei) analyses. The original publication and specific tissue/s analysed are provided for each analysis. White text in dark grey box indicates specifics to the analysis at that level -whether the analysis used the 'marker gene' Log2FC criteria or the relaxed Log2FC > 0 criterion, whether paternally and maternally expressed gene (PEG/MEG) analysis was carried out and whether the number of IGs with highest expression in a cell population and the average normalised expression were reported for imprinted genes high for neuroendocrine cells as the next highest identity group (Supplemental Table S2). The MEG/PEG analysis (Supplemental Table S5A and S5B) for this dataset found that PEGs were over-represented in mature neurons (q = 0.027) and neuroendocrine cells (q = 8.97 × 10 -6 ). MEGs were also over-represented in neuroendocrine cells (q = 0.047) and uniquely over-represented in Arachnoid barrier cells (q = 0.014). Only PEGs replicated the significant GSEA in neuroendocrine cells (p = 4 × 10 -4 , Supplemental Fig. S2). The second dataset at this level was Zeisel, Hochgerner [31] Mouse Brain Atlas (MBA) which allowed a much deeper investigation of nervous system enrichment with sequencing of the entire murine nervous system and identifying cells by both brain region and cell type.
Concordant with the previous findings, primary analysis separating cells by lineage revealed over-representation of imprinted genes in neurons only ( Table 4). The overlap between the upregulated imprinted genes for the overrepresented neural-lineage cells from the Level 2 datasets are displayed in Fig. 4. Additionally, PEGs alone demonstrated no significant over-representations in cell lineage types while MEGs demonstrated over-representation in vascular cells only (q = 0.0004) (Supplemental Table S6A and S6B). Although these tissues are not identical, they were the two brain associated over-representations in the enrichment analysis. Parental-bias is indicated by colour (MEG-red, PEG-blue). From the 119 imprinted genes in the gene list, only 92 were common to both analyses (i.e., successfully sequenced and passed gene quality control filters). 34 imprinted genes were upregulated in the brain in the MCA and 31 genes in the TM. Genes in common from the two analyses are presented in bold and totalled in each section of the Venn Diagram, while genes found upregulated in one analysis but not available in the other analysis are included in small font and the number indicated in brackets. B Tissues with over-representation in MCA, and C tissues with over-representation in Tabula Muris. For both coloured bold labels were over-represented tissues using all imprinted genes; tissues with a blue circle behind were over-represented for PEGs alone; a red circle represented over-representation for MEGs along; and a red/blue split circle were over-represented for both PEGs and MEGs analysis was performed on cells classified as neurons and then grouped by brain/nervous system regions. Significant over-representation was seen in neurons of the hypothalamus, ventral midbrain, medulla, and pons ( Table 5). The pons and medulla had the largest number, 45 and 44 respectively, of imprinted genes upregulated (Fig. 5A). Regional analysis for MEGs and PEGs separately (Supplemental Table S7A and S7B), revealed that PEGs were over-represented in hypothalamus (q = 6.53 × 10 -7 ), ventral midbrain (q = 0.018), the pons (q = 4.65 × 10 -5 ) and the medulla (q = 4.10 × 10 -6 ); while MEGs were only over-represented in the medulla (q = 0.002) but had a significant GSEA for the pons (q = 0.027, Supplemental Neurons were then recategorized into unique subpopulations identified by marker genes [31] to uncover the specific neural populations underlying the enrichment seen in the hypothalamus, pons and medulla, and midbrain ( Fig. 6; Supplemental Table S8). Each neural population was identified by its distinct gene expression and suspected location within the brain (see http:// mouse brain. org/ for an online resource with detailed information on each cluster).
The midbrain, medulla and pons were represented by a number of cell groups, with over-representation seen in the medulla-based adrenergic (HBAR) and noradrenergic (HBNOR) groups and the dopaminergic neurons of the midbrain in the Periaqueductal Gray (PAG) (MBDOP1) and the Ventral Tegmental Area (VTA)/Substantia Nigra (SNc) (MBDOP2). There were also several inhibitory (MEINH, HBIN) and excitatory neuron (MEGLU, Table 1 Imprinted gene over-representation in MCA adult tissues [27] Tissue identities for the cells used in analysis, Up Reg number of upregulated genes with q ≤ 0.05 and Log2FC ≥ 1 (total number of genes in the dataset in brackets), IG number of imprinted genes upregulated with q ≤ 0.05 and Log2FC ≥ 1 (total number of IGs in the dataset in brackets), ORA pp value from over representation analysis on groups with minimum 5% of total IGs, ORA q Bonferroni corrected p value from ORA, Mean FC IG mean fold change for upregulated imprinted genes, Mean  HBGLU) types spread across the nuclei from the three regions (Fig. 6). The serotonergic populations of the raphe nuclei of these regions (HBSER) were particularly prominent since the pons and medulla-based serotonin neuron populations (HBSER2, HBSER4 and HBSER5) were the only neuron subpopulations out of the 214 total to have a significant GSEA for imprinted genes after correction (Supplemental Fig. S4). Additional regions of over-representation included neurons in the pallidum and striatum and PVN neurons from the thalamus. In this comparison of 214 neuron populations, no neurons from areas such as the cortex, cerebellum or peripheral nervous system were enriched, and neither were they over-represented in the previous regional analysis. Hence, further analysis focused on those brain regions enriched in this whole brain level analysis.

Imprinted gene expression is over-represented in specific hypothalamic neuron subtypes (Level 3A&3B analysis)
We next sought to investigate whether those regional neuron enrichments found within the whole brain comparisons would be further clarified with enriched expression in specific neuronal subpopulations within those regions. Namely we sought to identify neural populations enriched across the whole hypothalamus and those enriched within specific hypothalamic nuclei, and also whether imprinted gene expression was enriched in the other key subpopulation identified, the ventral midbrain and hindbrain dopaminergic and serotonergic populations. Two datasets with single cell sequencing data for the adult hypothalamus existed [32,33]. Both clustered their data into neuronal subpopulations allowing us to look for convergent imprinted enrichment across major hypothalamic neuronal subtypes (Level 3A). Analysis revealed a clear neuronal bias in expression of imprinted genes (Supplemental Table S9A and S10A). Within the Romanov, Zeisel [32] data, there was a significant overrepresentation of imprinted genes in neurons (q = 0.02) and a similar observation was seen in the Chen, Wu [33] data (q = 0.001), and both also demonstrated a significant GSEA in neurons ( Within the Chen, Wu [33] dataset, 4/33 hypothalamic neuronal subtypes had a significant over-representation of imprinted genes (Supplemental Table S9B). The four subtypes were all GABAergic neurons, specifically: Galanin neurons (Slc18a2/Gal) present in a several hypothalamic regions (q = 0.0079); a dopaminergic neuron type (Slc6a3) with high expression of Th and Table 2 Imprinted gene over-representation in Tabula Muris adult tissues [28] All other column descriptions can be found in the legend of Table 1 GSEA pp value from Gene Set Enrichment Analysis for identity groups with 15 + IGs and Mean FC IG > Mean FC Rest, GSEA q -Bonferroni corrected p values from GSEA. Prlr suspected to be the TIDA neurons of the arcuate nucleus (q = 0.0001); SCN neurons (Vipr2) with very high Avp and Nms expression (q = 0.0071); and Agrp feeding promoting neurons of the Arcuate Nucleus (q = 0.034). Within the Romanov, Zeisel [32] dataset, 3/62 subtypes had significant over-representation of imprinted gene expression (Supplemental Table S10B): Agrp/Npy neurons (q = 0.013), the Arcuate Nucleus feeding neurons also reported in Chen, Wu [33]; a Ghrh/Th neuronal type (q = 0.032), again likely corresponding to neurons from the arcuate nucleus and the top hit (q = 1.63 × 10 -6 ) was a poorly segregated population (Calcr/Lhx1), likely due to a deeper inner cluster heterogeneity. This cluster was interesting since the imprinted genes Calcr and Asb4 were amongst its most significant marker genes, and it was notably the only cluster with high expression of all three of Th, Slc6a3 and Prlr. Romanov, Zeisel [32] did not identify any of their populations as the TIDA neurons, Table 3 Imprinted gene over-representation in neural lineage types [29] Cell lineage identities for the cells used in analysis. All other column descriptions can be found in the legend of Tables 1 and 2 Cell   Table 3). Imprinted genes were plotted in chromosomal order. Size of points represented absolute mean expression; colour represented the size of the Log2FC value for the cell identity group (e.g., neuroendocrine cells) vs. all other cells. Unique colour scales are used for MEGs (red/orange) and PEGs (blue). Where a gene was not expressed in a cell type, this appears as a blank space in the plot but the above pattern of gene expression suggests that this cluster may contain these neurons. Furthermore, the suspected TIDA neurons from the Chen, Wu [33] dataset shared 21/40 upregulated genes of this unresolved cluster (see Supplemental Table S11 for full comparison); Having consistently found well-known neurons from the arcuate nucleus (Agrp, Ghrh), and suprachiasmatic nucleus (Avp, Vip) we sought to test imprinted gene enrichment within these hypothalamic regions at a high resolution Table 4 Imprinted gene over-representation in nervous system cell types [31] Cell identities for the cells used in analysis. All other column descriptions can be found in the legend of  [29]. Imprinted genes listed which show significant upregulation (q ≤ 0.05 and Log2FC ≥ 1) in the cells. Although not identical, these were all mature neural lineage cells with over-representations in the enrichment analysis. Parental-bias is indicated by colour (MEG-red, PEG-blue). From the 119 imprinted genes in the gene list, only 88 were common to both analyses (i.e., successfully sequenced and passed gene quality control filters). 45 imprinted genes were upregulated in neurons in the MBA, and in Ximerakis, Lipnick [29], 33 imprinted genes were upregulated in neurons and 48 genes in neuroendocrine cells. Genes in common from the two analyses are presented in bold and totalled in each section of the Venn Diagram, while genes found upregulated in one analysis but not available in the other analysis are included in small font and the number indicated in brackets using datasets sequencing neurons purely from these hypothalamic regions (Level 3B).

Arcuate nucleus (ARC) [34]
The first nuclei investigated was the ARC sequenced by Campbell, Macosko [34]. Imprinted gene over-representation was found in 8/24 arcuate neuron types (Supplemental Table S12). These included the Agrp/Sst neuron type (with high expression of Npy, q = 0.003) and two Pomc neuron types (Pomc/Anxa2, q = 0.004; Pomc/ Glipr1, q = 0.03). Pomc expressing neurons are known to work as feeding suppressants [35]. Additional significant over-representation was found in the Ghrh neuron type (q = 0.009), which was also enriched in Gal and Th. Finally, a highly significant over-representation of imprinted genes was found in the Th/Slc6a3 neuron type (q = 1.72 × 10 -8 ) identified by the authors as one of the most likely candidates for the TIDA dopaminergic neuron population. Marker genes for this identity group overlapped with the TIDA candidates from the previous two datasets (e.g., Slc6a3, Th, Lhx1, Calcr). Agrp neurons, Ghrh neurons and these TIDA candidate neurons were identified in both whole hypothalamic datasets and at the nuclei level.

Suprachiasmatic Nucleus (SCN) [36]
Analysis of the 10 × chromium data of SCN neurons (Supplemental Table S13) revealed a significant overrepresentation (q = 1.51 × 10 -8 ) and GSEA (p = 0.004, Supplemental Fig. S5) in the Avp/Nms neuronal cluster (out of 5 neuronal clusters). This cluster shows the strongest expression for Oxt, Avp, Avpr1a and Prlr and is one of the three neural group that Wen, Ma [36] found had robust circadian gene expression, and the only subtype with notable phase differences in circadian gene expression in the dorsal SCN. This cluster likely corresponds to the GABA8 cluster found enriched in the Chen, Wu [33] dataset. Figure 8 presents the overlapping upregulated imprinted genes from the convergently upregulated neuron subtypes in the hypothalamic analysis of Level 3a and 3b.    Fig. 6 Anatomical labelling of all the neural subpopulations with a significant over-representation of imprinted genes (q ≤ 0.05 and Log2FC ≥ 1) in the Mouse Brain Atlas [31]. The predicted brain nuclei localisation of the 32 neuronal subpopulations (out of 214 populations identified across the nervous system) specified in the MBA and enriched for imprinted genes. Brain regions that were not found to be enriched for imprinted genes are greyed out. The full Enrichment Analysis is available in Supplemental Table S8

Imprinted gene expression is over-represented in monoaminergic nuclei of the mid-and hindbrain (Level 3C analysis)
In the MBA, Whole Hypothalamus and Arcuate Nucleus analyses, dopaminergic clusters were consistently enriched and, to explore this further, analysis of Hook, McClymont [37] data allowed comparison for dopamine neurons across the brain (specifically from the olfactory bulb, arcuate nucleus and midbrain) at two developmental timepoints (E15.5 and Post-natal day (P) 7). The arcuate nucleus P7 dopamine neurons emerged as the clearest over-represented subgroups (Supplemental Table S14). This included the Th/Slc6a3/Prlr neurons (q = 1.15 × 10 -8 ) and the Th/Ghrh/Gal cluster (q = 4.79 × 10 -5 ) the latter of which were referred to as 'neuroendocrine' cells by Hook, McClymont [37], and the former a mixture of arcuate nucleus populations with Prlr was one of the marker genes, suggesting this includes the TIDA neurons. Additionally, P7 midbrain neurons were the other group with significant over-representation (specifically from the PAG and VTA) as well as the neuroblasts at this time point.
Although no specific adult mouse midbrain datasets exist, ventral midbrain sequencing at E11.5-E18.5 by La Manno, Gyllborg [38] allowed us to identify imprinted enrichment within the midbrain at a timepoint when the major neuronal populations are differentiating but still identifiable (Supplemental Table S15). As anticipated, we found significant over-representation in both mature (DA1; high Th and Slc6a3, q = 0.0103), and developing (DA0, q = 0.0129) dopaminergic neurons, as well as the serotonergic neurons (q = 3.09 × 10 -7 ), likely from the midbrain raphe nuclei.
Raphe nuclei from the midbrain/hindbrain are key serotonergic regions of the brain. Analysis of all cell types in the Dorsal Raphe Nucleus (DRN) sequenced by Huang, Ochandarena [39] revealed a clear enrichment of imprinted genes in the neuronal populations of the DRN as compared to the non-neuronal cell populations of the DRN (Supplemental Table S16A). When compared to all other cell populations, significant ORA was seen for Dopaminergic (q = 0.009), Serotonergic (q = 0.012) and Peptidergic neurons (q = 0.0008), however, a significant GSEA was found for all five neuronal populations (Supplemental Fig. S6). When compared against each other (i.e., serotonergic upregulation vs. the other neurons), only the serotonergic neurons of the DRN (q = 0.0019) were found to have a significant over-representation of imprinted genes (Supplemental Table S16B). GSEA's were non-significant but the mean fold change for imprinted genes was markedly higher in both serotoninergic (52% higher) and dopaminergic neurons (68% higher). When contrasting neuronal subpopulations of the DRN, two of the five serotonin subpopulations had significant over-representation of imprinted genes: Hcrtr1/Asb4 (q = 0.0014) and Prkcq/Trh (q = 0.007) (Supplemental Table S16C). These clusters were identified by Huang, Ochandarena [39] as the only clusters localised in the dorsal/lateral DRN and the serotonin clusters enriched in Trh. Huang, Ochandarena [39] hypothesised that these were the serotonin neurons that project to hypothalamic nuclei, and motor nuclei in the brainstem (as opposed to cortical/striatal projection).

Imprinted gene expression is over-represented in lactotrophs and somatotrophs of the pituitary gland (Level 3D analysis)
Following on from the enrichment seen above for imprinted gene expression in the dopaminergic arcuate nucleus neurons coordinating pituitary gland output, we sought to identify whether any cells in the pituitary would display matching over-representation for imprinted gene expression (Level 3D). The pituitary was not sequenced as part of the multi-organ or whole brain datasets analysed above and so two independent datasets were analysed that specifically sequencing the mouse pituitary at single cell resolution. Ho, Hu [40] recently sequenced the anterior pituitary gland of male and female C57BL/6 mice using two sequencing technologies, both 10X genomic and Drop-Seq. This identified a variety of cell types from the endocrine and non-endocrine pituitary. We analysed data from both technologies and found that imprinted gene expression was convergently overrepresented in the Lactotrophs (prolactin secreting) and Somatotroph (growth hormone secreting) cells (Supplemental Table S17A & 17B). In a second independent (See figure on next page.) Fig. 7 Imprinted genes upregulated in neurons across the whole hypothalamus. A GSEA for imprinted genes upregulated in the 'Neuron' cell type in the whole hypothalamic dataset of Chen, Wu [33]. See legend of Fig. 3A for a description of how to interpret the plot. B Dot plot of imprinted genes upregulated in the 'Neuron' cell type plotted across all identified cell types in the Chen, Wu [33]  dataset sequencing cells from male mouse pituitary glands [41], we found significant over-representation in the Somatotropes and Thyrotrope (secreting thyroid stimulating hormone). Figure 9 demonstrates the overlap in imprinted genes significantly expressed in Somatotropes and Lactotropes across the datasets since these were the only cell-types to be over-represented in more than one dataset (Supplemental Table S18). It is notable that the two cell types represented here directly match the two regulatory neurons found over-represented in the arcuate nucleus of the hypothalamus.

Discussion
Using publicly available single cell transcriptomics data, we apply an unbiased systems biology approach to examine the enrichment of imprinted genes at the level of the brain in comparison to other adult tissues, refining this analysis to specific brain regions and then to specific neuronal populations. We confirm a significant overrepresentation in the brain, specifically in neurons at every level tested, with a marked enrichment in neuroendocrine cells lineages. Within-brain analyses revealed that the hypothalamus and the monoaminergic system of the mid-and hindbrain were foci for imprinted gene enrichment. While not all imprinted genes follow these patterns of expression, these findings highlight collective gene expression which is non-random in nature. As such, these analyses identify 'expression hotspots' , which in turn suggest 'functional hotspots' . Specifically, our results at the systems and cellular level highlight a major role for imprinted genes in the neuronal regulation of pituitary function, feeding and sleep. Some of the earliest studies of genomic imprinting identified the brain as a key area for imprinted gene expression [12,24]. However, it is estimated that ~ 80% of the genome is expressed in the brain and consequently, imprinted gene expression here may not be a purposeful phenomenon. Our current analysis definitively show that imprinted genes were significantly over-represented in the brain as a whole. This over-representation was found again with PEGs alone, but not MEGs. Critically, overrepresentation was also found when limiting our analysis to genes that have been found to be imprinted in nonbrain tissues (Supplemental Table S19). Excluding those genes imprinted in the brain alone avoids unintentionally biasing the analysis and confirms the robustness of the finding that imprinted gene expression is enriched in the brain.
Within specific brain regions, imprinted genes were over-represented in the hypothalamus, ventral midbrain, pons and medulla. This confirms some findings from studies of Pg/Gg and Ag chimera studies [12,24] and summaries of imprinted gene expression [8]. However, unlike these earlier studies, our analyses do not simply ask if imprinted genes are expressed (at any level) or not, but robustly test whether this expression is meaningful, and the expression of these genes are especially enriched in any given brain region. Additionally, in the chimera studies, Pg/Gg cells (two maternal genomes) preferentially allocated to the developing adult cortex and hippocampus, and Ag cells (two paternal genomes) preferentially allocated to the developing hypothalamus and midbrain. Our analysis does not reproduce this distinct pattern of MEG and PEG expression in the brain, and indeed we find no specific enrichment of imprinted genes in cortex or hippocampus. Although the pattern of regional enrichment seen with all imprinted genes is replicated when analysing PEGs alone, separate analysis of MEGs only shows over-representation in the pons and medulla. This difference between our findings and the Pg/Gg and Ag chimeras studies could indicate that the distribution of Pg/Gg and Ag cells in the brain is not driven by adult PEG and MEG expression, but instead is determined by expression of specific imprinted genes during brain development [42].
At the whole brain level, mature neurons and, in particular, neural-lineage neuroendocrine cells, had disproportionately higher numbers of imprinted genes expressed, and high levels of imprinted gene expression. It is likely that this neural-lineage neuroendocrine Ascl2, Slc38a4 Nhlrc1  [41] and Ho, Hu [40]. Imprinted genes listed show significant upregulation (q ≤ 0.05 and Log2FC > 0) in the cell types. Parental bias is indicated by colour (MEG -red, PEG -blue). Genes in common from two analyses are presented in bold and totalled in each section of the Venn Diagram, while genes found upregulated in one analysis but not available in the others are included in small font and the number indicated in brackets population comprises members of the key hypothalamic populations in which the expression of imprinted genes are enriched and, when treated as their own cluster, demonstrate strong imprinted gene enrichment compared to other cell lineages of the brain, even other mature neurons.
Within the hypothalamus, a selection of informative neuronal subpopulations were over-represented. Strikingly, and suggestive of meaningful enrichment, we saw convergence across our different levels of analysis with several key neuronal types identified in the whole hypothalamus and/or hypothalamic-region-level, already having been identified against the background of general imprinted gene expression in the whole-brain-level analysis. These subpopulations are collectively associated with a few fundamental motivated behaviours. We consistently saw enriched imprinted gene expression in Agrp expressing neurons when contrasting neurons across the whole brain, whole hypothalamus and within the arcuate nucleus. Agrp neurons from the arcuate nucleus are well known feeding promotors and a few imprinted genes have previously been associated with their function (Asb4, Magel2, Snord116) [43,44] but never as an enriched population. Feeding was further linked with imprinting through enrichment seen in Pomc + neurons [45] as well as Hcrt + and Gal + neurons. Circadian processes are controlled principally by the Suprachiasmatic Nucleus and here we find strong imprinted gene enrichment in Avp/Nms expressing neurons, an active circadian population. Again, these were found to be enriched when contrasting neurons across the whole brain, whole hypothalamus and within the SCN. This population is of interest given the growing appreciation of the role imprinted genes play in circadian processes and the SCN suggested by studies of individual imprinted genes [46]. Pituitary endocrine regulation also emerged as a key function, considering the over-representation in the dopaminergic: Th/ Slc6a3/Prlr neuron type (top hit in the arcuate nucleus and across dopaminergic neurons of the brain) and the Th/Ghrh subpopulation. These neuron populations can regulate prolactin (regulating lactation, stress, weight gain, parenting and more [47,48]) and growth hormone (promoting growth and lipid/carbohydrate metabolism) release, respectively. Remarkably, we also found a matching enrichment in the lactotroph and somatotroph cells in the pituitary. A role for imprinted genes in pituitary function is well known [49,50], with pituitary abnormalities associated with imprinted disorders such as PWS [51] and recent sequencing work showing imprinted genes are amongst the highest expressed transcripts in the mature and developing pituitary [52]. Specific genes highly expressed here, such as Dlk1 and Nnat, have been shown to alter somatotroph phenotypes [53,54]. Finally, we saw enrichment in galanin expressing neuronal populations (found enriched when contrasting neurons across the whole brain, whole hypothalamus). Galanin neurons in the hypothalamus have a diverse set of functions including subpopulations for thermoregulation, feeding, reproduction, sleep and parenting behaviour [55,56], contributing to this consistent picture of IGs associating with neurons key for motivated behaviour.
In this analysis the hypothalamus was a clear hot spot for imprinted gene expression, in line with the prevailing view of imprinted gene and hypothalamic function [50,57]. However, outside of the hypothalamus other distinct hotspot emerged from our whole brain analysis including the monoaminergic system of the midbrain/hindbrain. Analysing data from the dorsal raphe nucleus and ventral midbrain revealed the dopaminergic and serotonergic neurons to be a foci of imprinted gene expression within this region. These midbrain dopamine neurons were enriched when contrasted to other dopamine neurons from the brain and the enriched serotonergic neurons were those that project to the subcortical regions of the brain known to be associated with feeding and other motivated behaviours [58], providing convergence with the functional hotspots seen in the hypothalamus.
Analyses of these kind are always bound by the available data and therefore there are notable limitations and caveats to this study. The aim of this study was to generate information about 'hotspots' of imprinted gene expression. This approach, and the use of over-representation analysis and GSEA, therefore do not provide an exhaustive list of sites of expression, and non-differentially expressed genes could still be highly expressed genes despite not contributing to this analysis. An example of a known site of expression for imprinted genes not found to be enriched in our analysis was the oxytocin neurons of the hypothalamus, since a clear oxytocin neuron phenotype has been reported in a handful of imprinted gene models [16,59]. This may be an example of a functional effect occurring below the level of over-representation, or that imprinted genes act during development and are not functionally enriched in adult oxytocin neurons, or simply that compared to other hypothalamic neuronal populations, oxytocin neurons are not a 'hotspot' of imprinted expression. Specific sequencing of oxytocinergic brain regions will be required to distinguish between these possibilities. A second caveat is that, due to the nature of the datasets used, not all imprinted genes were included, and our analysis was missing a significant subset of imprinted genes encoding small RNAs or isoforms from the same transcription unit. A third caveat is that we did not assess parent-of-origin expression for the 119 imprinted genes we included in the analysis. Previous expression profiling of imprinted genes have also not measured the POEs [8,60] but have restricted their gene selection to genes with reliable imprinting status. Consequently, we only included the canonical imprinted genes and genes with more than one demonstration of a POE when looking for enrichment. Furthermore, for the vast majority of these genes, a brain-based POE effect has also already been reported (Supplemental Table S1). Although this does not replace validating the imprinting status of all 119 in the tissues and subregions examined, it does provide justification for looking at imprinted gene over-representation. To resolve this issue, scRNA-seq using tissues derived from reciprocal F1 crosses between distinct mouse lines will be key; for example, the recent work of [61] with cortical cell types provides an example of the allelic specific single-cell expression measurements necessary to confirm the enrichments found in this study.
By exploiting scRNA-seq data we have asked whether imprinted genes as a group are disproportionately represented in the brain, in specific brain regions, and in certain neuronal cell-types. In the adult brain imprinted genes were over-represented in neurons, and particularly the hypothalamic neuroendocrine populations and the monoaminergic hindbrain neurons, with the serotonergic neurons demonstrating the clearest signal. Interestingly, PEGs, but not MEGs, recreate this signal at Levels 1 and 2-most notably only PEGs display the hypothalamic neuronal enrichment. By extension, these data also identify behaviours that are foci for the action of imprinted genes. Although there are high profile examples of individual imprinted genes expressed in the key brain regions we highlight and that have roles in feeding (Magel2) [62] and sleep (Snord116) [63], our analyses indicate that imprinted genes as a group are strongly linked to these behaviours and also identify other individual genes that should be explored in these domains. Conversely, there are high-profile examples of imprinted genes involved in hippocampus related learning and memory (Ube3a) [20], but we did not find enrichment for cell types related to this brain function. The idea that imprinted genes converge on specific physiological or behavioural processes is not unprecedented. Specialisation of function is predicted when considering why genomic imprinting evolved at all [5,[64][65][66]. Moreover, there is increasing evidence that the imprinted genes themselves appear to be co-expressed in an imprinted gene network (IGN) and have confirmed regulatory links between each other [67][68][69]. The idea of an IGN or, at the very least, heavily correlated and coordinated expression between imprinted genes, adds further support to the idea that imprinted genes work in concert rather than in isolation to influence processes, and that perturbating one may influence many others [70]. Our findings add substance to these general ideas and highlight the neuronal regulation of pituitary function, feeding and sleep as being key functional hotspots on which imprinted genes converge which probably provides the best current basis for discerning evolutionary drivers of genomic imprinting in the brain.

Data processing
Thirteen unique datasets were analysed across the three levels of analysis (see Fig. 1) and analyses were conducted on each dataset independently. At each level of analysis, we aimed to be unbiased by using all the datasets that fitted the scope of that level, but the availability of public scRNA-seq datasets was limited, which prevented us from exploring all avenues (for example, a direct comparison of enrichment between hypothalamic nuclei). All sequencing data were acquired through publicly available resources and each dataset was filtered and normalised according to the original published procedure. Supplemental Table S20 details the basic parameters of each dataset. Once processed, each dataset was run through the same basic workflow (see below and Fig. 10), with minor adjustments laid out for each dataset detailed in the Supplemental Methods.
Due to the high variability in sequencing technology, mouse strain, sex and age, and processing pipeline, we have avoided doing analysis on combined datasets. Rather we chose to perform our analyses independently for each dataset and look for convergent patterns of imprinted gene enrichment between datasets on similar tissues/brain regions. As with any single-cell experiment, the identification of upregulation or over-representation of genes in a cell-type depends heavily on which other cells are included in the analysis to make up the 'background' . Analysing separate datasets (with overlapping cell-types alongside distinct ones) and looking for convergent patterns of enrichment is one way of counteracting this limitation.

Basic workflow
Data were downloaded in the available form provided by the original authors (either raw or processed) and, where necessary, were processed (filtered, batch-corrected and normalized) to match the author's original procedure. Cell quality filters were specific to each dataset and summarised in Supplemental Table S20. A consistent filter, to remove all genes expressed in fewer than 20 cells, was applied to remove genes unlikely to play a functional role due to being sparsely expressed. Datasets of the whole brain/hypothalamus were analysed both at the global cell level (neuronal and non-neuronal cells) and neuron specific level (only neurons) with genes filtered for the ≥ 20 cell expression at each level before subsequent analysis. Cell identities were supplied using the outcome of cell clustering carried out by the original authors, so that each cell included in the analysis had a cell-type or tissue-type identity. This was acquired as metadata supplied with the dataset or as a separate file primarily from the same depository as the data but occasionally acquired from personal correspondence with the authors. Cells were used from mice of both sexes when provided and all mice were aged 15 weeks or younger across all datasets. Fig. 10 Basic workflow schematic. Single Cell Expression Matrices were acquired through publicly available depositories. Data were processed according to the author's original specifications and all genes were required to be expressed in 20 or more cells. Cell population identities were acquired from the author's original clustering. Positive differential gene expression was calculated via Wilcoxon Rank-Sum Test. Upregulated genes were considered as those with q ≤ 0.05 and a Log2FC ≥ 1 for analysis levels 1 and 2, while this criterion was relaxed to Log2FC > 0 for level 3. Our imprinted gene list was used to filter upregulated genes and two different enrichment analyses were carried out, over-representation analysis via Fisher's Exact Test and Gene Set Enrichment Analysis via Liger algorithm (Subramanian, Tamayo [71], https:// github. com/ JEFwo rks/ liger). Venn diagrams and dot plots were utilised for visualisation Although our focus was the adult mouse brain, embryonic data were included in some comparisons or when no alternatives were present. However, embryonic and postnatal cells were never pooled to contribute to the same cell populations.
Positive differential expression between identity groups were carried out using one-sided Wilcoxon rank-sum tests (assuming the average expression of cells within the current identity group is 'greater' than the average of cells from all other groups). The test was performed independently for each gene and for each identity group vs. all other groups. The large number of p values were corrected for multiple comparisons using a horizontal Benjamini-Hochberg correction, creating q values. Fold-change (FC) values, percentage expression within the identity group and percentage expressed within the rest were also calculated. We considered genes to be significantly positively differentially expressed (significantly upregulated) in a group compared to background expression if it had a q ≤ 0.05. In addition, for Level 1 and Level 2 analyses, the criteria for upregulated genes included demonstrating a Log2FC value of 1 or larger (i.e., twofold-change or larger). The datasets at these levels represented cells from a variety of organs, regions and cell-types, and in line with this cellular diversity, the aim of these analyses was to look for distinctive upregulation, akin to a marker gene. Once the analysis was restricted to cell subpopulations within a specific region of the brain (i.e., Level 3), the additional criteria for upregulation was relaxed to demonstrating just a positive Log2FC (i.e., the gene has a higher expression in this cell type than background). This was mainly because we were not expecting imprinted genes to be 'markers' of individual subpopulations at this level, but our aim was to identify enriched expression profiles for them. This additionally ensures consistent criteria for enrichment within levels, allowing meaningful comparison.
The same custom list of imprinted genes with reliable parent-of-origin effects (see below) was used for all analyses, and all genes were included as long as the gene passed the 20-cell filter. The first statistical analysis for enrichment was an Over-Representation Analysis (ORA) using a one-sided Fisher's Exact Test ('fisher.test' function in R core package 'stats v3.6.2'). The aim was to assess whether the number of imprinted genes considered to be upregulated as a proportion of the total number of imprinted genes in the dataset (passing the 20-cell filter) was statistically higher than would be expected by chance when compared to the total number of upregulated genes as a proportion of the overall number of genes in the dataset (passing the 20-cell filter). To limit finding over-represented identity groups with only a few upregulated imprinted genes, an identity group was required to have ≥ 5% of the total number of imprinted genes upregulated for ORA to be conducted. Subsequent p-values for all eligible identity groups were corrected using a Bonferroni correction. This provided a measure of whether imprinted genes are expressed above expectation (as opposed to the expression pattern of any random gene selection) in particular identity groups.
Venn diagrams of the upregulated imprinted genes making up over-represented identity groups across datasets (within a level) were also reported. Full lists of upregulated imprinted genes can be found in the 'Upreg-ulated_IGs.csv' file for each analysis in the Supplemental Data.
To further examine the presence of imprinted genes within tissues/cell types, and to provide a different perspective to over-representation, we conducted a Gene-Set Enrichment Analysis (GSEA) for imprinted genes amongst the upregulated genes of an identity group using a publicly available, light-weight implementation of the GSEA algorithm [71] in R (https:// github. com/ JEFwo rks/ liger). This was done in a manner similar to Moffitt, Bambah-Mukku [72] since we were similarly using this computational method to identify enrichment of our gene sets within the upregulated genes of the different identity groups. Here, the GSEA was conducted for each individual identity group using Log2FC values to rank the upregulated genes. The GSEA acts as a more conservative measure than the ORA since it tests whether imprinted genes are enriched in the stronger markers of a group (the genes with the highest fold change for a group vs. the rest) and hence whether the imprinted genes are enriched in those genes with a high specificity to that tissue/cell type. To prevent significant results being generated from just 2 or 3 genes, identity group to be analysed were selected as having a minimum of 15 upregulated imprinted genes (i.e. the custom gene set) to measure enrichment for (a value suggested by the GSEA user guide (https:// www. gsea-msigdb. org/ gsea/ doc/ GSEAU serGu ide Frame.html)) and to prevent significant results in which imprinted genes cluster at the tail, identity groups were selected as having an average fold change of the upregulated imprinted genes greater than the average fold change of the rest of the upregulated genes for that group. Again, multiple p values generated from GSEA were corrected using a Bonferroni correction. To further elucidate the genes responsible for significant GSEA's, dot plots of the imprinted genes upregulated in that identity group were plotted across all identity groups with absolute expression and Log2FC mapped to size and colour of the dots, respectively. Graphical representations of significant GSEA's (post-correction) are included in the main text or as Supplemental Figures , all other graphs, including additional dot plots not discussed in this study, can be found in the repository (https:// osf. io/ jx7kr/) and Supplemental Data. If no cell populations met these criteria, GSEA was not run and not included for that analysis.
For Level 1 and Level 2 analyses, we also carried out parent-of-origin specific analyses. The imprinted gene list was divided into MEGs and PEGs and the analyses detailed above were run separately for these two gene groups. For imprinted genes with known parent-of-origin variability based on tissue type (Igf2 and Grb10), the parent-of-origin characterisation of these genes was changed accordingly. The absolute number of imprinted genes top-expressed in a tissue/cell-type were also reported for analyses in Level 1 and Level 2 in the tables, since these analyses included a variety of cell-types and tissues which may demonstrate meaningful clustering of the highest normalised expression values. The mean normalised expression for all imprinted genes across the series of identity groups in the datasets in Level 1 and Level 2 was also calculated alongside the mean normalised expression for the rest of the genes (Supplemental Table S2).

Custom imprinted gene list
The gene list for the analysis was based on the list of murine imprinted genes recently published in Tucci, Isles [2]. Although the original list of imprinted genes was 260 genes long, only 163 genes were identified in the most comprehensive of the datasets. We further refined this list to 119 imprinted genes (Supplemental Table S1a) which excluded the X-linked genes, consisting of mostly the canonical protein-coding and long noncoding RNA imprinted genes, but the criteria for inclusion was those genes with at least two independent demonstrations of their POE status (See Supplemental Table S1b for full list of 260 imprinted genes and reasons for gene exclusion). The only exceptions to multiple independent demonstrations of a POE were four genes (Bmf, B3gnt2, Ptk2, Gm16299) identified by [26] where a POE was assessed across 16 brain regions and 7 adult tissues within one study. For Level 2, the MEG/ PEG status of a gene was primarily based on reported allelic expression within the brain. Small non-coding RNAs such as micro-RNAs (miRs) and small nucleolar RNAs (snoRNAs), which represent ~ 10% of identified imprinted genes, were excluded from the analysis as their sequences were not detected/subsumed by larger transcripts in the majority of the datasets. Another caveat with short-read RNA-seq libraries is that much of the expression data for a given transcription unit cannot discriminate differentially imprinted isoforms nor do some of the technologies (e.g., Smart-Seq2) possess stranded libraries to distinguish antisense transcripts. For complex imprinting loci such as the Gnas locus, most reads as result map to only Gnas and Nespas ignoring several overlapping and antisense genes.