Comparative neurotranscriptomics reveal widespread species differences associated with bonding

Background Pair bonding with a reproductive partner is rare among mammals but is an important feature of human social behavior. Decades of research on monogamous prairie voles (Microtus ochrogaster), along with comparative studies using the related non-bonding meadow vole (M. pennsylvanicus), have revealed many of the neural and molecular mechanisms necessary for pair-bond formation in that species. However, these studies have largely focused on just a few neuromodulatory systems. To test the hypothesis that neural gene expression differences underlie differential capacities to bond, we performed RNA-sequencing on tissue from three brain regions important for bonding and other social behaviors across bond-forming prairie voles and non-bonding meadow voles. We examined gene expression in the amygdala, hypothalamus, and combined ventral pallidum/nucleus accumbens in virgins and at three time points after mating to understand species differences in gene expression at baseline, in response to mating, and during bond formation. Results We first identified species and brain region as the factors most strongly associated with gene expression in our samples. Next, we found gene categories related to cell structure, translation, and metabolism that differed in expression across species in virgins, as well as categories associated with cell structure, synaptic and neuroendocrine signaling, and transcription and translation that varied among the focal regions in our study. Additionally, we identified genes that were differentially expressed across species after mating in each of our regions of interest. These include genes involved in regulating transcription, neuron structure, and synaptic plasticity. Finally, we identified modules of co-regulated genes that were strongly correlated with brain region in both species, and modules that were correlated with post-mating time points in prairie voles but not meadow voles. Conclusions These results reinforce the importance of pre-mating differences that confer the ability to form pair bonds in prairie voles but not promiscuous species such as meadow voles. Gene ontology analysis supports the hypothesis that pair-bond formation involves transcriptional regulation, and changes in neuronal structure. Together, our results expand knowledge of the genes involved in the pair bonding process and open new avenues of research in the molecular mechanisms of bond formation. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07720-0.

Conclusions: These results reinforce the importance of pre-mating differences that confer the ability to form pair bonds in prairie voles but not promiscuous species such as meadow voles. Gene ontology analysis supports the hypothesis that pair-bond formation involves transcriptional regulation, and changes in neuronal structure. Together, our results expand knowledge of the genes involved in the pair bonding process and open new avenues of research in the molecular mechanisms of bond formation.

Background
The profound social attachments humans form with one another are a defining feature of our species. While many species develop parent-offspring or mate pair bonds, these attachments are especially strong and longlasting in humans, particularly those between reproductive partners [1]. In fact, the social bonds formed by human partners are quite uncommon among mammals: only 9% of mammalian species display social monogamy, including just 6% of rodents, which are often used as model species to investigate the neural basis of behavior [2]. Importantly, commonly used laboratory species including mice and rats are not monogamous and lack the ability to form lasting pair bonds. A better understanding of the mechanisms underlying the formation of partner bonds requires use of a species that forms such bonds.
The voles of the genus Microtus (Fig. 1a, b) have been developed into powerful models for better understanding the neural mechanisms underlying attachment and pair bonding [5]. Typically, the focal species of these studies is the prairie vole (Microtus ochrogaster, Fig. 1a), which has become famous for the tendency for males and females to form long-lasting, socially monogamous bonds. Field studies of prairie vole space use and behavior demonstrated a socially monogamous mating system [6,7] (though extrapair fertilization does occur and some individuals adopt more promiscuous mating tactics [8,9]). In the lab, receptive virgin prairie voles will mate with novel opposite-sex conspecifics and, over a period of several hours of mating and other affiliative behaviors, will form a pair bond characterized by selective affiliation and increased aggression towards intruders [3,4,6,7].
Prairie voles are often contrasted with closely related species which do not form these bonds [3,[10][11][12][13][14], including the meadow vole (Microtus pennsylvanicus, Fig. 1b). Field studies of meadow voles are consistent with a promiscuous mating system and in the lab, they do not show the selective affiliation with mating partners characteristic of pair bonds [12,15,16]. Comparative studies have been particularly useful in uncovering the features of the prairie vole brain that facilitate bond formation. These include neuroendocrine signaling in the brain (particularly via the nonapeptides oxytocin [OT] and arginine-vasopressin [AVP]) which conveys social salience, mechanisms of social recognition and memory, and dopaminergic reward pathways [17,18]. Current hypotheses propose that pair bonding results from synaptic plasticity that links the neural encoding of partner cues with the reward system, leading to persistent reinforcement of the partner that results in selective affiliation [17].
While significant progress has been made in understanding the neural mechanisms underlying vole pairbond formation, previous studies have mainly focused on a relatively small number of neuromodulatory pathways, including the nonapeptides and dopamine reward systems mentioned above, as well as opioid signaling in the brain [17,18]. In order to identify additional genes that may be playing a significant role in bond formation, it is necessary to take a broader perspective and examine changes to global gene expression in regions with a known role in bonding.
In this study, we used a comparative approach, taking advantage of the close evolutionary relationship [19] and stark contrast in bonding behavior between prairie and meadow voles to better understand the molecular mechanisms that support pair-bond formation and identify novel candidate genes for future study. Our study focused on three regions (Fig. 1c): the amygdala (AMY), hypothalamus (HT), and a region inclusive of the ventral pallidum and nucleus accumbens (VP/NAc). Each of these regions is proposed to have a critical function in the development of a pair bond [17]. The rodent AMY receives olfactory information important for social recognition [20,21], encodes valence [22], and plays an important role in social memory [23,24]. The HT contains several nuclei that are involved in regulating social behavior, including populations that produce the neuropeptides OT and AVP, which signal social salience, influence social memory, and regulate related behaviors [12,[25][26][27][28]. Finally, signals of social salience converge with dopaminergic signaling from the ventral tegmental area in the NAc, resulting in disinhibition of the VP, which influences behavioral responses to rewarding stimuli [29][30][31][32]. While several other regions, such as the ventral tegmental area, hippocampus, and cortical regions also have important functions in the development and maintenance of pair bonds [17,18], our focal regions represent key nodes involved in critical aspects of bonding: social context, social memory, and reward.
We used RNA-sequencing to quantify gene expression before and at three time points (0.5, 2, and 12 h) after mating (Fig. 1d) in each of these regions across both prairie and meadow voles. By observing changes in gene expression in these regions across time relative to mating and across pair-bonding and non-bonding species, we sought to identify genes involved in the process of pair-bond formation, either by conferring the capacity to bond before mating or changing their expression to support bonding after mating. We find gene categories that differ across species at baseline (pre-mating) and across brain regions, as well as genes that differ in expression post-mating between prairie and meadow voles. Further, using gene network analysis, we identify gene modules that have expression patterns strongly related to specific brain regions, as well as modules that change in expression in response to mating. Together, these results provide further support for the current models of mammalian pair bonding and provide additional candidate genes that may underlie the formation of monogamous bonds.

Results
In order to identify the factors (e.g. species, brain region, time relative to mating) most strongly associated with gene expression in our samples, we first performed hierarchical clustering based on Poisson distance [33], followed by principal component analysis (PCA). In both Regions of interest for this study are highlighted, including the amygdala (AMY, blue), hypothalamus (HT, green), and ventral pallidum and nucleus accumbens (VP/NAc, purple). d Above: Experiment Timeline. Virgin animals (time 0) were collected prior to mating. Mated animals were collected at either 0.5, 2, or 12 h after first intromission. Below: Timeline of prairie vole bond formation. Unfamiliar opposite sex conspecifics will mate shortly after introduction. Bonds become detectable at 6 h after onset of mating and strengthen up to 24 h after mating onset [3,4] cases, the primary factor influencing gene expression was species, followed by brain region (Fig. 2). Hierarchical clustering based on Poisson distance resulted in a tree with two major branches representing samples from prairie and meadow voles. These branches were then split corresponding to brain region. In prairie voles, AMY and VP/NAc clustered more closely to each other, while in meadow voles AMY and HT were more closely clustered (Fig. 2a). This result was also supported by PCA, which revealed a first principal component (PC1) that explained 39% of the variance in the data and split samples by species, followed by a second component (PC2) that explained 34% of the variance and primarily separated samples by brain region (Fig. 2b).

Differential gene expression Species comparison of virgins
To identify genes that may confer the capacity to bond in prairie but not meadow voles prior to mating, we used a subset of the data containing only samples collected from virgins (AMY, HT, and VP/NAc samples from n = 4 individuals of each species). We tested for the effect of species using the likelihood ratio test implemented by DESeq2 [34]. This test compares two generalized linear models fit to the data, a full model including all terms and a reduced model with one or more terms omitted. The test determines if the effect of the terms omitted in the reduced model is significantly greater than zero.
To determine the effect of species, we applied the full model~region + species and reduced model~region. 8377 of 12,672 genes were significantly differentially expressed (FDR < 0.1) in this comparison (Fig. 3a, Supplementary Table 1). In order to identify the types of genes that differed in expression across species, we used the Mann-Whitney U (MWU) test (see Methods for details) to identify enriched gene ontology (GO) categories [36,37]. The MWU test of GO term enrichment revealed 48 significantly enriched (FDR < 0.1) terms in the contrast between prairie and meadow vole virgins (Fig.  3b). This included 10 terms in the category Molecular Function, 11 in Biological Process, and 27 in the category Cellular Component. Enriched terms in prairie voles were largely associated with cell structure and extracellular space, while those enriched in meadow voles were associated with the ribosome and translation, mitochondrial function, and metabolic and biosynthetic processes.

Comparison of regions
As sample region was the second largest factor explaining gene expression data, we next compared expression across brain regions. To understand how gene expression differed across our focal brain regions, we used the entire dataset and applied the likelihood ratio test with the full model~species + region + time and the reduced model species + time. Overall, 10,891 of 12,687 total genes significantly differed (FDR < 0.1) in expression on the basis of sample region (Fig. 4a-c, Supplementary Table 2).
Thirty-two GO terms were significantly enriched (MWU FDR < 0.1) in the contrast between AMY and HT, including two in the category Biological Process and 30 under Cellular Component (Fig. 4d). Terms enriched in AMY were largely associated with neuronal structure and the synapse, while those enriched in HT were associated with membranous structures including endoplasmic reticulum, Golgi apparatus, intracellular vesicle, and Color key to the left shows sample type for each row. Hierarchical clustering trees show that samples split first by species, then by brain region. b Principal component analysis of sample gene count data shows first principal component (PC1) that separates samples by species and accounts for 39% of sample variance. Second principal component (PC2) separates samples primarily by region and accounts for 34% of variance

Species comparison across time
We next sought to determine the influence of mating and, in prairie voles, pair-bond formation on gene expression in our focal regions. In order to identify how gene expression changes differently across species following mating, we divided the full dataset into subsets containing all samples from a single region. We then tested these regional datasets with the likelihood ratio test, applying the full model~species + time + species: time and the reduced model~species + time. This comparison allowed us to identify those genes that exhibit species-specific patterns of expression following mating.
In the AMY, we found four genes that were significantly differentially expressed (FDR < 0.1) across species over mating time points (Fig. 5a, Table 1, Supplementary  Table 3). The gene with the smallest adjusted p-value (FDR = 6.17*10 − 5 ) in this model was Npas4, an activitydependent transcription factor involved in synapse formation [38] (Fig. 5d). We found 33 significantly enriched GO terms between prairie and meadow voles (MWU FDR < 0.1), including two under Molecular Function and 31 in the category Cellular Component (Fig. 5g). Enriched terms in prairie voles were related to peptide receptors and the plasma membrane. Enriched terms in meadow voles were associated with ribosomal and mitochondrial function.
In the HT, 31 genes were significantly differentially expressed (FDR < 0.1) across species over mating time points (Fig. 5b, Supplementary Table 4), of which 25 were annotated (Table 2). These included water channel-encoding gene Aqp3 [39] (Fig. 5e), which had the lowest adjusted p-value (FDR = 0.0089713). Sixty-five GO terms were significantly enriched (MWU FDR < 0.1) in the contrast between prairie and meadow voles. Among these were two in the category Molecular Function, 15 in Biological Process, and 48 in Cellular Component (Fig. 5h). Enriched terms in prairie voles were typically associated with neuronal development and cell structure. Terms enriched in meadow voles included many related to ribosomal and mitochondrial function, metabolic and biosynthetic processes, and vesicular transport.
In the VP/NAc we found 21 genes that were significantly differentially expressed over mating time points (Fig. 5c, Supplementary Table 5), 19 of which were annotated (Table 3). Among these, Spata16, a gene with known function in spermatogenesis [40] (Fig. 5f) had the lowest adjusted p-value (FDR = 0.00318981). In addition, we found 30 significantly enriched (MWU FDR < 0.1) GO terms between prairie and meadow voles, which included five under Molecular Function, three under Biological Process, and 22 in Cellular Component (Fig.  5i). Only two of these terms were enriched in prairie voles: cluster of actin-based cell projections and extracellular region. Many of the terms enriched in meadow voles were associated with translation, protein localization, mitochondria, and the endosome.

Genes shared across comparisons
Several genes were differentially expressed in multiple comparisons. In region-by-region comparisons across species in virgin animals, the largest proportion of differentially expressed genes were significantly different in all three regions (Fig. 6a). Additionally, 1206 genes were differentially expressed in both HT and VP/NAc, 395 in AMY and VP/NAc, and 409 in AMY and HT. Relatively few genes were uniquely differentially expressed in a single region in this comparison, just 2499 compared to a total of 5031 genes that were differentially expressed in two or all three regions.
While not as large a proportion as in virgins, several genes were also differentially expressed in multiple regions in our species:time comparisons ( Fig. 6b-h). The gene Nfkbia, which encodes NF-Kappa-B Inhibitor Alpha, was significant in our models of both AMY and VP/NAc (Fig.  6e). Five genes were significant in the species:time comparison for both HT and VP/NAc. These included Pxn, which encodes paxillin, an adaptor protein that interacts with the cytoskeleton and signaling molecules to facilitate neurite outgrowth and long-term potentiation [41,42], the circadian clock gene Per1 [43], Cables1 which encodes a kinase-binding protein that promotes neurite outgrowth [44], as well as two unannotated genes ( Fig. 6f-h). Each of these genes was consistently upregulated in one species over the other across tissues, with the exception of Per1, which nevertheless showed consistent patterns of expression over time by tissue and species (Fig. 6g).
Finally, many of the genes that significantly differed across species over post-mating time points were also differentially expressed across species in virgins (Fig. 6c, d). Three of the four genes differentially expressed in AMY at post-mating time points (all upregulated in prairie voles) were also upregulated in prairie vole virgins. For the HT, of 31 genes significant for the species: time interaction, 15 were upregulated in prairie voles in virgins and after mating and eight were upregulated in meadow voles as virgins and after mating. In the VP/ NAc, 15 genes were upregulated in prairie voles as virgins and after mating, while two genes were upregulated in meadow voles in both comparisons.

Correlated gene networks
We next sought to determine if co-regulated groups of genes varied in expression related to our variables of Branches with length of zero are subsets of one another. Fractions preceding GO terms indicate proportion of "good" genes that have raw p-value< 0.05 compared to total number of genes in the category. P-values are corrected using Benjamini-Hochberg false discovery rate procedure [35]. Red terms enriched in prairie voles, blue terms enriched in meadow voles. Bold text indicates adjusted p < 0.01, plain text indicates adjusted p < 0.05, and italicized text indicates adjusted p < 0.1 for term. n.s. indicates no significant terms in the category  Tables 6 and 7 for module assignments and Supplementary Tables 8 and 9 for Pearson r and p-values for significant correlations). For prairie voles, we detected 16 modules ranging in size from 72 to 1707 genes as well as 392 that were unassigned to any module. In meadow voles we detected 12 modules ranging in size from 167 to 1577 genes and 517 genes that were unassigned. We then correlated these modules to sample traits including brain region (AMY, HYP, or VP/NAc), mating status (mated vs. virgin), and collection time points, including time as a continuous variable and each post-mating time point individually.

Module correlation with sample traits
For both species, the traits most strongly correlated with modules were sample brain region (Fig. 7a, b). In prairie voles (Supplementary Table 8), the green module was most strongly positively correlated with AMY, while the greenyellow module was strongly negatively correlated with the region. The brown, magenta, tan, cyan, pink, black, red, salmon, and turquoise modules were also significantly correlated with AMY. For HT, the turquoise module showed the strongest positive correlation, and yellow had the strongest negative correlation. Modules blue, purple, greenyellow, magenta, pink, green, red, and salmon were also significantly correlated with HT. Finally, the magenta module was most strongly positively, and red module most strongly negatively correlated with VP/ NAc. The purple, yellow, greenyellow, tan, pink, green, black, and turquoise modules were also significantly correlated with this region. In meadow voles (Supplementary Table 9), the darkorchid module was most strongly positively, and tomato most strongly negatively correlated with AMY. The ghostwhite, orange, seagreen, darkred, plum, gold, and navyblue modules were also significantly correlated with AMY. Modules seagreen and darkred were most positively and negatively correlated with HT, respectively. The orange, tomato, darkorchid, plum, deeppink, and gold modules also had significant correlations with HT. Finally, the gold module had the strongest positive correlation with VP/NAc and orange module had the strongest negative correlation. Modules ghostwhite, seagreen, darkorchid, darkred, and navyblue were each also significantly correlated with VP/NAc.
In general, module correlations with mating status and collection time points were weaker than those with brain regions (Fig. 7a, b). For prairie voles (Fig. 7a, Supplementary Table 8) three modules, cyan, pink, and brown were significantly positively correlated with having mated. The blue, salmon, and midnightblue modules were negatively correlated with having mated, indicating a positive association with virgin animals. Only the grey module containing unassigned genes was significantly correlated with overall time of collection in the experiment. This relationship appears to be driven by a significant negative correlation with the 12 h collection time point. However, two modules were significantly correlated with specific collection time points. The cyan module was positively associated with the 0.5 h time point and the blue module was significantly negatively correlated with this time.
Module correlations with mating status and postmating time points were weaker in meadow voles than in prairie voles (Fig. 7b, Supplementary Table 9). No modules were significantly correlated with mating status, and only the grey module containing unassigned genes was correlated with collection time or individual time points. It was negatively correlated with time overall and the 12 h collection time point, and positively correlated with the 0.5 h collection time point.

Module associations across species
To identify relationships between modules across species, we clustered prairie and meadow vole modules on the basis of Euclidean distance (Fig. 7c) and Pearson correlation (Fig. 7d). Clustering revealed eight pairs of modules that were closely related across species. These included the brown prairie vole and navyblue meadow vole modules, prairie magenta and meadow gold, prairie green and meadow darkorchid, prairie yellow and meadow darkred, prairie black and meadow ghostwhite, prairie red and meadow orange, prairie turquoise and meadow seagreen, and prairie greenyellow and meadow tomato. Clustering by correlation also found one pair of prairie vole modules, midnightblue and salmon, that were most closely related. Of these closely related modules, the majority were strongly associated with individual brain regions. The green/darkorchid modules were the most strongly positively correlated with AMY in both species, while the greenyellow/tomato modules were the most negatively correlated with AMY. Modules turquoise/seagreen were most highly positively correlated with HT and yellow/ darkred were most strongly negatively correlated with it. Finally, the gold/magenta modules were most strongly associated and red/orange modules most strongly negatively associated with VP/NAc.
We used GO term analysis on pairs of modules that were strongly positively correlated with each region to identify enriched gene categories. The prairie vole green module, which was strongly correlated with AMY had 85 significantly enriched GO terms (MWU FDR < 0.1), including four in the category Molecular Function, 42 under Biological Process, and 39 in the category Cellular Component (Supplementary Fig. 1a). In meadow voles, the AMY-associated darkorchid module had 53 significantly enriched (MWU FDR < 0.1) GO terms (Supplementary Fig. 1b). These included one term in Molecular Function, cation channel activity, 27 terms related to Biological Process, and 25 terms in the category Cellular Component. In both species, terms related to neuronal projections, ion channels, plasma membrane and extracellular space, receptors, and synaptic transmission were enriched. The terms behavior and cognition were also enriched in both the prairie vole green module and meadow vole darkorchid module. However, the terms associative learning and learning were uniquely enriched in the prairie vole green module.
The prairie vole module turquoise and meadow vole module seagreen were each strongly associated with HT. 33 GO terms were significantly enriched in the turquoise module (MWU FDR < 0.1), including 12 in the category Molecular Function, nine in Biological Process, and 12 in Cellular Component (Supplementary Fig. 2a). In the  Fig. 2b). In both species, several terms related to cilia were enriched. Additionally, in prairie voles, terms relating to neuropeptide signaling, transcription factors, extracellular space were enriched along with the term feeding behavior.
No GO terms were significantly enriched in the prairie vole magenta module, which was strongly correlated with VP/NAc. However, 12 terms were significantly enriched (MWU FDR < 0.1) in the related meadow vole gold module ( Supplementary Fig. 3). These included five in the category Molecular Function and six in the category Cellular Component. In addition, one term, regulation of postsynaptic neurotransmitter receptor internalization was enriched under the category Biological Process. Overall, most enriched terms related to the cell membrane, synaptic transmission, or enzymatic function.

Modules associated with mating
Finally, to identify gene categories associated with mating, we looked for GO term enrichment among the modules most strongly correlated with mating status in prairie voles ( Supplementary Fig. 4). We used the MWU test to identify significantly enriched terms in the cyan and pink modules, which were the most significantly positively correlated with having mated, and the blue and salmon modules, which were the two most significantly negatively correlated with having mated (thus positively associated with virgin animals).
In addition to mating status, these modules also had significant correlations with brain regions (Fig. 7a, Supplementary Table 8). The cyan module was positively correlated with AMY, pink module was positively correlated with AMY and HT, and negatively correlated with VP/NAc. Module blue was negatively correlated with HT and the salmon module was positively correlated with HT and negatively correlated with AMY.
Both the cyan and pink modules, each had significantly enriched terms (MWU FDR < 0.5) related to translation, ribosomes, and mitochondria ( Supplementary Fig. 4a, b). In addition, terms associated with phosphatase binding were enriched in the cyan module and several terms related to ubiquitination, catabolic and metabolic processes, protein localization, and the ESCRT complex were enriched in the pink module. Terms significantly enriched (MWU FDR < 0.5) in the blue included several related to peptidyl-amino acid modification and chromatin modification ( Supplementary Fig. 4c). Those enriched in the salmon module (MWU FDR < 0.5) were largely associated with neuronal development and axon guidance, as well as extracellular space (Supplementary Fig. 4d).

Discussion
Pair bonding is an important social behavior of many animal species, including humans [1,6,46,47]. Significant efforts have been made to understand the neural and hormonal mechanisms underlying the formation of pair bonds; however, these studies have focused on a relatively small number of genes and their products [17]. A critical next step in understanding the mechanisms of bond formation is more fully accounting for the genes underlying pair-bond formation. While other studies have made comparisons of neural gene expression in closely-related monogamous and non-monogamous species [48], to our knowledge, ours is the first to make comparisons across species during an activelydeveloping bond.
In this study, we used RNA-sequencing to observe changes in gene expression in three regions that have critical roles in pair-bond formation including social memory, social context, and reward. We compared two species, the bond-forming prairie vole and the nonbonding meadow vole, before and at three time points after mating. By using a comparative approach, we sought to differentiate those genes which change in expression in response to mating, but do not play a role in bond formation, from the genes that may be critical to pair-bond development. We first identified differences in the gene categories expressed across species in virgins, as well as differences in the categories of genes expressed across our focal regions. Next, we identified those genes that differed in expression post-mating across species in each focal region. Finally, we identified gene expression networks in each species and related modules of genes with correlated expression to sample features. This allowed us to identify modules strongly associated with each brain region, as well as modules which changed in expression in response to mating in prairie voles.

Pre-mating differences and the capacity to bond
Our results emphasize the importance of pre-mating differences in gene expression that confer the capacity to bond in prairie voles but not meadow voles. We found that gene expression count data were most strongly associated with sample species (Fig. 2), and the vast majority of genes included in this study were differentially expressed across prairie and meadow vole virgins (Fig. 3). This is consistent with prior studies that have investigated the neural mechanisms of pair bonding through comparisons between monogamous and promiscuous vole species. For example, the neuropeptide AVP plays an important role in the development of partner preference and aggression toward unfamiliar conspecifics that characterize pair-bond formation in prairie voles [26]. Expression of the V1a vasopressin receptor in the VP is necessary for bond formation in male voles [49]. Meadow voles naturally have very low expression of this receptor in the VP, however, driving its expression in this brain region results in increased partner preference, a hallmark of bond formation, in meadow voles [12].
Similarly, several other studies focused on individual candidate genes or proteins found species differences in expression of OT receptor, mu-opioid receptor, corticotropin-releasing factor receptors, estrogen receptor alpha, and neuropeptide Y [50][51][52][53][54]. Gene expression differences important for behavior have also been found across bonding and non-bonding species in other taxa, including variation in Avp (gene encoding argininevasopressin) expression related to parental care in Peromyscus mice [55] and variation in OT receptor expression across pair-bonding versus solitary species of butterflyfish [56]. Together, these and the present study show that species-level differences in brain gene expression play a large role in determining the capacity to form pair bonds.
While the effects of vasopressin and its receptor are dramatic, it remains likely that many genes underlie the species differences in bonding capacity. Our current study has identified several gene categories that differed in expression across species in virgins. GO terms enriched in prairie voles were largely related to cell structure, the nucleus and extracellular space. These include microtubule organizing center part, centriole, chromatin, and cell-cell junction. Enriched in meadow voles were terms associated with the ribosome and translation, mitochondrial function, and metabolic and biosynthetic processes including translation initiation factor, translation regulator, peptide metabolic process, macromolecule biosynthetic process, ribosomal subunit, and mitochondrial part. These results indicate that gene expression differences across virgin prairie and meadow voles are largely associated with genes involved in cell structure (enriched in prairie voles) and protein synthesis (enriched in meadow voles).

Function of critical bonding-related regions
While it was not the central focus of this study, given that we found sample brain region to be the second strongest factor associated with gene expression, we explored the types of genes that differed between our focal brain regions. Unsurprisingly, given its role in social memory [23], when comparing the AMY to HT, we found enrichment of terms related to synaptic signaling including regulation of postsynaptic neurotransmitter receptor internalization, postsynapse, and glutamatergic synapse (Fig. 4d). In comparison to the VP/NAc, many enriched GO terms in AMY were related to neuropeptide signaling (Fig. 4e). Neuropeptide signaling, particularly by OT, is necessary for the AMY's function in social recognition and memory. The medial AMY expresses OT receptor and receives OT innervation from the paraventricular nucleus of the HT in both mice and prairie voles [23,29]. OT knockout mice are unable to recognize familiar conspecifics, but this deficit is rescued by site-specific injection of OT into the medial AMY [23]. Further, site-specific injection of an OT antagonist into the medial AMY also disrupted social recognition in wild type mice. Beyond OT, the AMY expresses many other neuropeptides and their receptors [57]. Genes in the neuropeptide-related categories enriched in AMY included those encoding galanin, neuropeptide FF, urocortin-3 along with receptors for neuropeptide Y, galanin, somatostatin, and neuropeptide FF along with several others. Our results suggest that there are a variety of neuropeptide systems that are likely at play in AMY function.
Several synapse-related terms were also enriched in the prairie vole green and meadow vole darkorchid WGCNA modules which were strongly correlated with the AMY (Supplementary Fig. 1). Additionally, the terms cognition and behavior were enriched in both modules, while learning and associative learning were enriched only in the prairie vole green module. Individual recognition is an essential aspect of bonding behaviors [17], one presumably mediated by these suites of genes. Taken together, these results point to the active role of the AMY in navigating social interactions, with a particular importance in prairie voles for developing social memory for a new partner during bond formation.
The gene categories we found to be enriched in the HT are also consistent with known functions of this region, a diverse structure important for sensing and producing hormonal signals [58]. In the contrast between HT and VP/NAc, we found enrichment of the terms peptide receptor and neuropeptide receptor (Fig. 4f). Similarly, these terms along with several others related to hormone signaling were enriched in the prairie vole turquoise module which was strongly correlated to HT (Supplementary Fig. 2a). Somewhat surprisingly, we did not find these terms to be enriched in the meadow vole seagreen module which was also strongly correlated with HT ( Supplementary Fig. 2b), though we would expect genes related to hormone and peptide signaling to be just as important in that species as they are in prairie voles. We interpret this as reflecting the well-known role of the hypothalamus in neuroendocrine coordination [58]. Another set of GO terms that were enriched in HT compared to both AMY and VP/NAc as well as in both the prairie vole turquoise and meadow vole seagreen modules were several terms related to cilia (Fig. 4d, f;  Supplementary Fig. 2). These genes are likely related to the ependymal cells that lie along the surface of the ventral part of the third ventricle, which is flanked on either side by the HT [59]. These cilia play an important role in regulating the flow of cerebrospinal fluid through this ventricle. Other groups of terms that were enriched in the HT and related modules included several related to the endoplasmic reticulum and the extracellular region. The endoplasmic reticulum is involved in protein synthesis, including of neuropeptides [60], so enrichment of this category may be related to neuropeptide synthesis and release.
Many of the terms enriched in the VP/NAc compared to either the AMY or HT (Fig. 4e, f)-or in the meadow vole gold module ( Supplementary Fig. 3), which was strongly correlated to VP/NAc-were related to the nucleus, chromosomes, and transcription/translation. In addition, there were several enriched terms related to synaptic function including postsynapse and terms related to dendritic spines in both the comparison with the HT and in the meadow vole gold module and the term regulation of postsynaptic neurotransmitter receptor internalization in the gold module. Together, these enriched terms paint a picture of a transcriptionally actively region with a potentially important role for the formation new synapses or modulation of existing ones.

Post-mating gene expression in support of bond formation
Pair-bond formation is a major life history transition that is dependent on mechanisms of social reward and memory [17]. The necessity of these systems in bonding is the reason we chose to focus on the AMY, HT, and VP/NAc as regions of interest. Prior studies have shown that the AMY is necessary for social recognition and encodes valence [22]. The HT is a diverse structure that has several functions related to social behavior [61]. Finally, the VP/NAc form a circuit important for action selection. Both dopaminergic and OT inputs to NAc are necessary for pair-bond formation in prairie voles [31,32]. Activation of the D2 dopamine receptor in NAc reduces activity of inhibitory neurons in that region, which is proposed to then disinhibit VP neurons which guide behavioral responses to rewarding stimuli, such as mating [17,62].
To better understand the molecular mechanisms at work in these regions in response to mating, and during pair-bond formation in prairie voles, we first focused on the genes in each region that were most significantly differentially expressed (lowest FDR) in our model testing for the effect of species:time interaction. In the AMY, this was Npas4, which peaked in expression for both species at the 0.5 h post-mating time point but was expressed at higher levels in prairie voles pre-mating and at later post-mating time points (Fig. 5d). Npas4 is a transcription factor that plays an important role in the development of inhibitory synapses in response to excitatory inputs by regulating expression of activitydependent genes [38]. In mice, social encounters drive increased Npas4 expression in the hippocampus and knock out of Npas4 results in alterations of social behavior, including reduced time investigating novel conspecifics [63]. Knockout animals in that study also showed deficits in learning and memory tasks.
Mating is a highly-socially salient behavior that increases neural activity in the AMY [64] so it is unsurprising that Npas4 peaks in expression shortly after mating onset. However, the fact that it differs across species both before mating, and at later time points following mating onset suggest that it may play a particularly important role for AMY function in prairie voles compared to meadow voles. It may be that AMY activity in prairie voles is simply higher in the absence of mating or other social cues, driving Npas4 expression more strongly. Alternatively, it may be expressed at constitutively higher levels in prairie voles in the absence of strong activation so that neurons in the region are prepared to develop inhibitory synapses in response to strong excitatory inputs. Either way, this result points to the importance of maintaining excitation/inhibition balance in the AMY for prairie voles.
In the HT, the most strongly differentially expressed gene across species over post-mating time points was Aqp3 (Fig. 5e), a gene encoding a water/glycerol transporting channel [39] that was lowly expressed in both species until the 12 h time point, where it increased in expression in meadow voles but not prairie voles. In rats, Aqp3 is expressed in astrocytes and neurons of several regions, and weakly expressed in ependymal cells, but no expression was found in either the supraoptic or suprachiasmatic nuclei of the hypothalamus [65]. Expression of Aqp3 increased in response to stroke in rats [66] and is elevated in pyramidal cells of cortical tissue from human patients with edema [67]; however, the reason why this gene increases in expression in the HT of male meadow voles 12 h after beginning to mate with a novel female is not immediately clear.
The gene that was most strongly differentially expressed in the VP/NAc between species across post-mating time points was Spata16 (Fig. 5f). This gene showed an interesting opposing pattern of expression across species, starting at high levels and reducing quickly following mating in prairie voles and beginning at low levels of expression and peaking shortly after mating in meadow voles. Spata16 is a gene important for spermatogenesis [40], and to our knowledge it has not so far been studied for its function or localization in the brain.
In addition to the genes most strongly differentially expressed in our model testing the effect of the species: time interaction, we found four genes that significantly differed across species in multiple brain regions (Fig. 6eh). Among these were Nfkbia, which encodes an inhibitor of the transcription factor nuclear factor kappa B [68]; Pxn, which acts as an adaptor protein that interacts with the cytoskeleton and signaling molecules that regulate cell adhesion to the extracellular matrix [69]; the circadian clock gene Per1 [43]; and Cables1, which encodes a protein that promotes neurite outgrowth in neurons and plays a role in regulating cell death [70]. Interestingly, though the absolute level of expression varied across regions within a species, the pattern of expression of these genes was largely similar across our focal regions. This suggests that whatever function they serve in response to mating is consistent across the brain regions. Further, these genes tended to increase after mating in meadow voles, then return to baseline, while maintaining relatively more consistent expression across all time points in prairie voles. This suggests that they are responsive to mating in the promiscuous species but are not necessarily important for bonding.
Alternatively, it is possible that the changes in expression over time we observe in some or all of these genes are related to time itself, rather than time relative to mating. This is particularly true for circadian-related genes Per1, which has opposing patterns of expression in prairie and meadow voles (Fig. 6g), and Ciart, though this caveat applies more broadly, as nearly half of protein-coding mammalian genes show circadian variation in their expression [71]. If this is the case, their variation may reflect differences in circadian cycles of gene expression across species. Prairie voles show ultradian patterns in cardiovascular measures [72] and in activity in the field [73]. This is also true of common voles (Microtus arvalis) [74], though daily rhythms of meadow voles and potential differences between prairie and meadow voles remain to be investigated. Given that in this study we collected prairie and meadow voles at similar time points, our results are unlikely to be an artifact of timing of tissue collection. Moreover, none of our gene ontology analyses identified enrichment of circadian-related terms. While some genes (e.g. Per1, Ciart) may be influenced by species differences in circadian rhythms, on balance, species differences in circadian function do not seem to be major predictors of gene expression in our data. Future investigations can more directly test for circadian differences between prairie and meadow voles and their effects on brain gene expression and bonding behavior.
In addition to these genes that were differentially expressed across species, we also examined modules that were significantly correlated with mating status to identify enriched gene categories associated with these modules. The cyan and pink modules were each positively correlated with having mated, and enriched terms in these modules included many related to translation, ribosomes, and mitochondria. Terms related to catabolic and metabolic processes were also enriched in the pink module. The terms enriched in these modules are likely associated with transcriptional and energetic demands associated with the response to a novel social encounter and initial mating activity. For example, an earlier study has found that 30 min of mating elicits activity-dependent expression of the immediate early genes Fos and Egr-1 in dopaminergic neurons of the AMY in male prairie voles [75]. Consistent with this interpretation, the post-mating time point these modules were most strongly related to was 0.5 h (Fig. 7a). Cyan was significantly positively correlated with this point (r = 0.33, p = 0.024) while pink trended toward a significant positive correlation (r = 0.25, p = 0.088).
Similar to the modules positively associated with mating, the modules negatively correlated with mating (and so positively correlated with virgin status) were most negatively associated with the 0.5 h post-mating onset time point (Fig. 7a). The blue module was significantly negatively correlated with this time (r = − 0.32, p = 0.031) and salmon module trended toward a negative correlation (r = − 0.26, p = 0.073). Many of the terms enriched in the blue module were related to chromosomes and chromatin modifications, suggesting that activity of these genes is reduced in early mating. In the salmon module, most enriched terms were associated with neural development and axon guidance. As the salmon module was positively correlated with HT, this result suggests that during early mating there is a reduction in the formation or growth of axons from neurons in this region. Together, these results suggest that during very early mating in prairie voles, expression of genes involved in synaptic plasticity and activity of epigenetic modifications are actually reduced compared to the virgin state, perhaps due to the energetic and protein synthesis demands occurring at this time. Though it remains possible that the products of these genes are still present and active during early mating.

Limitations and future studies
One limitation of the current study is the fact that annotations for the genes encoding OT, OT receptor, and the V1a AVP receptor are absent from the currently available annotated prairie vole genome. While their absence is unfortunate, as these genes would have served as useful positive controls, given the fact that they have been a major focus of studies of mating and pair bonding in voles and other taxa [5,9,17,56,[76][77][78][79][80][81][82], it is unlikely that the current study would reveal any novel function for these genes. So, although their inclusion in the annotated genome would have been useful, their absence does not impede the key goals of this study.
While this study identified several gene categories that differ between virgin prairie and meadow voles and across regions in both species, as well as intriguing candidate genes that vary over time after mating across species, we are not able to tie these genes and GO categories to specific behaviors. Future studies that include more detailed behavioral observation will be useful for correlating expression of genes of interest to specific mating-related or other prosocial behaviors that voles engage in during bond formation. Additional manipulative studies will be useful in determining which gene expression differences between prairie and meadow voles.
Additionally, our GO term enrichment analyses comparing across species often found significant enrichment of terms related to transcription and translation as well as metabolic activity. While this may be indicative of real differences in these processes between prairie and meadow voles, it is important to consider that this may instead be an artifact of the sequencing and analysis process. That is, due to the enrichment of genes in other categories (e.g. cell structure and neuronal development) in prairie voles, the relative abundance of "housekeeping" genes involved in protein synthesis and metabolism appears to be reduced. Nevertheless, our results still emphasize the importance of brain transcriptional regulation in supporting prairie vole bond formation.
Finally, this study investigated gene expression differences in males only. A benefit of this approach is reducing the number of animals needed to obtain a sample size with appropriate statistical power, while limiting the number of samples and factors needed to analyze a large dataset. Further, most of the neural mechanisms of prairie vole pair bonding investigated so far are shared across the sexes [17]; however, sex differences in the neurohormonal control of pair bonding do exist, such as dimorphic effects of stress on bond formation [83]. Future studies using RNA-sequencing or a candidate gene approach should investigate whether our results also hold true for female voles, or if there are important sex differences in brain gene expression facilitating bond formation.

Conclusions
Our results emphasize the importance of pre-mating differences in gene expression that confer the ability to pair bond in prairie voles but not in non-bonding species such as meadow voles. In addition, they support the hypothesis that pair-bond formation relies on transcriptional regulation and alterations of neuronal structure in regions including the amygdala, hypothalamus, ventral pallidum, and nucleus accumbens that links neural encoding of partner cues with the reward system, resulting in reinforcement of the partner that leads to selective affiliation (Fig. 8). Finally, we identify several intriguing candidate genes that may play important roles in bond formation following co-habitation and mating. Together, our results should help broaden the scope of research on the neural and molecular mechanisms of pair-bond formation by providing novel candidates to investigate in future manipulative studies.

Animals
All animals used in this study were taken from laboratory breeding colonies at Emory University, derived from field-captured voles collected in Illinois. Prairie voles and meadow voles were housed separately in same-sex groups of two to three voles per cage from postnatal day 21. Housing consisted of a ventilated 36 cm × 18 cm × 19 cm plexiglass cage filled with Bed-o'-Cobs laboratory animal bedding (The Andersons Inc., Maumee, Ohio) under a 14/10 h light/dark cycle (lights on 7:00 AM-9:00 PM) at 22°C with access to food (rabbit diet; LabDiet, St. Louis, Missouri) and water ad libitum. All experiments were performed in accordance with relevant guidelines and regulations and were approved by the Emory University Institutional Animal Care and Use Committee.

Tissue collection
All animals were 60-90 days old and sexually naïve at the time of the experiment and tissues were collected from males only. Voles were euthanized by isoflurane inhalation overdose followed immediately by rapid decapitation as recommended by the American Veterinary Medical Association and approved by the Emory Institutional Animal Care and Use Committee. Tissues for the virgin group (time point 0) were collected from sexually naïve males without exposure to females (n = 4 per species). For tissues from mated animals, single male voles were paired with a single unrelated female of the same Fig. 8 Our results support the hypotheses that baseline species differences prior to mating allow for bond formation in prairie but not meadow voles, and that differential changes in gene expression in regions with roles in processing social context, social memory, and reward following mating support bond formation in prairie voles. We found many genes were differentially expressed across species prior to mating, and many differentially expressed genes across focal regions. Additional genes were differentially expressed across species over time following mating. Gene categories related to cell structure and extracellular space were enriched (red up arrows) in prairie voles relative to meadow voles prior to mating, and categories related to cell structure and neuronal development were enriched after mating. Gene categories related to ribosome & translation and mitochondria & metabolism were enriched in meadow voles (blue down arrows) relative to prairie voles both before and after mating species primed with estradiol benzoate (Sigma Aldrich, St. Louis, MO, USA, BP958) for two days prior to pairing to ensure sexual receptivity. Pairs were observed and the time of first intromission was recorded. Males were collected at 30 min (n = 4 per species), 2 h (n = 4 per species), and 12 h (n = 4 per species) following the first intromission. To reduce the possible influence of circadian effects on gene expression, prairie and meadow voles were collected at similar times for each post-mating time point. Virgin animals were collected at the same times as postmating animals. Following collection, animals were euthanized using isoflurane, and whole brains were harvested and dissected on a block of dry ice. The AMY, HT, and VP/NAc were collected from each brain and stored in RNAlater (Applied Biosystems AM7020).

RNA extraction and sequencing
Total RNA was extracted from each region from all individuals using TRIzol (Sigma-Aldrich) following manufacturer's protocol. An additional DNAse treatment (Purelink) was included to eliminate any contaminating DNA. RNA quantity and quality were assessed using a BioAnalyzer (Agilent) and quantified by spectrophotometry. cDNA was prepared using Superscript reverse transcriptase II (Invitrogen) and purified with QIAquick PCR purification kit. After checking cDNA quantity and quality in the BioAnalyzer, libraries were prepared using Illumina's TruSeq Sample Prep Kit with starting amount of 1.25 μg cDNA. Libraries were sequenced as 50 bp single-end reads at the Emory University genomics core facility using Illumina HiScan, and HiSeq 2000 machines. Read files were trimmed and cleaned using fastq tools.

Differential gene expression
Reads were aligned to an annotated prairie vole genome (MicOch1.0, INSDC Assembly GCA_000317375.1, Full genebuild annotation by Ensembl released February 2017) using STAR aligner (v2.7.2b) [84]. The R package array-QualityMetrics [85] was used to assess sample quality and identify outliers. One sample was removed for having low read counts and two removed as outliers resulting in 47 prairie vole (n = 16 AMY samples, n = 15 HT samples, n = 16 VP/NAc samples) and 46 meadow vole (n = 16 AMY samples, n = 15 HT samples, n = 15 VP/NAc samples) samples. Genes with less than one count per sample on average were removed prior to analyses.
The dataset was characterized and differential gene expression was analyzed with the R package DESeq2 [34] following the workflow provided by the package authors [86]. For analyses of the full dataset, a DESeq dataset was created with the model~species + region + time. Poisson distance was calculated using the PoiClaClu package [33]. Variance stabilizing transformation and principal component analysis (PCA) were conducted using DESeq2 functions. A likelihood ratio test with the reduced model~species + time was used to test for genes that were differentially expressed across regions. To identify genes that were differentially expressed across species before mating, the full model~species + region was applied to count data from both species from the virgin time point only, and the likelihood ratio test was used with the reduced model~region. To test for differentially expressed genes across species at the virgin time point within each region, the data were subset by region and a likelihood ratio test was used with the full model~species and reduced model~1. To test for species differences in gene expression within a region following mating, the data were subset by region and a likelihood ratio test was used with the full model~species + time + species:time and reduced model~species + time. This model tests whether species differ in gene expression at any time point after time 0 (virgin animals) [86]. Additional annotation information for genes of interest (Tables 1, 2 and 3) was obtained from the Uni-ProtKB (uniprot.org) manually annotated, reviewed (Swiss-prot) database. Manually annotated gene IDs and descriptions were derived from mouse (Mus musculus) annotations.

Gene correlation networks
We used the R package WGCNA [45] to identify networks of genes with correlated expression. Count data were divided by species, and separate networks were created for prairie voles and meadow voles. Lowly expressed genes (mean of less than 10 counts per sample) were removed and count data were transformed using DEseq2 variance-stabilizing transformation [34]. We identified signed modules (i.e. modules consist of genes with strong positive correlation to one another) of correlated genes within networks for each species and found correlations between module eigengenes and sample traits including brain region, mating status (coded 1 for any post-mating time point, 0 for virgin), time of collection (0 [virgin],0.5, 2, or 12 h), and individual postmating collection time points. Following this, we compared species networks by calculating module membership score (kME) [45] for each gene in each module for each species, then merged the kME datasets. We used R package pheatmap [87] to create heatmaps based on module distances and Pearson correlations and used hierarchical clustering by the function hclust with method "average" to visualize relationships of modules across species.

Gene ontology term enrichment
To test for enrichment of GO terms among genes associated with our differential expression comparisons or WGCNA modules of interest, we used the Mann-Whitney U (MWU) test [36] implemented by the GO_ MWU R script [37] (Available at https://github.com/ z0on/GO_MWU). Rather than test for enrichment among genes determined to be significant by an FDR cutoff, the MWU test determines if GO categories are enriched by up-or down-regulated genes, or within modules, compared to all other categories using continuous measures. For tests of enrichment among differential expression comparisons, we used signed -log 10 of the p-value as the measure of comparison to identify terms associated with genes most strongly differentially expressed across species or brain regions [37]. The -log 10 of the p-value from the model was calculated for each gene, then "signed" by multiplying by − 1 if the log 2 fold change (LFC) for the contrast of interest was negative [37]. For tests of enrichment in WGCNA modules of interest, we used gene kME values for genes included in the module, and all genes not included in the module were assigned a value of zero and adjusted p-value was determined by 100 permutations where significance measures are randomly shuffled among genes [37].

Data analysis and visualization
All analyses were conducted using R (v4.0.0). Details of packages and scripts used for differential expression, gene correlation network, and GO term enrichment analyses are described in the relevant sections above. Plots were made using DESeq2 [34] and WGCNA [45] as well as the packages ggplot2 [88], pheatmap [87], and RCo-lorBrewer [89].