In this study, we applied systems biology methods to identify modules of co-expressed genes that highlight differences between the hippocampus and amygdala in two inbred mouse strains, A/J and C57BL/6J. Rather than generating a list of differentially expressed genes, this approach reconstructs networks of genes with related expression profiles that are thought to represent biologically meaningful correlations. Reconstruction of these modules is performed in an unbiased fashion, independent of strain or tissue origin. These results may provide new insights into, or complement our existing knowledge of, brain region-specific molecular pathways. Since the hippocampus and amygdala are largely conserved between man and mouse, and are thought to play an important role in behavior, emotion and cognition, studying these brain regions may aid our understanding of the mechanisms underlying neuropsychiatric traits.
We observed differential expression between the A/J and C57BL/6J inbred mouse strains and between the amygdala and hippocampus brain regions (Figure 3). Further analysis demonstrated that differences in gene expression profiles between brain tissues were independent of strain origin (Figure 2, Additional File 4). While we provide evidence that the observed strain-specific differences are likely the result of hybridization artifacts (i.e. Magenta module), we postulate that networks enriched with genes that are differentially expressed between amygdala and hippocampus (i.e. Pink and Red modules) represent biologically relevant molecular pathways. A limitation of our study is that neither the hippocampus nor the amygdala is homogeneous (see for example [37, 38]). Future research could aim to further dissect these regions (e.g. by laser capture) so that more homogeneous sub-regions can be studied. Our study provides a more global assessment of differential expression between these large brain structures and thereby ignores the regional substructures within each of them.
We observed that brain region-specific effects can be detected independent of strain origin. Genetic variation is thought to be a major regulator of gene expression [16, 18, 39]; however, the differential expression profiles between the amygdala and hippocampus observed in our study seem instead to represent natural variation driven by tissue-specific biological processes. The Red module in particular differentiates between brain region samples, without any interference from genetic background (Figure 2c-d). Previous studies investigating differences between brain regions also found that regional expression could be detected independent of strain. Whether genetic or tissue-specific effects have a stronger influence on gene expression is still under debate [40–44]. The results from our study suggest that the magnitude of the effect of strain and brain region is about the same, as module significance and eigengene values are comparable. However, since the module separating strains may not reflect actual strain effects but rather a hybridization artifact, it is likely that the effect of genetic strain differences is in reality smaller than the functional differences between tissues. Interaction effects between strain and brain region were also assessed for all 13,627 probes (see Additional File 7). This analysis revealed only 8 genes (Arsj, Baiap2l1, Fgf10, Myoc, Krt9, Lyd, 4930511J11Rik, Lypd1) whose differential expression between amygdala and hippocampus was dependent on strain. Of these, Fgf10 and 4930511J11Rik belonged to the Red module. This small number suggests that interaction effects between strain and brain region are very limited.
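To make the notion of a strain-by-region interaction concrete, the following minimal sketch tests a single probe in a 2 x 2 design. It is an illustration only: the model used for Additional File 7 is not described in this section, so the pooled-variance t statistic, the function name `interaction_t` and the expression values are assumptions introduced here.

```python
import math
from statistics import mean

# The four cells of the 2 x 2 design and their interaction contrast weights:
# (A/J amygdala - A/J hippocampus) - (C57BL/6J amygdala - C57BL/6J hippocampus)
CELLS = [("A/J", "amygdala"), ("A/J", "hippocampus"),
         ("C57BL/6J", "amygdala"), ("C57BL/6J", "hippocampus")]
SIGNS = [1, -1, -1, 1]

def interaction_t(groups):
    """groups maps (strain, region) -> list of expression values for one probe.
    Returns (interaction contrast, t statistic with N - 4 degrees of freedom)."""
    means = [mean(groups[c]) for c in CELLS]
    contrast = sum(s * m for s, m in zip(SIGNS, means))
    # Pooled within-cell variance, as in a two-way ANOVA with interaction.
    ss_within = sum(sum((x - m) ** 2 for x in groups[c])
                    for c, m in zip(CELLS, means))
    df = sum(len(groups[c]) for c in CELLS) - len(CELLS)
    pooled_var = ss_within / df
    se = math.sqrt(pooled_var * sum(1 / len(groups[c]) for c in CELLS))
    return contrast, contrast / se

# Hypothetical probe whose region effect is identical in both strains.
probe = {
    ("A/J", "amygdala"): [5.0, 6.0, 7.0],
    ("A/J", "hippocampus"): [3.0, 4.0, 5.0],
    ("C57BL/6J", "amygdala"): [6.0, 7.0, 8.0],
    ("C57BL/6J", "hippocampus"): [4.0, 5.0, 6.0],
}
contrast, t_stat = interaction_t(probe)
```

A contrast near zero, as for this hypothetical probe, means the amygdala-hippocampus difference does not depend on strain. In a genome-wide screen the resulting statistics would additionally need correction for multiple testing.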
The current dataset compares only two brain regions and is therefore somewhat limited in its perspective. While the Red and Pink modules clearly separate amygdala and hippocampus in these two strains, the functions represented in these modules may extend to other (brain) regions as well. When we compared our findings with spinal cord data from the same individuals, only two of the ten modules were preserved in spinal cord (i.e. Magenta and Turquoise). As in the amygdala and hippocampus, the Magenta module again strongly differentiated between strains in spinal cord, suggesting that the SNP artifact in the probe sequences is driving the strain-specific differences across different neuronal tissues. The Red and Pink modules, which differentiate between amygdala and hippocampus, are however not conserved in spinal cord and are therefore more likely to be amygdala or hippocampus specific.
The Magenta module is enriched for genes differentially expressed between strains independent of brain region. The module eigengene of this group of target genes was able to fully distinguish A/J and C57BL/6J samples (Figure 2a-b). Closer examination revealed that a significant number of probe sequences of genes in the Magenta module contain nucleotide variants (SNPs) that differ between these two strains. The Illumina expression arrays used in this study were developed using C57BL/6J as the reference strain, with probes optimally designed for the C57BL/6J genome. If there are SNP variants between strains within a probe sequence, hybridization is expected to be more effective for C57BL/6J. This prediction is consistent with the systematically lower expression levels of probes in the Magenta module in strain A/J (Figure 3). When the network is reconstructed after removing genes in the Magenta module that were known to contain SNPs, the remaining Magenta genes still fall into the same cluster, and this module still differentiates between strains. Therefore, it is possible that these probes represent real strain differences. However, resequencing of the probe sequences not containing a known SNP revealed that there are many unknown strain-specific polymorphisms in probe regions that may affect gene expression measurements between strains.
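For readers unfamiliar with the term, a module eigengene in WGCNA is the first principal component of the standardized expression profiles of the genes in a module, yielding one summary value per sample. The sketch below illustrates the idea with a simple power iteration. The function names and the toy data are hypothetical, and the sketch assumes that no gene has a constant profile and that the genes in the module are predominantly positively correlated.

```python
import math

def standardize(row):
    """Center one gene's profile and scale it to unit sample variance."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in row) / (n - 1))
    return [(x - m) / sd for x in row]

def module_eigengene(expr, n_iter=200):
    """expr: list of genes, each a list of expression values (one per sample).
    Returns the first principal component across samples, scaled to unit length."""
    z = [standardize(row) for row in expr]
    n_genes, n_samples = len(z), len(z[0])
    # Start from the average standardized profile of the module.
    avg = [sum(row[j] for row in z) / n_genes for j in range(n_samples)]
    v = avg[:]
    for _ in range(n_iter):
        # One power-iteration step on Z^T Z: samples -> genes -> samples.
        scores = [sum(row[j] * v[j] for j in range(n_samples)) for row in z]
        v = [sum(scores[i] * z[i][j] for i in range(n_genes))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene to correlate positively with the average profile.
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v

# Hypothetical module: three genes, samples 1-3 from one strain, 4-6 from the other.
toy_module = [
    [5.0, 5.2, 4.9, 7.1, 7.0, 7.3],
    [3.1, 3.0, 3.3, 4.9, 5.2, 5.0],
    [8.0, 8.1, 7.9, 9.2, 9.0, 9.1],
]
eigengene = module_eigengene(toy_module)
```

In this toy example the eigengene takes opposite signs in the two groups of samples, which is the sense in which a module eigengene can "fully distinguish" strains.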
The issue of SNP variants in probe regions causing hybridization artifacts has been described before when short (25-mer) cDNA probes were used [45, 46], and more recently for long (60-mer) oligonucleotide probes. In addition, it was found that mismatches do affect hybridization intensity, depending on the position of the SNP. By studying the enrichment of modules with regard to SNP-containing probes, our module-based analysis was able to detect this technical artifact. Further, we found that significantly increased numbers of SNP variants at the probe regions were also observed at loci with strong cis-effects in an eQTL analysis of the WTCCC heterogeneous stock database.
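Enrichment of a module for SNP-containing probes can be assessed with a one-sided hypergeometric test, as sketched below. This is an illustration under stated assumptions: the section does not specify which enrichment test was applied, and apart from the total of 13,627 probes mentioned above, all counts in the example are hypothetical placeholders rather than results from this study.

```python
from math import comb

def hypergeom_upper_tail(n_total, n_snp, n_module, k_observed):
    """P(X >= k_observed), where X is the number of SNP-containing probes in a
    module of n_module probes drawn without replacement from n_total probes,
    of which n_snp contain a SNP."""
    upper = min(n_snp, n_module)
    favourable = sum(comb(n_snp, i) * comb(n_total - n_snp, n_module - i)
                     for i in range(k_observed, upper + 1))
    return favourable / comb(n_total, n_module)

# Hypothetical counts, for illustration only.
p_value = hypergeom_upper_tail(n_total=13627, n_snp=600,
                               n_module=150, k_observed=40)
```

A small p-value indicates that the module contains more SNP-containing probes than expected by chance, which flags the module as a candidate hybridization artifact rather than a biological signal.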
A standard gene-based analysis could easily have missed this systematic bias. Co-expression modules may represent technical artifacts, tissue contaminations, or other biologically uninteresting perturbations. This is why functional enrichment analyses, module preservation studies across array platforms, and other forms of validation are required to verify that co-expression modules are indeed biologically meaningful. In our data, a functional enrichment analysis for known gene ontologies did not yield any significant results for the Magenta module (which arose due to hybridization artifacts). We conclude that careful analysis of probe regions is warranted in an expression array study, especially when only a limited number of (parental) inbred strains are involved. We expect that our findings also apply to transgenic mouse models in which part of the genomic sequence may still be from a different genetic background.
The hybridization artifact is not present in the modules found to be significantly enriched with genes differentially expressed between the amygdala and hippocampus samples in both strains, i.e. the Red and Pink modules (Figures 2, 3, Additional File 2). Available databases were used to validate our brain region-specific results in silico (Allen Brain Atlas; http://www.brain-map.org/). In both modules, only ~10% of the probes disagreed in the direction of the expression difference between amygdala and hippocampus, thereby largely validating our brain region-specific findings. Previous studies also found some expression differences between these structures, although they are more similar to each other than most brain regions [41, 42]. Functional examination of the genes represented in these two modules was performed using Ingenuity (Ingenuity® Systems, http://www.ingenuity.com). This analysis revealed that the Pink module did not contain any known pathways. The module eigengene of this network of co-expressed genes was also shown to be less discriminative between samples from the different brain areas than that of the Red module. In contrast, the Red module was enriched with the categories of Neurological Disease, Genetic Disorder and Psychological Disorder, as well as Behavior and Nervous System Processes.
A major advantage of WGCNA is that its module detection does not make use of any prior gene ontology information. This allows the expression data to speak for themselves without biasing the analysis. However, gene ontology information is very valuable for determining what is known about the modules. While some modules may be highly enriched with genes of a given gene ontology, there is no perfect agreement between gene ontology information and module membership. As we show, modules may arise due to non-biological variation. Conversely, biologically important modules need not always be enriched with known gene ontologies. While the Ingenuity database is very extensive, GO analyses remain limited by incomplete gene ontology annotation. Module membership and network connectivity may point to a relationship between genes and pathways that was hitherto unknown. Figure 5 indicates genes that are known to fall within these categories, as well as genes that are found to be in the same module based on co-expression profiles but have not been implicated in these pathways before. Therefore, this network approach may provide insights into new candidates in already known processes.
The connectivity of the genes in this module is an indication of their interactivity with other genes in the same module. As can be seen in the graphical representation of the network of the Red module (Figure 5), several genes in enriched pathways are highly connected, while others are less related to other genes. For example, Tacr1 (tachykinin receptor 1) is a centrally located hub gene in the Red module and appears in all the above-mentioned categories determined by Ingenuity to be significantly enriched. This gene is more highly expressed in the amygdala than in the hippocampus in our dataset. This receptor for substance P is located in the central nucleus of the amygdala. Substance P is a neuropeptide related to pain, and it has also been implicated in a wide range of behaviors including learning and memory, motivational processes, and anxiety. Other genes in the Red module are also of added value since they may complement and/or interconnect our knowledge of pathways without having a known, large effect on neuropsychiatric traits directly. For example, we observed an enrichment of genes involved in transcriptional processes that distinguished between the amygdala and hippocampus; these genes included Isl1, Fhl2, Tnfrsf25 and Zcchc12.
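The notion of connectivity can be illustrated with the standard unsigned WGCNA definition, in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power, and the intramodular connectivity of a gene is the sum of its adjacencies with all other genes in the module. The sketch below assumes this definition. The power `beta=6`, the function names and the toy profiles are illustrative assumptions and are not taken from this study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def intramodular_connectivity(expr, beta=6):
    """expr: list of gene profiles belonging to one module.
    Returns, for each gene, the sum of |cor|^beta with every other gene."""
    n = len(expr)
    k = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** beta
            k[i] += a
            k[j] += a
    return k

# Hypothetical profiles: genes 0 and 1 are perfectly correlated, gene 2 is not.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, -1.0, 1.0],
]
connectivity = intramodular_connectivity(toy)
```

Genes with the highest connectivity are the hub genes of a module. Raising correlations to a power suppresses weak correlations, so that connectivity is dominated by strongly co-expressed partners.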
Other examples in the Red module support previous findings from the literature. For example, genes involved in the canonical pathway 'GABA-ergic signaling' (Slc32a1, Gabrg1 and Gad1) are closely correlated in the Red module, with increased expression levels in the amygdala. In addition, genes in the subcategory 'neurogenesis of nervous system development' (Nrn1, Cpne6, Dlx6, Enc1, Fgf13, Isl1, Nelf, Neurod2, Olfm1, Sema3a, Sema3f, Sema5a, Tgfb2, Wfs1) were also found in the same module. Adult neurogenesis takes place predominantly in the hippocampus (and olfactory bulb) and most of these genes (except for Dlx6, Hap1, Isl1 and Wfs1) indeed showed an increased expression in the hippocampus.