Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain
© Seemann et al.; licensee BioMed Central Ltd. 2012
Received: 9 September 2011
Accepted: 31 May 2012
Published: 31 May 2012
Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts.
By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3’ UTR structures and correlated expression patterns.
Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function.
In neurons, RNA molecules often have to travel long distances between transcriptional origin (nucleus) and functional destination (axon, synapses, dendrites). Dendrites contain thousands of postsynaptic sites and long-lasting forms of activity-dependent synaptic modifications (memory storage) are believed to require local protein synthesis. Local protein translation implies that mRNAs are transported from the nucleus and localized to dendrites and synapses [1, 2]. It has been speculated that RNA secondary structures in mRNA untranslated regions (UTRs) are involved in these processes [3, 4]. In addition, numerous noncoding RNAs (ncRNAs) are expressed in brain [5, 6] and mounting evidence indicates important contributions of ncRNAs in brain functions such as memory formation and maintenance [7, 8], as well as a host of other functions in mammalian cells. This study further explores these connections by combining the large scale in situ hybridization data of the Allen Mouse Brain Atlas (http://mouse.brain-map.org)  with in silico predictions of conserved RNA secondary structure, revealing extensive enrichment of such structures in the adult mouse brain transcriptome.
Post-transcriptional regulation of RNA splicing, editing, transport, stability, localization and translation through UTR signals plays an important role in controlling gene expression. Important examples of stable RNA secondary structures are known in both 5’ UTRs  and 3’ UTRs. For instance, the 84-nucleotide (nt) long structure-anchored repression element (CAESAR) in CTGF is highly conserved in structure but not in sequence, and is suspected to inhibit translation and affect mRNA stability . Other structural mRNA elements, such as the selenocysteine insertion sequence (SECIS) element and nanos 3’ UTR TCE, are targets of RNA binding proteins. Stem-loop structures in untranslated regions are sometimes critical for proper mRNA localization [12–14], such as translocation of the MAPT mRNA along axonal microtubules  and ASH1 mRNA to the cortical actin cytoskeleton . RNA binding proteins might localize the RAB1A mRNA to specific cytoplasmic regions through recognition of its highly conserved 3’ UTR sequence and structure, so that translation would occur close to the location of the respective protein regulating intracellular vesicle transport . A predicted stable RNA structure overlaps the RNA localization region in the 3’ UTR of the mRNA encoding myelin basic protein MBP. The structure (but not the sequence) is conserved in human, mouse and rat . The highest affinity site of the RNA-binding protein Qk1 is located within the RNA localization region of MBP, suggesting a possible role for Qk1 in restricting MBP mRNA to the myelin membrane . In a very distinct manner, many 3’ UTRs in mouse are reported to be expressed separately from their mRNAs in a developmentally regulated manner , and some reported regulatory mutations in 3’ UTRs do not appear to act in cis to regulate the expression of the associated mRNA. Some structured 3’ UTRs may, thus, act in trans as ncRNAs .
Long noncoding RNAs (lncRNAs) have recently received increased attention due to their functional diversity in basic molecular and cellular biology [21–24]. In particular, they appear to be deeply entwined with cellular regulatory machinery, both as targets of important transcription factors , and as direct cis- and trans-regulators of gene expression through interactions with transcription factors or as indirect regulators through an RNA-binding protein intermediate (transcription factor co-regulators) . Furthermore, they have demonstrated roles in regulation of dosage compensation, imprinting, chromatin state, and epigenetic inheritance by DNA methylation . A hallmark of many small ncRNAs is the critical role of RNA secondary (and tertiary) structure. RNA structure also may have major functional implications for lncRNAs as shown, e.g., for the noncoding co-factor MEG3 of the tumor repressor p53, and the p53 regulated transcriptional repressor lincRNA-p21, which is tethered to hnRNP-K for its proper localization .
Several genome-scale screens for stable, conserved RNA secondary structures have found known RNA families and many potentially novel ncRNAs, albeit with significant false discovery rates [33, 34]. Classical transfer RNAs, ribosomal RNAs, some microRNAs and many other functional ncRNAs have a weakly conserved sequence but a highly conserved functional secondary structure. Hence, comparative analyses that focus on sequence conservation and ignore potential conservation of secondary structure underestimate ncRNA prevalence. Here, we apply to search for RNA secondary structures. The method attempts to create structurally optimal alignments from unaligned orthologous input sequences using an expectation-maximization algorithm. Both thermodynamic energies and evidence for conservation of secondary structure, e.g. the presence of compensatory mutations in putative helices, are part of the evaluation criteria. An appropriate background model distinguishes between significant RNA structures, e.g. putative ncRNAs, and structured background.
The key question addressed in this paper is the extent to which RNA structures, both in noncoding transcripts and in the UTRs of protein coding transcripts, play biologically important roles in the brain. We address this question by analyzing transcripts expressed in the adult mouse brain as cataloged in the Allen Mouse Brain Atlas (Atlas) for their potential to contain RNA structures predicted by . There are Atlas probes for approximately 20,000 RNA transcripts in the adult mouse brain, visualized at cellular resolution by in situ hybridization (ISH). Of these transcripts, 16,900 exhibit cellular expression above background in the adult mouse brain. Expression data within the ISH images are identified and mapped to defined regions. These mapped expression data can be used to examine global and spatial expression patterns and to find genes with similar spatial expression profiles. Although the majority of Atlas transcripts represent protein coding genes, Mercer et al. identified well over 1,000 Atlas riboprobes as putative lncRNAs and affirmed the expression patterns of some previously described lncRNAs, such as Evf, Gtl2, Gomafu, and Sox2ot.
Structured Allen Mouse Brain Atlas riboprobes
Expressed structured ncRNAs
Expressed structured UTRs
Expressed Atlas probes
By considering all annotated UTRs of full-length transcripts we find many additional structures, often at the end of longer alternative UTRs. However, the expression of these variants in the brain is unknown, which is why we consider only those portions of UTRs that are overlapped by Atlas probes. For instance, one isoform of the mouse VEGF-related factor gene (Vegfb) lacking a 3’ UTR is expressed in brain. We predict an RNA structure in one alternative 3’ UTR of Vegfb, but we annotate it as a non-structured UTR because the Atlas probe does not overlap the extended 3’ UTR structure.
The significance of the observed patterns has been tested by different statistical methods (see Methods) with similar results. We asked in which of the 11 neuroanatomical regions the mean expression energy of structured putative ncRNA and structured UTR probes differs significantly from that of all expressed probes, intergenic and intronic probes, and UTR probes. The most striking result is that in all 11 neuroanatomical regions the 5,126 structured UTR probes have significantly higher expression than the 4,467 non-structured UTR probes (see Additional file 2: Figure S2) as well as all expressed probes. The same applies at the finer level of granularity where the 11 neuroanatomical regions are further subdivided into 115 regions. On the other hand, expression of structured putative ncRNAs is significantly enriched relative to non-structured ncRNAs only in cerebellum (using ).
Based on these observations we conducted further analyses to gain insight into the possible causes of the enrichment of transcripts with structured UTRs. We studied significantly (p-value<0.001) enriched gene ontology (GO) terms of UTR probes using functional annotation by DAVID. We found support for functional enrichment of binding (p=5E-40), localization (p=4E-18) and transport (p=4E-16) in structured UTR probes. Several GO terms for protein binding (p=1E-43) and RNA binding (p=7E-7) are significant for probes with structured UTRs (see Additional file 3), but none for non-structured UTRs (see Additional file 4). In addition, we found several GO terms which connect structured UTR probes to intracellular signaling pathways and suggest a directed RNA transport between nucleus and synapses or dendrites, e.g. the cellular components cytoplasm (p=2E-35), nucleus (p=6E-15) and synapse (p=7E-6) and the molecular functions intracellular signaling cascade (p=2E-18), protein transport (p=4E-11), protein localization (p=1E-11), vesicle-mediated transport (p=1E-11) and cytoskeletal protein binding (p=1E-6). For non-structured UTR probes there are four times fewer enriched GO terms, in general with lower significance than for structured UTR probes. Only the GO terms cytoplasm (p=3E-22) and transport (p=5E-4) are enriched for non-structured UTRs and are related to signaling function. Localization can imply different functional impacts, for example direct involvement in transport, but also translational regulation at a specific subcellular location. Given the anatomy of neurons, where presumably many transcripts are located far away from the nucleus, the observed enriched expression of UTR regions with (predicted) RNA structures is consistent with such roles.
The expression data also shows slightly higher expression of structured putative ncRNA transcripts than non-structured ncRNAs in many brain domains as indicated in Figure 2 and Additional file 2: Figure S3. Mean expression of the 151 structured ncRNA candidates is larger in 83 out of 115 brain regions. Enrichment is significant in 15 regions (including cerebellum) compared to 3 regions with significantly enriched non-structured ncRNA probes using a more robust measure of location assuming dependency between multiple ISH measurements (; see Methods).
It is essential to determine whether the presence of enriched transcripts is due to slower degradation caused by RNA structures. The delay of degradation of transcripts folding into conserved RNA structures may support an increased half-life of brain-relevant RNAs. Proteins are actively synthesized in neuronal synapses despite the long distances between nucleus and synapses. For this purpose translational control of gene activity appears to be more efficient than transcriptional control. Conservation of different structures in different transcripts suggests that they are involved in a rich variety of post-transcriptional regulatory interactions, e.g. through altered transcript stability. Combined with the previously described GO analyses, this suggests that proteins involved in molecule mobility are produced in larger numbers, and that mRNAs and ncRNAs are transported to their intended cell destination before carrying out their function.
As an initial step towards assigning functional information we searched for proteins that may bind to predicted structures in UTR regions. RNA binding proteins are trans-acting factors that function, e.g., in RNA localization. For instance, the mRNA of the neurotrophic tyrosine kinase TrkB receptor is transported to dendrites and translated in response to neural activity. The mouse TrkB 5’ UTR contains one conserved and one mouse-specific single internal ribosomal entry site (IRES) whose RNA secondary structures and sequence-specific motifs are proposed to be integral to IRES-dependent translation . In agreement with this, the prediction finds the conserved IRES structure in 9 mammals, whereas, as expected, the unconserved IRES structure was not predicted. The structure consists of two stems of which the 3’ stem is the same as previously shown in human . Activity of the conserved IRES is enhanced in the presence of the polypyrimidine tract binding protein PTB1. In the ISH data correlated expression of TrkB and PTB1 can be seen, even though at a low level, in the olfactory bulb (ρ=0.49) and medulla (ρ=0.52) using the spatial homology search tool  (see Methods).
In comparison to non-structured UTRs, a correlation-based search for similar expression pairs (using ) results in slightly more correlated expressed pairs between transcripts coding for RNA binding proteins and transcripts with structured UTRs. To identify spatial and brain-wide correlations, we used Pearson’s correlation coefficient greater than a threshold of ρ T =0.9 and ρ T =0.85, respectively (see Methods for the selection criteria of ρ T ’s and spatial expression). We identified spatial correlation between 41 RNA binding proteins annotated in RBPDB  and 66 structured UTR transcripts mostly in thalamus, pallium and hippocampus (see Additional file 2: Table S3), as well as brain-wide correlated expression between 6 RNA binding proteins and 12 structured UTR probes (see Additional file 2: Table S4). We also searched for potential interaction sites of RNA binding proteins around UTR structures which are discussed below.
By examining correlated expression patterns, we can hypothesize new functions for previously uncharacterized structured transcripts or identify potential interacting RNA molecules as well as RNA-protein interactions due to localized translation as described above. The following prediction of an annotated UTR element exemplifies connectivity of functionally related molecules. We predict a widely conserved (in 16 organisms from human to zebrafish) 25 nt stem-loop in the 3’ UTR of rat brain-derived neurotrophic factor BDNF. This stem-loop partly overlaps the loop and 5’ end of the annotated core region of an extended stem-loop previously predicted in the full-length UTR structure (by ). The 3’ UTR structure of BDNF provides a scaffold for interaction of various RNA binding proteins, polyadenylation factors and miRNAs in response to Ca2+ signals (neuronal activity). The interaction results in Ca2+ signal-dependent stabilization of mRNAs in neurons.
Many known ncRNAs exert their function by binding RNA target sequences: microRNAs bind mRNAs, snoRNAs bind ribosomal and small nuclear RNAs, and certain lncRNAs may bind microRNAs to regulate their activity or guide RNA editing. Potential RNA-RNA interactions between structured transcripts and correlated expressed RNAs were searched by scanning all putative ncRNAs and UTRs of Atlas transcripts for statistically significant intermolecular RNA binding sites. By combining  and  we calculate the minimum free energy (MFE) of putative interaction sites in the real data, and the same strategy was used to create background distributions on dinucleotide shuffled data for p-value calculation (see Methods).
For 6 putative ncRNAs with local and 2 putative ncRNAs with brain-wide correlated expression we found putative interaction sites in the 3’ or 5’ UTRs of the correlated mRNAs, albeit with relatively large p-values (see Additional file 2: Table S8). For instance, a non-conserved interaction site is predicted between the putative ncRNA TC1462951 and the 3’ UTR of Kcnb1 (see Additional file 2: Figure S8 for ISH image and expression mask). The putative ncRNA LOC433503 may interact with a conserved region in the 3’ UTR of Gpx3, only 100 nt upstream of the common stem-loop structure SECIS (see Additional file 2: Figure S9). In addition, around 600 significant (p-value<1e-05) interaction sites with an MFE smaller than -40 kcal/mol are predicted by and between structured putative ncRNAs and, e.g., UTRs of mRNAs coding for RNA binding proteins (Rbpms and Samd14; see Additional file 5), but the ISH data do not reveal correlated expression.
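The background-based p-value described above can be illustrated with a minimal Python sketch. It assumes that interaction MFEs have already been computed for the real pair and for a set of shuffled controls; the function name `empirical_pvalue` and the add-one correction are illustrative assumptions, not details taken from the study.

```python
def empirical_pvalue(observed_mfe, background_mfes):
    """Empirical p-value of an interaction energy against shuffled controls.

    More negative MFE means a more stable predicted duplex, so the p-value is
    the share of background energies that are at least as low as the observed
    one. The +1 terms (an assumption of this sketch) avoid reporting p = 0
    when no shuffled control reaches the observed energy.
    """
    if not background_mfes:
        raise ValueError("background distribution is empty")
    at_least_as_stable = sum(1 for e in background_mfes if e <= observed_mfe)
    return (at_least_as_stable + 1) / (len(background_mfes) + 1)


# Hypothetical energies in kcal/mol, for illustration only.
p = empirical_pvalue(-40.0, [-10.0, -20.0, -45.0, -5.0])
```

With few shuffled controls the smallest reachable p-value is 1/(n+1), so thresholds as strict as 1e-05 require a correspondingly large background set or a fitted distribution.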
Microarray studies have shown that at least 50% of assayed transcripts are expressed in the brain , with up to 80% of transcripts shown to be expressed by ISH . In order to gain a better understanding of transcripts in the brain that may be contributing to brain function, we examined which transcripts have an RNA structure. We observed that in silico predicted RNA structures are enriched both in coding (UTR regions) as well as noncoding transcripts in almost all regions of the adult mouse brain. The simplest interpretation of the data is that the Atlas probes showing higher expression are enriched for predicted RNA structures. Through the integration of mouse brain expression data and secondary RNA structure predictions, we found that transcripts with such predictions in their UTRs, those that are enriched in the 3’ UTR adjacent to the ORF, have the highest expression throughout the brain. Many of these mRNAs as well as their protein products may act as signaling molecules whereas the UTR structures serve as binding motifs for other RNAs and proteins involved in intracellular signaling pathways. This hypothesis is supported by (i) enriched gene ontology terms binding, transport and localization, (ii) correlated expression patterns between mRNAs with structured UTRs and RNA binding proteins, and (iii) a larger expression diversity of transcripts with structured UTRs. UTR structures as signal for motor-driven transport and translational repression through RNA binding proteins are especially attractive in neurons where the transport of information stored in ribonucleic sequences from the nucleus through long axons to the synapses is an important component of neuronal functionality .
We investigated this hypothesis further by searching around (predicted) UTR structures for potential binding motifs of 72 RNA binding proteins annotated in RBPDB (see Methods and Additional file 2). The majority (90%) of the UTR structures have at least one predicted binding motif in their neighborhood (see Additional file 2: Table S5). These motifs can be bound by 21 proteins. Only 9 proteins, however, have significantly more predicted targets than expected by chance, and half of the binding proteins are involved in splice site regulation. The analysis indicates that some interesting binding motifs can be found, such as those of the neural-specific Elavl2, the cytokine-degrading Zfp36, and the mRNA-trafficking Khsrp. Zfp36 binds AU-rich elements (ARE) in the 3’ UTR of some cytokine mRNAs and promotes their degradation. Intriguingly, an AU-rich region (AU content of 85% over a length of 41 nt) starts at the 3’ end of the predicted UTR structure of 6530418L21Rik (see Additional file 2: Figure S10) and its expression is highly correlated with that of another zinc finger protein (Zfp365) and Lancl1, an RNA binding protein involved in immune surveillance of the brain (see Figure 5). Assuming that 6530418L21Rik works as a signaling molecule, its transport function may be deactivated through the binding of Zfp36 close to its 3’ UTR structure. However, a large-scale investigation of RNA-protein binding is still limited by the low information content of binding motifs described by short sequence-based position weight matrices (PWMs).
Motivated by the GO analysis we also considered the hypothesis that structured RNAs in neural cells are themselves involved in establishing intracellular signaling pathways. For instance, Dienstbier et al. provide evidence that Egalitarian (EGL) and the dynein cofactor Bicaudal D (BICD), previously known to be required for minus-end-directed mRNA transport, mediate linkage of various mRNAs to the dynein motor in Drosophila melanogaster. Here, we show that EGL nine homolog 1 and BICD have predicted UTR structures, BICD is associated with the GO terms intramolecular, cytoplasm, localization, transport and binding and EGL with the GO term binding. Proteins, such as EGL, BICD and cytoskeletal protein filaments, are needed to establish intracellular pathways for directed cytoplasmic RNA transport towards synapses and dendrites. For signal propagation in the opposite direction back to the nucleus, mRNAs coding for these proteins have to be transported first and, thus, need cis-acting RNA elements too. The hypothesized directed RNA transport is illustrated in Additional file 2: Figure S14.
We also looked for predicted RNA structures in all UCSC and RefSeq annotated UTRs of protein coding genes overlapped by Atlas probes. We found 9,378 of these genes with RNA structure predictions in their UTRs and 5,576 without UTR structures. Of the 4,467 Atlas probes that overlap unstructured UTRs, 1,246 probes have a structure elsewhere in (at least one variant of) the UTR. It is unknown whether these structures are present in brain. Assuming they are, i.e., reclassifying as “structured” some of the UTR probes previously classified as “unstructured”, we see even larger differences between the expression of structured and non-structured UTR probes (see Additional file 2: Figure S13 compared to Figure 2). Hence, we conclude that our overall statistics also hold for RNA structure annotation in full-length transcripts. In addition, we showed that putative ncRNAs with locally predicted RNA structures have significantly higher expression than non-structured intergenic and intronic transcripts in several brain regions. Positively correlated expression patterns between pairs of transcripts are often domain-specific for putative structured ncRNAs. Most promising are 4 ncRNAs with brain-wide correlated expression in small cliques (mCG145872, A230057G18Rik, TC1462951, and Raph1; see Additional file 2: Table S6), and several ncRNAs with only one spatially correlated expressed transcript. We investigated conditions where RNA structure has a function, such as RNA-RNA interactions between correlated expressed RNA transcripts. One of the applied methods in this study, e.g., predicts the interaction site of two sequences. However, it is known from RNA motif searches that short sequence motifs can often appear by chance, which partly explains the large p-values for the predicted RNA-RNA interactions. Consideration of homologous sequences in other species and duplex folding by using tools such as [58, 59] may help to obtain more significant predictions.
A major uncertainty is the limited resolution of the informatics detection of expression in the ISH images and, thus, of the correlation data. A single voxel comprises several cells, leading to interpolation between expression information and to noisy expression energy. Sagittal images are more strongly affected by registration errors since only a single hemisphere is available for registration. The majority of correlation pairs detected in the sagittal plane failed validation by manual inspection of the ISH images (see Methods for further information). The largest cliques of correlated expression are often caused by processing artifacts in the images or by the absence of expression (see Additional file 2: Figure S7). One desirable quality improvement of the correlation data is the weighted consideration of the voxel neighborhood, which would improve the confidence in correlated expressed pairs at the cost of some level of detail. Furthermore, the data might also be interesting for graph theoretical analyses of gene expression correlation networks. Features of these networks are relatively unknown, and the correlation coefficient threshold could be chosen in a more sophisticated way by analyzing its influence on network connectivity. The large number of 3’ UTR probes might also target ncRNAs, in addition to the untranslated region of mRNAs. In several specific cases we observed highly correlated brain-wide expression, e.g., between the 3’ UTR probe Kcnc2 and its intronic mCG142089, and between Dusp3 and its downstream-sense located structured probe TC1462951, but these probe pairs may have bound the same (pre-spliced) transcript. Thus, conclusions about correlated expression of adjacent or overlapping transcripts are hardly possible, especially if they have widespread expression throughout the brain.
An additional concern is that the observed correlation between structure and expression level might be an artifact of RNA degradation. All exonucleases have problems initiating degradation close to stable stem structures . Hence, the abundant enrichment of transcripts hosting RNA structures may be at least partly explained by their slower degradation and, thus, higher accessibility to riboprobes compared to transcripts lacking RNA structure. In fact, if the structures are involved in translational regulation, reduced degradation is just as effective as increased transcription in terms of raising steady-state transcript levels. Thus, to determine when e.g., a bound protein primarily serves to regulate or primarily serves to prevent degradation seems hard, in particular if preventing degradation is part of the regulatory mechanisms as is the case with the iron metabolism in vertebrates . However, the observed enrichment of transcripts with structured UTRs is not related to a particular structure, hence, it is unlikely that a particular RNA binding protein that promotes transcript stability by binding to a specific structured RNA motif is responsible for the broad expression pattern.
A final concern is that our results might be explained by a difference in the hybridization efficiency of Atlas probes towards structured versus unstructured transcripts. Hybridization is affected by a variety of factors, such as probe accessibility and affinity to the targeted molecule. For short oligos, although there are some contexts in which hybridization may be enhanced by appropriate RNA structures , it is most often suggested that highly structured regions in a target transcript would reduce hybridization efficiency. Many riboswitches, for example, down-regulate translation by sequestering the ribosome binding site in a structure that blocks interaction with the 16S rRNA . This evidence suggests that structured target molecules would generate a decreased signal, but we observed an increase. In addition, Atlas probes were chosen to be 400-1200 bases in length. For such long probes that are perfectly complementary to their targets, the fully hybridized “double helix” will be the most energetically favorable state and seems likely to form easily from a simple initial toe-hold/zipper extension interaction from almost any initial conformation of the target. Thus, on balance, it does not seem likely that riboprobe affinity to structured versus unstructured transcripts explains the observed enrichment of structured transcripts.
Overall, our results show a huge potential for RNA structure as an abundant and active feature of both coding and noncoding transcripts in the adult mouse brain. Using we predicted more than 40,000 RNA structures (mostly in intronic and 3’-untranslated regions) in about 10,500 expressed Atlas probes in the adult mouse brain. Even though in silico methods for RNA structure prediction have high false positive rates of up to 50% [33, 34], our findings still leave room for functional RNA structures in the Atlas transcriptome data. The significantly enriched expression energy of structured transcripts is hard to explain by chance and supports the theme of functional RNA structures in the mouse brain. In the future, a structure analysis remains to be carried out on a global transcriptome data set in the adult mouse brain because the Atlas data primarily focus on protein-coding transcripts and include limited data on noncoding transcripts.
The Allen Mouse Brain Atlas (Atlas) probes have been previously mapped to the mouse (mm8) genome . Probe coordinates and RNA structure predictions are mapped to UCSC  and RefSeq  gene tracks with at least 10% overlap of probes and predictions. Intergenic and intronic probes are further checked for significant protein-coding potential as performed by Mercer et al.: CRITICA  predicts significant protein-coding potential in the probe sequence or any targeted transcript, and ORFs greater than 120 codons are detected that comprise at least one third of the transcript length. In addition, we applied  on mm8 based UCSC multiz17way alignments of intergenic and intronic probes to also detect shorter conserved ORFs (p-value<0.001).
RNA structure predictions are in general unclear about which strand actually contains the structure . Therefore, strand predictions of RNA structures were not used. We assume that a prediction on one strand yields a candidate on both strands. We mapped structures to Atlas probes if the structure overlaps at least 1nt of an intergenic probe or if the structure overlaps at least 1nt of a UTR exon, coding exon, or intron that was mapped to the Atlas probe. We used this rather conservative procedure instead of mapping to putative respective transcripts of the probes to avoid counting splicing variants with predicted RNA structures. This procedure will miss some structured UTRs, however, our statistical conclusions still hold for the investigated subset of UTR structures.
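The overlap rules used for mapping (at least 1 nt between a structure and a probe feature, and at least 10% between probes or predictions and annotation tracks) can be sketched in Python as follows. Half-open coordinates, the function names, and the choice to measure the fraction relative to the query interval are assumptions of this sketch; the text does not specify them.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions of two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlaps_by_nt(structure, feature, min_nt=1):
    """True if a predicted structure shares at least min_nt positions with a feature."""
    return overlap_length(*structure, *feature) >= min_nt


def overlaps_by_fraction(query, track, min_fraction=0.10):
    """True if at least min_fraction of the query interval lies inside the track.

    Measuring the fraction against the query length is an assumption; the
    fraction could equally be defined against the track or the shorter interval.
    """
    length = query[1] - query[0]
    if length <= 0:
        raise ValueError("query interval must have positive length")
    return overlap_length(*query, *track) / length >= min_fraction


# Strand is deliberately ignored, mirroring the decision not to use strand predictions.
structure_hits_probe = overlaps_by_nt((100, 200), (199, 250))
```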
The Allen Mouse Brain Atlas probes are annotated as known structured RNAs if they overlap at least 10% of a mouse microRNA in miRBase v10.0  or a human track in miRBase, snoRNABase , Rfam 9.1 , ncRNA.org or Jones’ and Eddy’s ncRNA list . We used generated alignments and chained blastz alignments (liftOver tool) to map the human tracks to its mouse homologs.
The expression energy quantifies the overall expression at a given voxel. It is calculated as the product of expression level and density of cells expressed in that voxel . All riboprobes have sagittal expression data and a subset of riboprobes have both sagittal and coronal expression data. Informatics processing of the expression data from the sagittal sectioning plane is, however, effected by the data only containing one hemisphere (coronal data has two hemispheres), various starting and ending positions of the tissue sections processed for an individual riboprobe, and minor variability in the section cutting angle. In contrast, coronal data typically registers better as the symmetry of the section helps to lock the other two dimensions of the 3D grid together. To increase the accuracy of expression profiles and to meet quality control metrics, we created a high quality dataset that includes 1,525 structured UTR probes from Table 1 with coronal image series minus 125 coronal images series having manual detected processing artifacts (such as upside down images), widespread expression or missing image data due to failure of individual tissue sections.
Significant spatial expression patterns are found by two-sample location t-tests of the null hypothesis that the expression energy means of two sets of Atlas probes are equal. Errors associated with each ISH measurement are not totally independent of each other; thus, the normal distribution assumption does not hold. We therefore apply bootstrap procedures to estimate the unknown distribution of expression energy in neuroanatomical regions. The percentile-t bootstrap p-values differ from ordinary percentile p-values in that they are based on bootstrap approximations of the distribution of the studentized estimator rather than the distribution of the original estimator. P-values are adjusted by the method of Benjamini & Hochberg to control the false discovery rate, and the null hypothesis was rejected if the adjusted p-value < 0.25. As a more robust measurement of location we also calculated adjusted p-values of 0.2% trimmed means using the bootstrap methods and .
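To make the testing procedure concrete, the following is a minimal sketch of a two-sided percentile-t bootstrap test for a difference in means, followed by a Benjamini & Hochberg adjustment. It is written under several assumptions that are not taken from the original analysis: a Welch-type studentized statistic, independent resampling of each probe set, a symmetric two-sided p-value, and hypothetical function names and defaults (`n_boot`, `seed`).

```python
import random
import statistics
from math import sqrt
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float], shift: float = 0.0) -> float:
    """Studentized difference of means, optionally centered by `shift`."""
    se = sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y) - shift) / se


def percentile_t_pvalue(x: Sequence[float], y: Sequence[float],
                        n_boot: int = 2000, seed: int = 0) -> float:
    """Two-sided percentile-t bootstrap p-value for H0: mean(x) == mean(y)."""
    rng = random.Random(seed)
    observed_diff = statistics.fmean(x) - statistics.fmean(y)
    t_obs = welch_t(x, y)
    extreme = valid = 0
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        try:
            # Center on the observed difference so t* approximates the null distribution.
            t_star = welch_t(xb, yb, shift=observed_diff)
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance in both groups
        valid += 1
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    return (extreme + 1) / (valid + 1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini & Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch one p-value would be computed per brain region, the list passed to `benjamini_hochberg`, and each adjusted value compared against the chosen cutoff.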
The Atlas provides interpolated expression energy in regular 3-dimensional lattices of cellular resolution for each sagittal and coronal image series. The correlation of the expression energy for each probe pair is calculated by the spatial homology search tool, which computes Pearson’s correlation coefficient ρ between the two vectors that hold the expression energies of the two probes for all voxels (each 200 microns per side) in a defined brain region. The cumulative frequency distribution of the number of correlation pairs over ρ typically follows a negative sigmoid curve (see Additional file 2: Figure S11); thus, we chose a threshold ρT close to the right flattened area of the curve for selecting the most promising correlation pairs. Spatial correlations tend to have higher ρ values than brain-wide correlations because fewer voxels are compared. Hence, we chose a slightly higher ρT to select spatial correlations.
where vd is the number of voxels in one domain and vb is the number of voxels in the entire brain.
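The pairwise correlation and threshold selection described above can be sketched as follows. This is not the spatial homology search tool itself; it assumes that each probe is represented by a list of expression energies over the same voxels in the same order, and the function names and the `rho_t` parameter are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long voxel vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_u * norm_v)


def correlated_pairs(energies: Dict[str, Sequence[float]],
                     rho_t: float) -> List[Tuple[str, str, float]]:
    """All probe pairs whose correlation over the region's voxels is at least rho_t."""
    pairs = []
    for p, q in combinations(sorted(energies), 2):
        rho = pearson(energies[p], energies[q])
        if rho >= rho_t:
            pairs.append((p, q, rho))
    return pairs
```

Restricting the input vectors to the voxels of one region gives a spatial correlation; passing all brain voxels gives a brain-wide one, which is why the two cases may warrant different thresholds.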
UTR structures and their 50 nt flanking regions were searched for potential protein binding motifs using RBPDB. First, position weight matrices (PWMs) from RBPDB were used together with the Perl TFBS library to scan sequences for binding sites of 72 RNA binding proteins with expressed Atlas probes (of 461 proteins in RBPDB). Second, we aligned our sequences (BLAT) against 1,021 individual RNA sequences from single-sequence experiments, excluding consensus (IUPAC) sequences.
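The idea behind a PWM scan can be shown with a small sliding-window sketch. Note that the analysis itself used the Perl TFBS library; the Python code below is a stand-in with hypothetical names, and it assumes that each PWM column is a dictionary of per-base scores and that a site is reported when the summed score reaches a user-chosen `min_score`.

```python
from typing import Dict, List, Tuple

Column = Dict[str, float]  # per-base score at one motif position


def scan_pwm(sequence: str, pwm: List[Column], min_score: float) -> List[Tuple[int, str, float]]:
    """Slide the PWM along the sequence and report (start, site, score) for windows >= min_score."""
    seq = sequence.upper().replace("T", "U")  # treat DNA input as RNA
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Skip windows containing symbols the matrix does not score (e.g. N).
        if any(base not in column for base, column in zip(window, pwm)):
            continue
        score = sum(column[base] for base, column in zip(window, pwm))
        if score >= min_score:
            hits.append((start, window, score))
    return hits


# Toy matrix (illustrative values only, not from RBPDB) that favors the site "UGU".
TOY_PWM: List[Column] = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "U": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "U": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "U": 1.0},
]
```

For example, `scan_pwm("ACUGUAA", TOY_PWM, min_score=3.0)` reports a single site starting at position 2.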
We thank Quaid Morris for the helpful discussion about RNA binding proteins and Marcel Dinger for the probe mapping to the mouse genome and the coding potential pipeline. SMS and MJH thank the Allen Institute for Brain Science founders, Paul G. Allen and Jody Allen, for their vision, encouragement, and support. SES and JG were supported by the Lundbeck Foundation, the Danish Council for Independent Research (Technology and Production Sciences), the Danish Council for Strategic Research (Programme Commission on Strategic Growth Technologies), as well as the Danish Center for Scientific Computing. SMS and MJH were supported by the Allen Institute for Brain Science.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.