Comparative analysis of neural transcriptomes and functional implication of unannotated intronic expression
© Sun et al; licensee BioMed Central Ltd. 2011
Received: 13 April 2011
Accepted: 10 October 2011
Published: 10 October 2011
The transcriptome and its regulation bridge the genome and the phenome. Recent RNA-seq studies unveiled complex transcriptomes with previously unknown transcripts and functions. To investigate the characteristics of neural transcriptomes and possible functions of previously unknown transcripts, we analyzed and compared nine recent RNA-seq datasets from tissues and organs ranging from stem cells and embryonic brain cortex to adult whole brain.
We found that the neural and stem cell transcriptomes share global similarity in both gene and chromosomal expression, but are quite different from those of liver or muscle. We also found an unusually high level of unannotated expression in mouse embryonic brains. The intronic unannotated expression was found to be strongly associated with genes annotated for neurogenesis, axon guidance, negative regulation of transcription, and neural transmission. These functions are the hallmarks of the late embryonic stage cortex, and crucial for synaptogenesis and neural circuit formation.
Our results revealed unique global and local landscapes of neural transcriptomes. They also suggested potential functional roles for previously unknown transcripts that are actively expressed in the developing brain cortex. Our findings provide new insights into potentially novel genes, gene functions and regulatory mechanisms in early brain development.
Keywords: neural transcriptomes; stem cell; intronic expression; embryonic brain cortex; neonatal brain cortex
It is well known that total gene numbers are similar among multicellular eukaryotes and that genome size does not correlate with organismal complexity, even though eukaryotes differ greatly in development, physiology and behavior. The transcriptome and its regulation contribute significantly to this diversity in complexity. The Functional Annotation of the Mammalian Genome (FANTOM) projects (FANTOM 1-4) have demonstrated the complexity of transcriptomes in several respects, including non-coding RNAs, antisense transcription [2, 3], regulated retrotransposon expression, and alternative promoter usage, splicing and polyadenylation.
Recent high-throughput RNA-seq technologies have provided unprecedented capability to analyze cellular, tissue-specific and organismal gene activity across a broad spectrum. They have also revealed transcriptomic complexity during cell differentiation [7, 8] and organ development. Furthermore, individuals of the same species show transcriptomic differences, such as expression variation among humans. Another level of transcriptomic complexity has been revealed by extensive analysis of novel splice variants formed from known exons [7–11]. In addition, thousands of transcripts from previously unannotated (non-exonic) genomic regions have been reported [7, 8, 10–13]; these are termed either TUFs (Transcripts of Unknown Function) or unannotated TARs (Transcriptionally Active Regions). Some unannotated TARs are large intergenic non-coding RNAs that function in embryonic stem cell pluripotency and cell proliferation [16, 17], but most unannotated TARs have no known function.
It has been reported that undifferentiated human stem cells have elevated expression of unannotated TARs compared with differentiated neural progenitor cells. Our recent study also detected additional transcripts from intergenic regions and introns in mouse embryonic and neonatal brain cortices. Mammalian neural development is a complex process involving cell division, cell differentiation, cell migration, axon guidance, synaptogenesis and synaptic plasticity. Characterizing stage-specific unannotated TARs during early brain development could provide clues to the roles these TARs might play in determining neural fate and in regulating neuronal functions.
To further investigate the transcriptome dynamics and to better understand the possible roles of unannotated TARs in early neural development, we have analyzed the RNA-seq datasets from embryonic and postnatal mouse brain cortices that we generated recently , as well as seven additional RNA-seq datasets covering both neural and non-neural tissues [7, 18]. These nine transcriptome datasets include data from human embryonic stem cell (hESC) and its subsequently differentiated forms (N1, early initiation; N2, neural progenitor; and N3, early glial-like cell) , embryonic day 18 (E18) and postnatal day 7 (P7) mouse brain cortices , and adult mouse brain (AMB), liver (AML), and muscle (AMM) .
Through a systematic analysis of these nine datasets, we found several unique characteristics of the transcriptomes in early neural development. Although the genome was not as pervasively transcribed as previously reported, most genomic regions at 1 Mb resolution had detectable RNA-seq signals. The transcriptomes from neural tissues also possessed several genome-wide characteristics resembling those of stem cells. Interestingly, the E18 cortex showed the highest level of unannotated transcript expression compared with P7 and adult brains. Furthermore, the intronic unannotated transcripts were associated with GO terms for neurogenesis, neural signaling and negative regulation. Importantly, few of the unannotated TARs in E18 and P7 cortices were connected with known transcripts, suggesting potential novel functions of these TARs during brain development.
Results and discussion
Mapping RNA-seq data from mouse developing brains and other organs
RNA-seq mapping results using TopHat.
Sample | SRA run accessions | Read length | Original read count
hESC (human embryonic stem cell) | SRR037165 to SRR037170 | 35 bp PE and 35 bp SE | 14.4 M PE and 4.4 M SE
N1 (early hESC initiation) | SRR037193 to SRR037198 | 35 bp PE and 35 bp SE | 15.8 M PE and 7.0 M SE
N2 (human neural progenitor) | SRR037199 to SRR037205 | 35 bp PE and 35 bp SE | 19.6 M PE and 11.6 M SE
N3 (human early glial-like) | SRR037220 to SRR037226 | 35 bp PE and 35 bp SE | 22.4 M PE and 3.0 M PE
E18 (mouse brain cortex at E18) | - | 36 bp PE and 36 bp SE | 10.0 M PE and 3.0 M SE
P7 (mouse brain cortex at P7) | - | 36 bp PE and 36 bp SE | 10.4 M PE and 3.6 M SE
AMB (adult mouse brain) | SRR001356, SRR001357, SRR006488 and SRR006489 | 33 bp SE | 89.0 M SE
AML (adult mouse liver) | SRR001358, SRR001359, SRR006490 and SRR006491 | 33 bp SE | 75.9 M SE
AMM (adult mouse muscle) | SRR001361, SRR001362 and SRR006492 | 33 bp SE | 59.9 M SE
Further columns: Original RNA-seq data size (Mbp); Mapped by TopHat* (Mbp); Percentage mapped (%).
Most 1-Mb genomic regions have transcriptional activity with uneven distribution at a finer scale
Transcriptomic comparison between neural tissues and other tissues
To understand the neural transcriptome characteristics, we compared mouse cortical RNA-seq data at E18 and P7 stages with available adult mouse brain, liver, and muscle RNA-seq data , as well as RNA-seq data from human embryonic stem cells (hESC) and neural cells (N1, N2, N3) immediately differentiated from hESC . We first analyzed transcriptome properties at the chromosome level, using a method slightly modified from Mortazavi et al. as detailed in the Methods section, and labeled as RPKM* (similar to RPKM; formula (1)).
In addition to the above-mentioned mapping of the 5 mouse datasets (E18 and P7 cortices, and adult brain, liver and muscle), we also mapped the 4 human RNA-seq datasets (hESC, N1, N2 and N3) onto the mouse reference genome (mm9 in the UCSC database). Given the previously reported 85% identity between mouse and human coding regions, a 35 bp RNA-seq read would carry on average about 5 mismatches. We found that a threshold of 2 mismatches per 35 bp read achieved the best balance between specificity and sensitivity for this cross-species mapping: allowing more mismatches resulted in fewer uniquely and correctly mapped reads, while allowing fewer resulted in fewer total mappable reads. With a maximum of 2 mismatches, few reads would be expected to map across species if the differences between human and mouse coding regions were evenly distributed. Surprisingly, on average 6% of the total human RNA-seq reads could be mapped to the mouse genome, corresponding to 11.5% of the reads mappable to the human genome. The majority (80%) of the reads mapped to the mouse genome were also mapped to the human genome.
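The arithmetic behind the mismatch threshold can be checked with a few lines of Python. This is a minimal sketch that assumes mismatches scale linearly with a uniform 85% identity across a 35 bp read; the helper names are hypothetical. It also shows that "6% of all human reads" and "11.5% of human-mappable reads" together imply that roughly half (about 52%) of the human reads were mappable to the human genome.

```python
# Minimal arithmetic sketch; assumes mismatches scale linearly with identity.

def expected_mismatches(read_length: int, identity: float) -> float:
    """Expected number of mismatches in a read at the given sequence identity."""
    return read_length * (1.0 - identity)


def implied_human_mappable_fraction(frac_of_all_reads: float,
                                    frac_of_human_mappable: float) -> float:
    """If a read set is frac_of_all_reads of all reads and also
    frac_of_human_mappable of the human-mappable reads, return the
    fraction of all reads that were mappable to the human genome."""
    return frac_of_all_reads / frac_of_human_mappable


if __name__ == "__main__":
    print(f"{expected_mismatches(35, 0.85):.2f}")                 # 5.25
    print(f"{implied_human_mappable_fraction(0.06, 0.115):.2f}")  # 0.52
```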
As an alternative way to compare the mouse and human data, we also mapped the human data onto the human reference genome (hg19) for comparative analysis using previously identified syntenic/orthologous genomic regions between mouse and human [21, 23]. These studies defined 217 conserved syntenic blocks between the human and mouse genomes. Chromosomal expression profiles in early-differentiated human neural cells were very similar to those of human embryonic stem cells (Additional file 1, Fig. S3C). Despite different chromosome numbers and organizations, chromosomal expression profiles were also very similar between human neural cells and mouse neural tissue samples across syntenic/orthologous genomic regions (Additional file 1, Fig. S3B, C). For example, the most highly expressed chromosome in mouse was chromosome 11, whose human counterpart, chromosome 17, was the second most highly expressed chromosome in human. The most highly expressed chromosome in human was chromosome 19, whose mouse counterparts are distributed on chromosomes 7, 8 and 19; of these, chromosomes 7 and 19 were also highly expressed in neural tissues.
To assess the variation in expression levels between chromosomes for different tissues/organs, we calculated the standard deviation for the distribution of individual chromosome expression level, or RPKM* values, for each mapped transcriptome dataset (Additional file 1, Fig. S3D). We found that the standard deviation for E18 was the lowest among mouse samples, while the standard deviation for the stem cells was the lowest in human samples (Additional file 1, Fig. S3D). These results indicated that the mouse E18 brain cortex and human embryonic stem cells use chromosomes more evenly than other organs/tissues.
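A minimal sketch of this spread measure, assuming per-chromosome RPKM* values are already computed and that the population standard deviation is used (the text does not say which variant; for datasets with the same number of chromosomes, the sample standard deviation ranks them identically). The chromosome values in the example are made up for illustration.

```python
from __future__ import annotations

import statistics


def chromosome_usage_spread(rpkm_by_chrom: dict[str, float]) -> float:
    """Population standard deviation of per-chromosome RPKM* values.

    A lower value means expression is spread more evenly across chromosomes.
    """
    return statistics.pstdev(list(rpkm_by_chrom.values()))


if __name__ == "__main__":
    # Hypothetical per-chromosome values for two samples.
    even = {"chr1": 4.0, "chr2": 5.0, "chr3": 4.5, "chr4": 5.5}
    uneven = {"chr1": 1.0, "chr2": 9.0, "chr3": 3.0, "chr4": 7.0}
    print(chromosome_usage_spread(even) < chromosome_usage_spread(uneven))  # True
```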
It is well known that the brain has a very high metabolic rate, consuming a significant amount of energy while lacking substantial energy reserves. Normal brain function therefore depends on mitochondria as the crucial energy provider. To examine changes in mitochondrial genome expression across developmental stages, we plotted mitochondrial expression, measured as RPKM* (normalized against dataset size), for all nine datasets. In the human datasets, differentiated neural cells had a higher level of mitochondrial expression than stem cells (Additional file 1, Fig. S3E), increasing from the N1 to the N2 stage and then remaining at a similarly high level at the N3 stage. Similarly, in the mouse brain, mitochondrial expression progressively increased from the E18 embryonic stage to the P7 neonatal stage and then to the adult stage. The adult mouse brain had a level of mitochondrial expression similar to that of the adult liver, while the adult mouse muscle had the highest level of mitochondrial expression among all analyzed organs/tissues, consistent with the high energy demand of muscle contraction.
We then further analyzed the correlation of the expression in two different tissues/stages among co-expressed genes between the tissues/stages (Figure 5B and Additional file 1, Fig. S6). Among all mouse samples, although E18 was the one with the highest correlation with hESC, E18 was still more similar to mouse neural transcriptomes in terms of expression level correlation. In particular, E18 and P7 transcriptomes were much more correlated with each other than with hESC, suggesting that the similarity between E18 or P7 cortex and hESC is relatively limited.
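The correlation restricted to co-expressed genes can be sketched as follows. This assumes that "co-expressed" means a gene has a positive expression value in both samples and that a Pearson correlation is taken on log2-transformed values; the section does not state the exact threshold, transform or correlation measure, so all three are assumptions.

```python
from __future__ import annotations

import math


def coexpressed_correlation(a: dict[str, float], b: dict[str, float],
                            min_expr: float = 0.0) -> float:
    """Pearson correlation of log2 expression over genes expressed in both samples."""
    shared = [g for g in a if g in b and a[g] > min_expr and b[g] > min_expr]
    if len(shared) < 2:
        raise ValueError("need at least two co-expressed genes")
    x = [math.log2(a[g]) for g in shared]
    y = [math.log2(b[g]) for g in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("expression is constant in one sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical expression values; g4 and g5 are not co-expressed.
    e18 = {"g1": 1.0, "g2": 2.0, "g3": 4.0, "g5": 0.0}
    p7 = {"g1": 2.0, "g2": 4.0, "g3": 8.0, "g4": 5.0, "g5": 3.0}
    print(coexpressed_correlation(e18, p7))
```

Genes missing from, or silent in, either sample are excluded before the correlation is computed, which is what distinguishes this from a correlation over all genes.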
Expression characteristics of genes associated with neurodevelopmental disorders
Among genes associated with ASD, RGS4, DTNBP1, NLGN2, STX7, MECP2, ARVCF and PPP3CC all showed high-level expression from the embryonic to the adult brain. One important finding was that while both NLGN1 and NLGN2 showed high-level expression at the E18 to P7 stages, consistent with their synaptogenic functions, NLGN1 expression was significantly reduced in the adult brain, suggesting that the relevant function might be fulfilled by other cell adhesion molecules. This is also consistent with the current understanding that many cell adhesion molecules can trigger glutamatergic synapse formation as NLGN1 does, but only NLGN2 is capable of inducing GABAergic synaptogenesis [35, 36]. One surprising finding is that DISC1, a well-studied gene associated with schizophrenia, showed very low expression at the E18/P7 stages and remained low in the adult brain. However, DISC1 was modestly expressed in hESCs, and its expression decreased after neural differentiation, suggesting that DISC1 might play an important role in stem cell functions.
Detection of unusually high levels of unannotated transcript expression at E18 and P7 stages
Concordant changes in expression levels between intronic TARs and flanking exons
Intronic and exonic expression level changes for genes with intronic TARs from E18 to P7.
Also With Increased Intronic Expression
Also With Decreased Intronic Expression
Number of Genes With Increased Exonic Expression
Number of Genes With Decreased Exonic Expression
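The 2x2 layout of this table (genes with increased or decreased exonic expression, split by whether intronic expression also increased or decreased) can be tallied from per-gene levels at the two stages. A minimal sketch, assuming normalized exonic and intronic levels per gene are already available and that genes with no change in either measure are left out; the input format is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def concordance_table(levels: dict[str, tuple[float, float, float, float]]) -> Counter:
    """Count genes by direction of exonic and intronic change from E18 to P7.

    levels maps gene -> (exon_E18, exon_P7, intron_E18, intron_P7).
    Keys of the result are (exonic_direction, intronic_direction).
    """
    table: Counter = Counter()
    for exon_e18, exon_p7, intron_e18, intron_p7 in levels.values():
        if exon_p7 == exon_e18 or intron_p7 == intron_e18:
            continue  # unchanged genes do not fit the increased/decreased table
        exon_dir = "increased" if exon_p7 > exon_e18 else "decreased"
        intron_dir = "increased" if intron_p7 > intron_e18 else "decreased"
        table[(exon_dir, intron_dir)] += 1
    return table


if __name__ == "__main__":
    demo = {
        "geneA": (1.0, 2.0, 1.0, 3.0),  # both up: concordant
        "geneB": (5.0, 1.0, 4.0, 2.0),  # both down: concordant
        "geneC": (2.0, 3.0, 5.0, 1.0),  # exon up, intron down: discordant
    }
    t = concordance_table(demo)
    concordant = t[("increased", "increased")] + t[("decreased", "decreased")]
    print(concordant, sum(t.values()))  # 2 3
```

Concordant genes sit on the diagonal of the table; a strong excess there is what "concordant changes" refers to.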
Few unannotated TARs were connected with known exons
The strong concordant correlation between the previously unannotated intronic TARs and their flanking exons suggested that an intronic TAR and its flanking exon might be parts of the same RNA transcript. To test this hypothesis, we focused on the E18 and P7 datasets, which had the largest percentage of filtered unannotated TARs. A paired-end read with one end located in an unannotated TAR and the other in a known exon would be strong evidence that the intronic TAR and the known exon are parts of the same mRNA. However, such mapping positions could in principle be erroneous. In addition, the existing mathematical and statistical models for determining the connection between TARs are designed for RNA-seq data from cDNAs generated with random primers; they are not applicable to poly-dT primed data, which have a 3' bias. We therefore devised a model suitable for both priming techniques (Methods; formula (2)), which reports whether there is a physical connection between expressed TARs and known exons. In a simulation using known adjacent exons and single exon genes (SGEs) with detected reads as positive and negative controls, respectively, the formula had success rates of 93% and 100%.
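Counting the paired-end evidence described above amounts to asking, for each read pair, whether one mate falls inside the TAR and the other inside the known exon. A minimal sketch, assuming each mate has been reduced to a (chromosome, position) tuple and that intervals use 1-based inclusive coordinates; a real pipeline would read mate positions from the aligner's SAM/BAM output. This raw count is only one input to a connection model such as formula (2), which also takes TAR length and coverage into account.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def contains(self, chrom: str, pos: int) -> bool:
        return self.chrom == chrom and self.start <= pos <= self.end


Mate = Tuple[str, int]  # (chromosome, mapped position) of one mate


def count_linking_pairs(pairs: Iterable[Tuple[Mate, Mate]],
                        tar: Interval, exon: Interval) -> int:
    """Number of read pairs with one mate in the TAR and the other in the exon."""
    n = 0
    for (c1, p1), (c2, p2) in pairs:
        forward = tar.contains(c1, p1) and exon.contains(c2, p2)
        reverse = exon.contains(c1, p1) and tar.contains(c2, p2)
        if forward or reverse:
            n += 1
    return n


if __name__ == "__main__":
    tar = Interval("chr15", 1000, 1500)   # hypothetical coordinates
    exon = Interval("chr15", 2000, 2100)
    pairs = [(("chr15", 1200), ("chr15", 2050)),   # links TAR and exon
             (("chr15", 1200), ("chr15", 1300))]   # both mates inside the TAR
    print(count_linking_pairs(pairs, tar, exon))   # 1
```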
Physical connection between unannotated TARs and known exons.
E18 Brain Cortex
P7 Brain Cortex
Connected with Known Exons
Not Connected with Known Exons
Non-Standalone (Same Chromosome)
Other (Non-Standalone, Multi-Chromosome)
Comparing the intronic TARs with known mRNA and EST in NCBI databases
To test whether there is other evidence for the intronic TARs, we searched NCBI's cDNA/mRNA and EST databases. Among the 554 intronic TARs detected at the E18 stage, 176 (32%) had no matches in the NCBI databases. Similarly, among the 168 intronic TARs at the P7 stage, 49 (29%) had no matches. Our results therefore provide the first evidence that these 176 and 49 TARs are expressed. Among the 378 E18 TARs and 119 P7 TARs with matching NCBI entries, 11 at E18 and 7 at P7 (2% and 4% of all intronic TARs at the respective stages) matched records from the same developmental stage, but none of these records was from the brain cortex.
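The percentages in this paragraph can be recomputed directly from the stated counts. The short check below is plain arithmetic on those numbers; it shows that 11 and 7 correspond to 2% and 4% of all intronic TARs at each stage (554 and 168), whereas relative to the matched subsets (378 and 119) they would be about 3% and 6%.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts stated in the text.
E18_TOTAL, E18_NO_MATCH, E18_SAME_STAGE = 554, 176, 11
P7_TOTAL, P7_NO_MATCH, P7_SAME_STAGE = 168, 49, 7

e18_matched = E18_TOTAL - E18_NO_MATCH  # 378 TARs with NCBI matches
p7_matched = P7_TOTAL - P7_NO_MATCH     # 119 TARs with NCBI matches

if __name__ == "__main__":
    print(pct(E18_NO_MATCH, E18_TOTAL), pct(P7_NO_MATCH, P7_TOTAL))          # 32 29
    print(pct(E18_SAME_STAGE, E18_TOTAL), pct(P7_SAME_STAGE, P7_TOTAL))      # 2 4
    print(pct(E18_SAME_STAGE, e18_matched), pct(P7_SAME_STAGE, p7_matched))  # 3 6
```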
We then examined the splicing patterns of the mRNA and EST records matching our detected intronic TARs and found two classes of intronic TARs: (1) those whose records all suggested that the TARs were standalone, without connection to known exons; and (2) those with some records suggesting that the TARs were standalone and other records suggesting that they were connected to known exons. In total, 304 of 378 (80%) intronic TARs at E18 and 75 of 119 (63%) intronic TARs at P7 belonged to the first class. For the second class, the average ratio of records supporting standalone transcripts to records supporting connections with known exons was 4.2 at the E18 stage and 2.8 at the P7 stage. Taken together, the comparison with NCBI's cDNA/mRNA and EST databases strongly suggested that most of our detected intronic TARs were not connected with known exons and were thus novel transcripts.
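The two classes and the average ratio can be expressed compactly. A minimal sketch, assuming each intronic TAR has already been annotated with the number of matching mRNA/EST records that support a standalone transcript and the number that support a connection to a known exon, and that "on average" means the mean of per-TAR ratios (the text does not say whether a mean of ratios or a ratio of totals was used). The record counts in the example are hypothetical.

```python
from __future__ import annotations


def classify_tars(support: dict[str, tuple[int, int]]):
    """Split TARs by matching-record support.

    support maps TAR id -> (standalone_records, connected_records).
    Returns (class1, class2, mean_ratio): class 1 has only standalone
    records, class 2 has both kinds, and mean_ratio is the mean of
    standalone/connected over class-2 TARs (NaN if class 2 is empty).
    """
    class1 = sorted(t for t, (s, c) in support.items() if s > 0 and c == 0)
    class2 = sorted(t for t, (s, c) in support.items() if s > 0 and c > 0)
    ratios = [support[t][0] / support[t][1] for t in class2]
    mean_ratio = sum(ratios) / len(ratios) if ratios else float("nan")
    return class1, class2, mean_ratio


if __name__ == "__main__":
    demo = {"t1": (3, 0), "t2": (4, 1), "t3": (2, 2), "t4": (5, 0)}
    print(classify_tars(demo))  # (['t1', 't4'], ['t2', 't3'], 2.5)
```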
Comparing the intronic TARs with known records in miRbase and lncRNAdb
We then compared our intronic TARs against the miRNA database miRbase and the long non-coding RNA database lncRNAdb. Although we found no significant hits in these two databases for any intronic TAR observed at the P7 stage, we found 12 and 6 hits for intronic TARs at the E18 stage in miRbase and lncRNAdb, respectively (Additional file 2, Tables S5 and S6). However, all 6 intronic TARs with hits in lncRNAdb mapped to the same lncRNA, B2 SINE RNA, which derives from a SINE repeat element. In addition, 11 of the 12 intronic TARs with hits in miRbase mapped to the same miRNA, miR-1935, and the remaining one mapped to miR-153-2. We did not detect significant hits for other types of RNA.
Sequence conservation and coding potential of intronic TARs
To obtain clues about possible functions of the intronic TARs from sequence similarity to other mammalian genes, we investigated whether unannotated TARs corresponded to any highly conserved region using the 30-Way Multiz Alignment & Conservation track in the UCSC Genome Browser. Of the 554 and 168 unannotated TARs at the E18 and P7 stages, respectively, 67 at E18 and 21 at P7 matched highly conserved sequences. For example, a TAR on chromosome 15 (102324092-102324772) was located in an intron of the mouse PCBP2 gene, which encodes the major cellular poly(rC)-binding protein. RNA-seq reads also spanned this intronic TAR and the upstream exon (Figure 8B, red reads), indicating that this previously unannotated TAR was spliced to a known exon. Moreover, this TAR overlapped significantly with a highly conserved region in the 3'-most intron, identified in a mammalian conservation study using the 30-Way Multiz Alignment & Conservation track data (Figure 8C and 8D). An open reading frame (ORF) was also predicted inside this TAR and was conserved among the PCBP2 genes of human and dog (99% similar in amino acid sequence; Figure 8E), but not opossum. PCR and ABI 3730 resequencing further verified that this TAR is indeed part of an mRNA (Figure 8F) connected to the upstream exon, consistent with the RNA-seq results. In contrast, no PCR product spanning this intronic TAR and the downstream exon was detected, also in agreement with the RNA-seq results. This TAR very likely represents an alternative 3' UTR with a potential coding region.
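Checking whether a TAR overlaps a conserved element reduces to interval arithmetic. A minimal sketch, assuming both the TAR and the conserved elements are given as closed, same-chromosome coordinate ranges like those quoted above, and that the conserved elements do not overlap each other; the conserved-element coordinates in the example are hypothetical, not taken from the 30-way track.

```python
from __future__ import annotations


def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between two closed intervals [start, end]."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def conserved_fraction(tar: tuple[int, int], elements: list[tuple[int, int]]) -> float:
    """Fraction of the TAR covered by non-overlapping conserved elements."""
    t_start, t_end = tar
    covered = sum(overlap_bp(t_start, t_end, s, e) for s, e in elements)
    return covered / (t_end - t_start + 1)


if __name__ == "__main__":
    tar = (102324092, 102324772)            # chr15 TAR in the PCBP2 intron (681 bp)
    elements = [(102324500, 102324900)]      # hypothetical conserved element
    print(overlap_bp(*tar, *elements[0]))    # 273
    print(round(conserved_fraction(tar, elements), 3))  # 0.401
```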
In addition, an intronic TAR with an ORF inside the ATP2B1 gene on chromosome 10 (98481907-98482067) shares 99.3% identity with the 20th exon of human ATP2B1 isoform a (ATP2B1a) (Additional file 1, Fig. S4). Human ATP2B1 has two splice variants, ATP2B1a and ATP2B1b, which differ in the usage of the 20th exon. Previous studies showed that ATP2B1a is specifically expressed at synapses, whereas ATP2B1b is expressed in most tissues [45, 46]. This TAR is therefore likely to encode a neuron-specific exon of the mouse ATP2B1 gene. We also found another expressed region on chromosome 7 (112781296-112781396) that shares 87.5% identity with part of the second 3' UTR exon of the human Trim3 gene (Additional file 1, Fig. S4). Trim3 (or BERP) is expressed in the brain and encodes a RING finger protein that regulates GABAR cell surface expression. Another intronic TAR, located in the NRXN1 gene on chromosome 17 (90854147-90854636), has the potential to encode part of Neurexin 1, a neuronal cell adhesion molecule that interacts with neuroligins to promote synapse formation and maturation. The ORFs in these intronic TARs were highly similar to parts of the human ATP2B1, BERP and NRXN1 genes, respectively. A number of other intronic TARs, such as those in CHD3, TSC22 and SRCAP, were either similar to known human exons or supported by mouse gene predictions and mRNA and/or EST data in the NCBI database.
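A simple way to look for an ORF inside a TAR sequence is to scan the three forward reading frames for an ATG followed by an in-frame stop codon. This is a minimal sketch for illustration only: it scans just the given strand, uses the standard stop codons, and the minimum length cut-off is an arbitrary assumption. It is not the ORF prediction method used in the study, which this section does not specify.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_codons: int = 30) -> tuple[int, int] | None:
    """Longest ATG-to-stop ORF on the forward strand.

    Returns 0-based, end-exclusive coordinates (including the stop codon),
    or None if no ORF reaches min_codons codons.
    """
    seq = seq.upper()
    best: tuple[int, int] | None = None
    for frame in range(3):
        start = None  # position of the first ATG in the current open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = end - start >= 3 * min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None
    return best


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
    print(longest_orf(demo, min_codons=1))  # (2, 23)
```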
Three other intronic TARs were located in the Zeb2 gene on chromosome 2 (44953049-44955802), the Ntrk3 gene on chromosome 7 (85484006-85485464) and the Odz2 gene on chromosome 11 (36491704-36492013), respectively, within introns more than 10 kb long. These TARs did not match mRNA or EST records in the NCBI database, nor were they similar to protein sequences in the NCBI database. Nevertheless, all three TARs were conserved in the rat, human, dog and opossum genomes, matching annotated introns in the orthologous human and rat genes (Additional file 1, Fig. S5). Our RNA-seq data did not detect a physical connection between these TARs and known exons. This lack of connection to the flanking exons was further supported by PCR: products were obtained when both primers were located inside a given intronic TAR, but not when a primer in the intronic TAR was combined with a primer in one of the flanking exons (Figure 8F). As a control, the correct PCR product was obtained using primers matching the two flanking exons of the given intronic TAR (Figure 8F). These three intronic TARs were therefore most likely standalone transcripts not linked to their flanking exons.
Genes with intronic TARs were over-represented in GO terms closely associated with neural development
For neural signaling related GO terms at E18, two subgroups form largely parallel interactions: the first subgroup mainly functions in regulating neural system process, while the second subgroup carries out signal transmission (Figure 9). The terminal node was the regulation of synaptic transmission, which combines the aforementioned two functions. Among the nodes with strong statistical support were "transmission of nerve impulse", "neurological system process", and "synaptic transmission". Another interesting aspect is that the regulatory relations in the first subgroup were positive, while they were negative in the second subgroup. For neural signaling related GO terms at P7, however, only one function group, similar to the aforementioned 2nd subgroup, was identified. It had 3 nodes and the regulatory relations between the nodes were negative (Figure 9).
Nearly half (28, all at E18) of the enriched GO terms were for neurogenesis, with extensive interconnections between nodes and no obvious functional subgroups (Figure 10). The functions of these nodes cover many aspects of neural development, such as cell morphogenesis, neurogenesis and neuron differentiation, eventually diverging into two termini: (1) axonogenesis and axon guidance and (2) dendrite morphogenesis. It is also striking that all regulatory relationships were negative.
As shown in Figure 11, the third major group of overrepresented GO terms at E18 was for regulation. These terms mainly concerned negative regulation, consistent with the negative regulatory relations identified between the majority of nodes in the neural signaling and neurogenesis groups. Specifically, strong statistical support was found for negative regulation of "metabolic process", "gene expression" and "biosynthetic process", ending in negative regulation of RNA polymerase II-dependent transcription. For P7, only one node received strong statistical support: regulation of molecular function (Figure 11).
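Over-representation of a GO term among genes with intronic TARs is commonly assessed with a one-sided hypergeometric test. The section does not state which test produced the statistical support shown in Figures 9 to 11, so the sketch below is an assumption for illustration; in practice the p-values would also need correction for testing many GO terms. The numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background, K: background genes annotated with the GO term,
    n: genes with intronic TARs, k: of those, genes annotated with the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 20 of 300 TAR-containing genes carry a term
    # that annotates 200 of 20,000 background genes (about 3 expected).
    print(enrichment_p(20, 300, 200, 20000))
```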
Exonic and intronic RNA-seq read number comparison between E18 and P7 for axon guidance related genes.
Number of detected reads in E18
Number of detected reads in P7
Read Ratio (Intron/Exon)
Intronic Read Ratio (E18/P7)
Exonic Read Ratio
Read Ratio (Intron/Exon)
Intronic Read Ratio (E18/P7)
Exonic Read Ratio
Neural and stem cell transcriptome
In this study, we investigated global characteristics of embryonic and neonatal neural transcriptomes and compared them with transcriptomes of the adult brain and embryonic stem cells. We found that embryonic and neonatal brain cortex transcriptomes cover most genomic regions at the megabase scale, but are unevenly distributed, with a positive correlation to exon density. In addition, neural transcriptomes resemble that of embryonic stem cells more than those of liver and muscle in several features, including chromosome-level expression (Additional file 1, Fig. S3A and B) and the expression pattern of orthologous genes (Figure 5). The E18 brain cortex and hESC transcriptomes also showed relatively even chromosomal distributions and lower mitochondrial expression.
Beyond these global similarities, we noted another characteristic shared by neurally expressed genes and genes important for pluripotent stem cells. Specifically, three genes, Sox2, Myc and Klf4, were detectable in all six neural samples (Figure 6), with high levels in the E18 or P7 transcriptomes. The expression of these genes suggests that neural cells might need fewer factors to be converted to stem cells. Indeed, Kim et al. found that only two factors (Oct4 with either Klf4 or c-Myc), instead of four, were needed to reprogram neural stem cells into iPS cells. Therefore, the transcriptomic similarity between neural cells and stem cells, including the expression of specific genes such as Sox2, Myc and Klf4, suggests that neural cells might retain certain stem cell properties and have greater potential to be reprogrammed to pluripotency.
Intronic TARs as standalone RNA regulators in early brain development
The mapping results of our transcriptome datasets revealed significant levels of intronic reads. We found that only a small portion of the intronic transcripts we detected were on the same RNAs as any known exons. Klevebring et al. recently reported that about 50% of intronic expression actually comes from the antisense strand, unlike the sense, exon-containing mRNAs of the same gene. The intronic TARs detected here thus share some characteristics with antisense transcripts, although our data lacked strand information. Our finding that the level of intronic transcripts is positively correlated with that of flanking exons is consistent with previous studies showing that antisense transcription may have both concordant and discordant regulation relative to adjacent exons. Furthermore, Faghihi et al. also reported regulation involving antisense RNA that is not mediated by the conventional RNA interference pathway, indicating that additional mechanisms are important. Our finding that the E18 brain cortex has significantly higher levels of intronic transcripts than other tissues/organs strongly suggests that such non-coding transcripts play important roles in regulating gene expression during embryonic brain development.
We also found that the mouse E18 embryonic brain had a concordant relation between intronic transcript and flanking exonic expression. This is unlike previous studies showing preferential localization of antisense transcripts in the upstream and downstream regions of genes [53–55]. Our data further indicated that, in the E18 embryonic brain, genes with intronic TARs were enriched in GO categories closely associated with neural functions. E18 cortical neurons are actively engaged in neurogenesis, including axonogenesis and synaptogenesis. For the significant GO terms associated with neurogenesis, all regulatory relations between nodes were negative (Figure 10). Moreover, an entire group of significant nodes concerned negative regulation (Figure 11). At P7, however, intronic TARs were no longer associated with either neurogenesis or negative regulation. These findings suggest the involvement of intronic TARs in stage-specific regulation of neural development.
Recently, a subset of long ncRNAs was found to have an enhancer-like function . Our data also indicated a correlation between the change in intronic transcript expression level and the change in the expression level of the corresponding gene. For example, the increased intronic expression is correlated with increased exonic expression for 10 axon guidance associated genes, whereas such correlation was not found for a control set of 3 housekeeping genes. The positive correlation in expression change between intronic TARs and the flanking exons further supports the idea that they have regulatory interactions, although it is formally possible that the intronic transcripts have functions unrelated to the genes represented by the flanking exons.
Our transcriptome analyses have revealed possible important mechanisms for gene function and its regulation in the developing brain, and uncovered a strong similarity to stem cells. These results provide a number of novel insights regarding neural developmental gene functions that can be further investigated using molecular genetic, biochemical and electrophysiological experiments.
RNA-seq data for hESC, N1, N2 and N3 were obtained from NCBI Sequence Read Archive SRP002079. RNA-seq data for adult mouse brain, liver and muscle tissues were obtained from NCBI Sequence Read Archive SRA001030. RNA-seq data for mouse embryonic day 18 and postnatal day 7 brain cortices were the same as described previously; their NCBI Sequence Read Archive accession number is SRP007262. The protocol for dissection of the mouse cortex was approved by the IACUC of Pennsylvania State University and was in accordance with US federal guidelines. All quality scores were converted to FASTQ ASCII characters by adding 64 to the original quality score. TopHat was used to map these RNA-seq data. SRP002079 data were mapped onto the human genome (UCSC hg19, NCBI Build 37), and the rest onto the mouse genome (UCSC mm9, NCBI Build 37), both with the following parameters: --solexa-qual, -g 1. The same parameters were used when we mapped all nine datasets onto the mouse reference genome. Although TopHat was instructed to report only unique hits (-g 1), it sometimes could not fully suppress multiple hits (personal communication with Cole Trapnell, TopHat author, on Feb 22nd 2011). The results were then further screened against the RepeatMasker database of the corresponding species to eliminate possible ambiguous hits.
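The quality conversion described above adds an offset of 64 to each numeric score and writes the corresponding ASCII character. A minimal sketch, assuming the original scores are available as integers; the function names are hypothetical. Solexa-scaled scores can be as low as -5, which an offset of 64 still maps to a printable character (';').

```python
from __future__ import annotations


def to_fastq_ascii(quals: list[int], offset: int = 64) -> str:
    """Encode numeric quality scores as a FASTQ quality string (score + offset)."""
    chars = []
    for q in quals:
        code = q + offset
        if not 33 <= code <= 126:
            raise ValueError(f"quality {q} is not printable with offset {offset}")
        chars.append(chr(code))
    return "".join(chars)


def from_fastq_ascii(qual_string: str, offset: int = 64) -> list[int]:
    """Decode a FASTQ quality string back to numeric scores."""
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    print(to_fastq_ascii([40, 0, -5]))  # h@;
```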
Normalizing against data size and chromosome size
Data size normalization was performed against the mappable data size instead of the original data size generated by the sequencer. This accounts for systematic differences in sequence quality and mapping percentage among datasets.
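RPKM, as defined by Mortazavi et al., is the read count in a region divided by the region length in kilobases and by the number of mapped reads in millions. A minimal sketch of that calculation at chromosome level, using the mapped (rather than raw) read count as the normalizer as described above. The modified RPKM* of formula (1) is not reproduced in this section, so treating it as having this same form is an assumption.

```python
def rpkm(reads_in_region: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and mapped read count must be positive")
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical: 1,000 reads on a 1 Mb region in a library of 1 million mapped reads.
    print(rpkm(1_000, 1_000_000, 1_000_000))  # 1.0
```

Applied per chromosome (or to the mitochondrial genome), this yields values comparable across datasets of different size.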
Unannotated transcriptionally active region (TAR) calling
After the RNA-seq data were mapped to the target genome, regions with continuous read coverage that were in close proximity to each other were chained together to form transcriptionally active regions (TARs). Only TARs longer than 100 bp and with more than 5X coverage were considered. These TARs were then compared with UCSC Known Genes. TARs that did not overlap with UCSC Known Gene annotation features were then compared with known tRNA annotation and custom-compiled rRNA annotation. To further eliminate possible false positives from repeats, TARs not included in any of the above annotations were mapped back to the genome with BLAST, and all regions with significant hits elsewhere in the genome were discarded. The remaining unannotated TARs were then filtered by their distance to the nearest exons: all unannotated TARs that were too close to known exons or genes were discarded, as these may have originated from previously reported small exon variations.
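The first steps of this TAR-calling procedure (chaining nearby covered regions, then keeping TARs longer than 100 bp with more than 5X coverage) can be sketched on a per-base coverage array. This is a minimal sketch under stated assumptions: the maximum gap for chaining is a hypothetical parameter (the text says only "close proximity"), and coverage is taken as the mean depth over the chained region, gaps included. The later annotation, BLAST and distance filters are not shown.

```python
from __future__ import annotations


def call_tars(coverage: list[int], max_gap: int = 50,
              min_length: int = 100, min_mean_cov: float = 5.0) -> list[tuple[int, int, float]]:
    """Return (start, end_exclusive, mean_coverage) for candidate TARs on one chromosome.

    coverage[i] is the read depth at 0-based position i.
    """
    # 1. Find runs of continuously covered bases.
    islands = []
    i, n = 0, len(coverage)
    while i < n:
        if coverage[i] > 0:
            j = i
            while j < n and coverage[j] > 0:
                j += 1
            islands.append([i, j])
            i = j
        else:
            i += 1
    # 2. Chain runs separated by gaps of at most max_gap bases.
    chained: list[list[int]] = []
    for start, end in islands:
        if chained and start - chained[-1][1] <= max_gap:
            chained[-1][1] = end
        else:
            chained.append([start, end])
    # 3. Keep regions longer than min_length with mean depth above min_mean_cov.
    tars = []
    for start, end in chained:
        length = end - start
        mean_cov = sum(coverage[start:end]) / length
        if length > min_length and mean_cov > min_mean_cov:
            tars.append((start, end, mean_cov))
    return tars


if __name__ == "__main__":
    cov = [0] * 10 + [10] * 60 + [0] * 20 + [10] * 60 + [0] * 10
    print([t[:2] for t in call_tars(cov)])  # [(10, 150)]
```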
RT-PCR validation for intronic TARs and the connection between intronic TARs and flanking exons
Sample preparation and RNA extraction were performed as described previously. For the RT-PCR experiment, total RNA was isolated from mouse E18 and P7 cortical tissues using the Ambion RNAqueous-Midi Total RNA Isolation Kit (Catalog# 1911). One microgram of RNA was reverse transcribed into cDNA using the Biolabs DyNAmo cDNA Synthesis Kit (Catalog# F-470L).
To validate the expression of several pluripotency-related genes, approximately 1/20 of the first-strand cDNA was used as the template for PCR with gene-specific primers. PCR was carried out for 25 cycles of 94°C for 20 s, 54°C for 30 s and 74°C for 40 s. Ten microliters of each PCR product were separated on 0.8% (w/v) agarose gels containing ethidium bromide and visualized under UV light. For P7, a second round of PCR (32 cycles) was performed with the same primers, using 1 µl of the first-round PCR product as template.
To validate the expression of specific intronic TARs, primer pairs were designed to amplify four products: one within the intronic TAR, one spanning the intronic TAR and the upstream exon, one spanning the intronic TAR and the downstream exon, and one spanning the upstream and downstream exons. RT-PCR was carried out with Taq polymerase for 22 cycles using the cDNAs as template. PCR products were sequenced at the Genomics Core Facility at Penn State on an ABI 3730 machine.
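One way to read the junction-spanning primer pairs is to compare band sizes: a product from a template that still contains the intron between an exon and the TAR is longer than one in which that intron has been spliced out. The sketch below computes both expected sizes, assuming a forward primer in the upstream exon, a reverse primer in the TAR, and that the TAR start would act as the splice acceptor; the function and coordinates are hypothetical.

```python
from __future__ import annotations


def amplicon_lengths(fwd_start: int, exon_end: int, tar_start: int, rev_end: int) -> dict[str, int]:
    """Expected product sizes (0-based, half-open genome coordinates) for a forward
    primer starting in the upstream exon and a reverse primer ending in the TAR.

    'unspliced': the template keeps the intron between exon end and TAR start
    'spliced':   that intron is removed, joining the exon end directly to the TAR start
    """
    if not fwd_start < exon_end <= tar_start < rev_end:
        raise ValueError("expected fwd_start < exon_end <= tar_start < rev_end")
    unspliced = rev_end - fwd_start
    spliced = (exon_end - fwd_start) + (rev_end - tar_start)
    return {"unspliced": unspliced, "spliced": spliced}
```

For example, `amplicon_lengths(1000, 1150, 3000, 3200)` returns `{'unspliced': 2200, 'spliced': 350}`; sequencing the product, as was done here, then confirms which junction is actually present.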
Models to determine the physical connection of unannotated TARs with known exons/transcripts for both poly-dT and random-primed RNA-seq data
The general question can be abstracted as follows: using paired-end information, how can we determine whether a detected transcriptionally active region (TAR) lies on the same RNA molecule as other exons/transcripts, i.e., whether there is a physical connection between the given TAR and another exon? The number of paired-end reads with one end in a known exon cannot be interpreted on its own: the same number of linking reads is weaker evidence for a long unannotated TAR with many internal RNA-seq reads than for a short TAR with comparatively few internal reads. We therefore propose that support for a physical connection between an unannotated TAR and a known exon be evaluated as a function of the length of the TAR, its coverage (number of internal RNA-seq reads) and the number of paired-end RNA-seq reads linking it to another known exon. In addition, RNA-seq mappers tend to be less successful when a read only partially overlaps the end of an exon or TAR and must be mapped as a partial read.
Ni, twice the number of paired-end reads with both ends located inside the given region (so that each end of a pair is counted).
Ls, read length for one end of a paired-end read.
Lc, clone length of a paired-end read, which is 2×Ls plus the insert size.
Lr, length of the given TAR.
M, maximum number of allowed mismatches of the mapping algorithm.
T, a correction factor for partial mapping. Splice-junction-spanning reads have two partial matches to two discrete genomic regions. T is the success rate of the algorithm in mapping such partial reads to the end of a given region, normally between 0 and 1: 0 means the algorithm cannot map partial reads to the end of a region, and 1 means it maps all of them. Because TopHat is designed specifically for RNA-seq mapping, we set T to 0.99.
If the data indicated a substantial number of links from the given TAR to both upstream and downstream exons, the expected number of linking reads, Ne, was doubled, since reads should then cover both ends of the TAR. If the TAR length Lr was smaller than the clone length Lc, then in principle all paired-end reads from this TAR should link it to other region(s), because a fragment of length Lc cannot lie entirely within a region shorter than Lc.
If the actual number of paired-end reads connecting a given TAR with other regions was less than 20% of Ne, the TAR was considered a standalone transcript. Otherwise, the TAR was inferred to be non-standalone, implying some level of splicing activity. More specifically, if a significant portion of these paired-end reads had their other ends located in annotated exon(s), the TAR was considered part of a known transcript. However, if these reads connected the TAR to more than one chromosome, the TAR was considered multi-chromosome linked.
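The decision rules above can be written as a small classifier. Because the expression for Ne is not reproduced in this section, `ne` is taken as an input rather than computed; `annotated_cutoff` is a hypothetical value for "a significant portion", and checking the multi-chromosome condition before the known-transcript condition is an assumption about how the two rules interact.

```python
def clone_length(read_length: int, insert_size: int) -> int:
    """Lc = 2 x Ls + insert size, as defined in the text."""
    return 2 * read_length + insert_size


def classify_tar(ne: float, observed_links: int, links_both_sides: bool,
                 frac_links_to_annotated_exons: float, n_linked_chromosomes: int,
                 standalone_cutoff: float = 0.20, annotated_cutoff: float = 0.5) -> str:
    """Classify one TAR from its linking paired-end reads.

    ne: expected number of linking reads (computed elsewhere from Ni, Ls, Lc, Lr, M, T).
    annotated_cutoff: hypothetical threshold for 'a significant portion'.
    """
    if links_both_sides:
        ne *= 2  # reads are expected to cover both ends of the TAR
    if observed_links < standalone_cutoff * ne:
        return "standalone"
    if n_linked_chromosomes > 1:
        return "multi-chromosome linked"
    if frac_links_to_annotated_exons >= annotated_cutoff:
        return "part of known transcript"
    return "non-standalone"
```

With hypothetical 36-nt reads and a 200-bp insert, `clone_length(36, 200)` is 272; a TAR shorter than that cannot contain a whole fragment, so all of its paired-end reads are expected to be linking reads.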
Testing model effectiveness
To test the effectiveness of the proposed formula (formula (2)) in determining the connection between an unannotated TAR and known exon(s), we needed positive controls that are detectable in our dataset and known to lie on the same RNA molecules as known exons. An expressed exon from a multi-exon gene meets this requirement and can therefore serve as a positive control. To ensure that these exons were truly connected to known exons by RNA-seq reads, they were selected by manual inspection of the RNA-seq mapping results in the Integrative Genomics Viewer (IGV, http://www.broadinstitute.org/igv). For each selected exon, we confirmed that multiple reads linked it to at least one other known exon. To make the test set representative of the genome-wide situation, the exons were drawn from different chromosomes, with different RNA-seq read coverage, different locations within their genes and different relative distances to the 3' end (Additional file 2, Table S3). The proposed formula (formula (2)) identified these exons as physically connected with known exons with a success rate of 93%.
We also performed a negative control test to determine how well the proposed formula recognizes transcripts that have no physical connection with any known exon(s). Single-exon genes (SEGs) in the mouse genome were selected as negative controls because they are known to be standalone transcripts. We first identified a list of SEGs with RNA-seq reads in our dataset (Additional file 2, Table S4). The proposed formula identified these SEGs as not physically connected with any known exon(s) at a success rate of 100%, although 14% of the selected SEGs were classified as multi-chromosome linked.
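The two control experiments reduce to simple proportions. The sketch below assumes each control has been labeled with one of the hypothetical categories "part of known transcript", "multi-chromosome linked", "non-standalone" or "standalone". It counts a negative control as a success whenever it is not called part of a known transcript, which is how a 100% negative success rate can coexist with some SEGs being labeled multi-chromosome linked.

```python
from __future__ import annotations
from collections import Counter

CONNECTED = "part of known transcript"  # hypothetical label for 'connected to known exons'


def control_success_rates(positive_calls: list[str], negative_calls: list[str]) -> dict[str, float]:
    """Summarize calls on positive controls (exons of multi-exon genes) and
    negative controls (single-exon genes); both lists must be non-empty."""
    pos = sum(c == CONNECTED for c in positive_calls) / len(positive_calls)
    neg_counts = Counter(negative_calls)
    neg = 1 - neg_counts[CONNECTED] / len(negative_calls)
    # multi-chromosome calls still count as negative successes but are reported separately
    multi = neg_counts["multi-chromosome linked"] / len(negative_calls)
    return {"positive_success": pos, "negative_success": neg, "negative_multi_chromosome": multi}
```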
Gene Ontology (GO) analysis
Reference mouse GO annotation was obtained from the Jackson Laboratory's MGI site (http://www.informatics.jax.org). Expressed genes were inferred from RNA-seq reads mapped to UCSC Known Gene and then compared with the reference mouse GO annotation. Identifier conversion between UCSC Known Gene and the GO annotation was done with an in-house script. Only Biological Process GO terms were analyzed. We first calculated the number of genes mapped to each GO term. For a gene with multiple GO terms, all terms were considered, because one gene may be involved in multiple biological processes. When a GO term node was counted, all of its parent nodes were excluded. Four sets of GO annotation were produced using this procedure: all expressed genes in E18, all expressed genes in P7, genes with intronic TAR(s) in E18 and genes with intronic TAR(s) in P7.
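The counting rule (all of a gene's terms are kept, but a term's parent nodes are not counted once the term itself is counted) can be sketched on a toy ontology. Applying the exclusion per gene is our interpretation, and the term names and parent map below are illustrative rather than taken from the MGI annotation.

```python
from __future__ import annotations
from collections import Counter


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All ancestors of a term, given a map from term to its direct parent terms."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def count_genes_per_term(gene_terms: dict[str, set[str]],
                         parents: dict[str, set[str]]) -> Counter[str]:
    """Count genes per term, skipping any term that is an ancestor of another
    term assigned to the same gene."""
    counts: Counter[str] = Counter()
    for terms in gene_terms.values():
        redundant: set[str] = set()
        for t in terms:
            redundant |= ancestors(t, parents)
        for t in terms - redundant:
            counts[t] += 1
    return counts
```

For example, a gene annotated to both "axon guidance" and its parent "neuron development" contributes only to "axon guidance".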
For a given stage, the GO annotation of the entire transcriptome was compared with that of only the genes containing intronic TARs using the agriGO server . Statistical significance was determined by Fisher's exact test with Bonferroni correction. The p-value threshold was preset at 0.05, and only GO terms with more than 5 hits were reported.
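agriGO performed the enrichment test on its server; the following is a local sketch of the same kind of computation, assuming a one-sided Fisher's exact test for over-representation in the intronic-TAR gene set against the whole-transcriptome background, followed by Bonferroni correction and the thresholds stated above. agriGO's exact implementation may differ.

```python
from __future__ import annotations
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: study genes (with intronic TARs) annotated to the term
    n: study genes in total
    K: background genes (whole transcriptome) annotated to the term
    N: background genes in total
    Returns P(X >= k) when n genes are drawn from N without replacement.
    """
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(pvalues: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def significant_terms(terms: dict[str, tuple[int, int, int, int]],
                      alpha: float = 0.05, min_hits: int = 5) -> list[str]:
    """terms maps GO term -> (k, n, K, N); keep adjusted p < alpha with more than min_hits hits."""
    names = list(terms)
    adjusted = bonferroni([fisher_enrichment_p(*terms[t]) for t in names])
    return [t for t, p in zip(names, adjusted) if p < alpha and terms[t][0] > min_hits]
```

Exact integer binomial coefficients from `math.comb` avoid the underflow that floating-point factorials would cause for transcriptome-sized backgrounds.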
Human samples: [SRA: SRP002079], adult mouse samples: [SRA: SRA001030], brain cortices samples: [SRA: SRP007262].
List of abbreviations
TAR: transcriptionally active region
hESC: human embryonic stem cell
N1: early initiation stage of hESC
N2: neural progenitor derived from hESC
N3: early glial-like cell derived from hESC
E18: embryonic day 18 brain cortex
P7: postnatal day 7 brain cortex
adult mouse brain
adult mouse muscle
adult mouse liver
RPKM: reads per kilobase of exon(s) per million mapped reads
We thank X. Han and X. Zhou for helpful discussion and comments on this manuscript. We especially thank Prof. Naomi Altman for her input on the statistical comparison of chromosome expression profiles. This work was supported by the Biology Department, Eberly College of Science, and the Huck Institutes of the Life Sciences, the Pennsylvania State University. G.C. was supported by an NIH grant (MH083911). Y.S. was supported by the Intercollege Graduate Program in Genetics and the Biology Department, the Pennsylvania State University. H.M. was also supported by Fudan University.
- Thomas CA: The genetic organization of chromosomes. Annu Rev Genet. 1971, 5: 237-256. 10.1146/annurev.ge.05.120171.001321.
- Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, et al: Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002, 420: 563-573. 10.1038/nature01266.
- Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, et al: Antisense transcription in the mammalian transcriptome. Science. 2005, 309: 1564-1566.
- Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, et al: The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009, 41: 563-571. 10.1038/ng.368.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344-1349. 10.1126/science.1158441.
- Wu JQ, Habegger L, Noisa P, Szekely A, Qiu C, Hutchison S, Raha D, Egholm M, Lin H, Weissman S, et al: Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc Natl Acad Sci USA. 2010, 107: 5254-5259. 10.1073/pnas.0914114107.
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28: 511-515. 10.1038/nbt.1621.
- Han X, Wu X, Chung WY, Li T, Nekrutenko A, Altman NS, Chen G, Ma H: Transcriptome of embryonic and neonatal mouse cortex by high-throughput RNA sequencing. Proc Natl Acad Sci USA. 2009, 106: 12741-12746. 10.1073/pnas.0902417106.
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464: 768-772. 10.1038/nature08872.
- Klevebring D, Bjursell M, Emanuelsson O, Lundeberg J: In-depth transcriptome analysis reveals novel TARs and prevalent antisense transcription in human cell lines. PLoS One. 2010, 5: e9762. 10.1371/journal.pone.0009762.
- van Bakel H, Nislow C, Blencowe BJ, Hughes TR: Most "dark matter" transcripts are associated with known genes. PLoS Biol. 2010, 8: e1000371. 10.1371/journal.pbio.1000371.
- Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak S, Mis E, Zegar C, Gutwein MR, Khivansara V, et al: The Landscape of C. elegans 3'UTRs. Science. 2010.
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, et al: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308: 1149-1154. 10.1126/science.1108625.
- Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, et al: Global identification of human transcribed sequences with genome tiling arrays. Science. 2004, 306: 2242-2246. 10.1126/science.1103388.
- Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, et al: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009, 458: 223-227. 10.1038/nature07672.
- Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D, Thomas K, Presser A, Bernstein BE, van Oudenaarden A, et al: Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA. 2009, 106: 11667-11672. 10.1073/pnas.0904715106.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447: 799-816. 10.1038/nature05874.
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Nenadic O, Greenacre M: Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. Journal of Statistical Software. 2007, 20.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. Methods Enzymol. 2006, 411: 134-193.
- Takahashi K, Yamanaka S: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006, 126: 663-676. 10.1016/j.cell.2006.07.024.
- Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced pluripotent stem cells. Nature. 2007, 448: 313-317. 10.1038/nature05934.
- Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S: Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007, 131: 861-872. 10.1016/j.cell.2007.11.019.
- Aoi T, Yae K, Nakagawa M, Ichisaka T, Okita K, Takahashi K, Chiba T, Yamanaka S: Generation of pluripotent stem cells from adult mouse liver and stomach cells. Science. 2008, 321: 699-702. 10.1126/science.1154884.
- Marchetto MC, Carromeu C, Acab A, Yu D, Yeo GW, Mu Y, Chen G, Gage FH, Muotri AR: A model for neural development and treatment of Rett syndrome using human induced pluripotent stem cells. Cell. 2010, 143: 527-539. 10.1016/j.cell.2010.10.016.
- Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S, Nie J, Jonsdottir GA, Ruotti V, Stewart R, et al: Induced pluripotent stem cell lines derived from human somatic cells. Science. 2007, 318: 1917-1920. 10.1126/science.1151526.
- Kuhlbrodt K, Herbarth B, Sock E, Enderich J, Hermans-Borgmeyer I, Wegner M: Cooperative function of POU proteins and SOX proteins in glial cells. J Biol Chem. 1998, 273: 16050-16057. 10.1074/jbc.273.26.16050.
- Geschwind DH, Levitt P: Autism spectrum disorders: developmental disconnection syndromes. Curr Opin Neurobiol. 2007, 17: 103-111. 10.1016/j.conb.2007.01.009.
- Polleux F, Lauder JM: Toward a developmental neurobiology of autism. Ment Retard Dev Disabil Res Rev. 2004, 10: 303-317. 10.1002/mrdd.20044.
- Paysan J, Fritschy JM: GABAA-receptor subtypes in developing brain. Actors or spectators?. Perspect Dev Neurobiol. 1998, 5: 179-192.
- Graf ER, Zhang X, Jin SX, Linhoff MW, Craig AM: Neurexins induce differentiation of GABA and glutamate postsynaptic specializations via neuroligins. Cell. 2004, 119: 1013-1026. 10.1016/j.cell.2004.11.035.
- Dong N, Qi J, Chen G: Molecular reconstitution of functional GABAergic synapses with expression of neuroligin-2 and GABAA receptors. Mol Cell Neurosci. 2007, 35: 14-23. 10.1016/j.mcn.2007.01.013.
- Jaaro-Peled H, Hayashi-Takagi A, Seshadri S, Kamiya A, Brandon NJ, Sawa A: Neurodevelopmental mechanisms of schizophrenia: understanding disturbed postnatal brain maturation through neuregulin-1-ErbB4 and DISC1. Trends Neurosci. 2009, 32: 485-495. 10.1016/j.tins.2009.05.007.
- Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, Lassmann T, Forrest AR, Grimmond SM, Schroder K, et al: Tiny RNAs associated with transcription start sites in animals. Nat Genet. 2009, 41: 572-578. 10.1038/ng.312.
- Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, et al: RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007, 316: 1484-1488. 10.1126/science.1138341.
- Fejes-Toth K, Sotirova V, Sachidanandam R, Assaf G, Hannon GJ, Kapranov P, Foissac S, Willingham AT, Duttagupta R, Dumais E, et al: Post-transcriptional processing generates a diversity of 5'-modified long and short RNAs. Nature. 2009, 457: 1028-1032. 10.1038/nature07759.
- Li H, Wang J, Mor G, Sklar J: A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science. 2008, 321: 1357-1361. 10.1126/science.1156725.
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36: D154-158. 10.1093/nar/gkn221.
- Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS: lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 2011, 39: D146-151. 10.1093/nar/gkq1138.
- Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, et al: The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 2010, 38: D613-619. 10.1093/nar/gkp939.
- Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, Nativ N, Bahir I, Doniger T, Krug H, et al: GeneCards Version 3: the human gene integrator. Database (Oxford). 2010, 2010: baq020.
- Kenyon KA, Bushong EA, Mauer AS, Strehler EE, Weinberg RJ, Burette AC: Cellular and subcellular localization of the neuron-specific plasma membrane calcium ATPase PMCA1a in the rat brain. J Comp Neurol. 2010, 518: spc1.
- Cheung CC, Yang C, Berger T, Zaugg K, Reilly P, Elia AJ, Wakeham A, You-Ten A, Chang N, Li L, et al: Identification of BERP (brain-expressed RING finger protein) as a p53 target gene that modulates seizure susceptibility through interacting with GABA(A) receptors. Proc Natl Acad Sci USA. 2010, 107: 11883-11888. 10.1073/pnas.1006529107.
- Missler M, Sudhof TC: Neurexins: three genes and 1001 products. Trends Genet. 1998, 14: 20-26. 10.1016/S0168-9525(97)01324-3.
- Du Z, Zhou X, Ling Y, Zhang Z, Su Z: agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 2010, 38 (Suppl): W64-70.
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38: D355-360. 10.1093/nar/gkp896.
- Kim JB, Zaehres H, Wu G, Gentile L, Ko K, Sebastiano V, Arauzo-Bravo MJ, Ruau D, Han DW, Zenke M, et al: Pluripotent stem cells induced from adult neural stem cells by reprogramming with two factors. Nature. 2008, 454: 646-650. 10.1038/nature07061.
- Faghihi MA, Wahlestedt C: RNA interference is not involved in natural antisense mediated regulation of gene expression in mammals. Genome Biol. 2006, 7: R38. 10.1186/gb-2006-7-5-r38.
- Sun M, Hurst LD, Carmichael GG, Chen J: Evidence for a preferential targeting of 3'-UTRs by cis-encoded natural antisense transcripts. Nucleic Acids Res. 2005, 33: 5533-5543. 10.1093/nar/gki852.
- Core LJ, Waterfall JJ, Lis JT: Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008, 322: 1845-1848. 10.1126/science.1162228.
- Faghihi MA, Wahlestedt C: Regulatory roles of natural antisense transcripts. Nat Rev Mol Cell Biol. 2009, 10: 637-643. 10.1038/nrm2738.
- Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, Lai F, Zytnicki M, Notredame C, Huang Q, et al: Long noncoding RNAs with enhancer-like function in human cells. Cell. 2010, 143: 46-58. 10.1016/j.cell.2010.09.001.
- Smit A, Hubley R, Green P: RepeatMasker Open-3.0. 2010, [http://www.repeatmasker.org]
- Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D: The UCSC Known Genes. Bioinformatics. 2006, 22: 1036-1046. 10.1093/bioinformatics/btl048.
- Chan PP, Lowe TM: GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009, 37: D93-97. 10.1093/nar/gkn787.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Chern TM, van Nimwegen E, Kai C, Kawai J, Carninci P, Hayashizaki Y, Zavolan M: A simple physical model predicts small exon length variations. PLoS Genet. 2006, 2: e45. 10.1371/journal.pgen.0020045.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.