- Research article
The complex transcriptional landscape of the anucleate human platelet
BMC Genomics volume 14, Article number: 1 (2013)
Human blood platelets are essential to maintaining normal hemostasis, and platelet dysfunction often causes bleeding or thrombosis. Estimates of genome-wide platelet RNA expression using microarrays have provided insights to the platelet transcriptome but were limited by the number of known transcripts. The goal of this effort was to deep-sequence RNA from leukocyte-depleted platelets to capture the complex profile of all expressed transcripts.
From each of four healthy individuals we generated long RNA (≥40 nucleotides) profiles from total and ribosomal-RNA depleted RNA preparations, as well as short RNA (<40 nucleotides) profiles. Analysis of ~1 billion reads revealed that coding and non-coding platelet transcripts span a very wide dynamic range (≥16 PCR cycles beyond β-actin), a result we validated through qRT-PCR on many dozens of platelet messenger RNAs. Surprisingly, ribosomal-RNA depletion significantly and adversely affected estimates of the relative abundance of transcripts. Of the known protein-coding loci, ~9,500 are present in human platelets. We observed a strong correlation between mRNAs identified by RNA-seq and microarray for well-expressed mRNAs, but RNA-seq identified many more transcripts of lower abundance and permitted discovery of novel transcripts.
Our analyses revealed diverse classes of non-coding RNAs, including: pervasive antisense transcripts to protein-coding loci; numerous, previously unreported and abundant microRNAs; retrotransposons; and thousands of novel un-annotated long and short intronic transcripts, an intriguing finding considering the anucleate nature of platelets. The data are available through a local mirror of the UCSC genome browser and can be accessed at: http://cm.jefferson.edu/platelets_2012/.
Platelets are circulating peripheral blood cells that emerge from the human bone marrow to function as critical components in basic physiological processes such as hemostasis, wound healing, inflammation, angiogenesis and the pathophysiology of tumor metastases. Platelets that exhibit functional extremes convey a commensurate increased risk for bleeding or thrombosis. Notably, the propensity for such extremes has been shown to be heritable[1–3]. Nonetheless, an understanding of the responsible genes and underlying mechanisms remains limited to date. In this regard, genome wide association studies (GWAS) have identified loci associated with platelet number, platelet volume and ex vivo platelet aggregation[4, 5], but the effect sizes have been quite small. Furthermore, most of the identified loci are not in protein-coding genomic regions. Thus, new approaches are needed to query the repertoire of platelet transcripts.
The platelet transcriptome is a reflection of the megakaryocyte RNA content at the time of (pro)-platelet release, subsequent splicing events, selective packaging and platelet RNA stability, and can provide important insights into platelet biology. Platelets are known to contain messenger RNAs (mRNAs) and indeed most studies support a strong correlation between the platelet’s protein-coding transcriptome and its proteome[7, 8]. Platelets also include unspliced pre-mRNAs, rRNAs, tRNAs and microRNAs (miRNAs)[9–11]. Most platelet studies to date have characterized the platelet transcriptome using microarrays and SAGE[12–18]. A recent effort compared human and mouse platelet transcriptomes with the help of deep-sequencing of poly-adenylated, long RNA transcripts.
The emerging important roles that non-coding RNAs (ncRNAs) play in a cell, their interactions with one another and with protein-coding transcripts[20–25], and the speed by which many categories of ncRNAs burst onto the scene suggests that their involvement in biological processes remains largely unexplored. This is particularly true of platelets where an accurate understanding of the transcriptome has both biological (improved understanding of platelet protein translation and the mechanisms of megakaryocyte/platelet gene expression) and clinical (novel biomarkers of disease) relevance.
Because the content and properties of nuclear and cytoplasmic transcripts vary[27–29], the anucleate human platelet represents a unique model for characterizing post-transcriptional gene expression. In light of the above, we deep-sequenced a) a total RNA preparation, b) a ribosomal-RNA depleted RNA preparation, and c) a short RNA preparation for each of the four individuals. Our results have been embedded in a local mirror of the UCSC genome browser and can be examined interactively at http://cm.jefferson.edu/platelets_2012/.
We carried out transcriptome sequencing of total RNA isolated from leukocyte-depleted platelet (LDP) preparations from four healthy adults (hereafter referred to as 1N1, 2N2, 3N3, 4N4). LDPs were prepared by density centrifugation of citrated whole blood followed by immunodepletion of CD45+ leukocytes. This preparation yielded fewer than 1 leukocyte per 5 million platelets. For each individual, we constructed three libraries: a) long (≥ 40 nucleotides) total RNA, b) long RNA depleted of rRNA, and c) short (< 40 nucleotides) RNA. All sequencing was carried out on an Applied Biosystems/Life Technologies (AB/LT) SOLiD™ system.
Read mapping across the genome
The reads from each of the 12 generated datasets were mapped separately on each chromosome and strand of the human genome (assembly hg19) using the BWA program and the protocol described in Methods. The non-uniform coverage of protein-coding transcripts by next generation sequencing reads has been documented before and was encountered in our analysis as well. Table 1 shows the average numbers of obtained and mapped reads for each of the three library types (long total, long rRNA-depleted, and short RNA). Notably, mitochondrial transcripts represented more than half of the uniquely mapped long reads (58.1% long total, 65.1% long rRNA-depleted, 1.3% short), something also encountered by other unbiased methods such as SAGE.
Estimating the abundance of protein-coding transcripts in platelets
We devised a scheme (see Methods) for estimating the expression levels of protein-coding transcripts from RNA-seq reads. To estimate transcript abundance, we normalized for transcript length and scaled using the expression levels of the β-actin isoform with ENSEMBL identifier ENST00000331789. This scheme was very effective (see below) and allowed us to scale expression appropriately within a read-set and to compare expression levels across read-sets. This β-actin transcript was quite abundant in platelets, present at approximately 15.0 ± 1.5 cycles of PCR containing the equivalent of 10 ng of total RNA, and shows the least amount of variation (± ~3%) across the analyzed samples (Additional file 1: Table S1). Pairwise comparisons (Pearson correlation) of our mRNA data after normalizing with GAPDH and two additional stable platelet transcripts, PPBP and B2M, revealed data virtually identical to those originally obtained using ACTB. Notably, the isoforms of the housekeeping gene GAPDH, which is often used as an expression normalizer, exhibited a substantial expression variation upon rRNA depletion (-70% to +130% depending on the specific GAPDH transcript that was considered). It will be important for future platelet RNA-seq studies with larger numbers of subjects to confirm these observations pertaining to the isoforms of these two commonly utilized platelet “normalizers.”
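The general idea of length normalization followed by scaling to a reference transcript can be illustrated with a minimal sketch. The exact scheme is the one described in Methods and is not reproduced here; the function names and all read counts and lengths below are hypothetical, chosen only to show how a relative abundance translates into "PCR cycles beyond the reference".

```python
import math


def normalized_expression(read_count, transcript_length, ref_read_count, ref_length):
    """Read density of a transcript divided by the read density of a reference.

    Assumption: density is simply reads per nucleotide; the published scheme
    may differ in detail.
    """
    density = read_count / transcript_length
    ref_density = ref_read_count / ref_length
    return density / ref_density


def cycles_beyond_reference(relative_expression):
    """Express a relative abundance as equivalent PCR cycles (log2 scale).

    A transcript at 1/1024 of the reference sits 10 cycles beyond it.
    """
    return -math.log2(relative_expression)


# Hypothetical numbers: reference transcript with 18,000 reads over 1,800 nt,
# query transcript with 50 reads over 5,120 nt.
relative = normalized_expression(50, 5120, 18000, 1800)
print(relative, cycles_beyond_reference(relative))
```

Because every transcript is expressed relative to the same reference within its own read-set, values obtained this way can be compared across read-sets, provided the reference itself is stable across the samples being compared.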
We used the most abundant isoform among those derived from an individual protein-coding gene to represent the gene. Figure 1 shows the number of protein-coding genes as a function of the level of normalized expression. This approach revealed different estimates of protein-coding genes that are present at a given level of abundance between total and rRNA-depleted RNA preparations. The finding underscores that estimates of expressed genes were more similar amongst different subjects for high abundance genes (leftwards in Figure 1), and that there was substantial inter-individual variation in total transcript estimates when considering the less abundant genes (rightwards in Figure 1). Protein-coding transcripts for each of the four samples whose expression was supported by the RNA-seq data are shown in Additional file 2: Table S2A (total RNA) and Additional file 3: Table S2B (rRNA-depleted RNA). It is worth stressing that our normalization scheme enabled us to compare expression levels across all preparations.
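As an illustration of how a gene count depends on the chosen abundance threshold, the following sketch picks the most abundant isoform per gene and counts genes at or above a given normalized expression level. The gene and isoform labels and the expression values are hypothetical placeholders, not data from this study.

```python
def gene_level_expression(isoform_expression):
    """Represent each gene by its most abundant isoform.

    isoform_expression maps (gene, isoform) -> normalized expression.
    """
    genes = {}
    for (gene, _isoform), value in isoform_expression.items():
        genes[gene] = max(value, genes.get(gene, 0.0))
    return genes


def count_genes_at_or_above(gene_expression, threshold):
    """Number of genes whose representative isoform meets the threshold."""
    return sum(1 for value in gene_expression.values() if value >= threshold)


# Hypothetical normalized expression values (relative to the reference transcript).
isoforms = {
    ("GENE_A", "iso1"): 0.5,
    ("GENE_A", "iso2"): 0.25,
    ("GENE_B", "iso1"): 2 ** -12,
}
genes = gene_level_expression(isoforms)
for cycles in (10, 16):
    print(cycles, count_genes_at_or_above(genes, 2 ** -cycles))
```

Lowering the threshold (more PCR cycles beyond the reference) can only keep or increase the count, which is why the number of "expressed" genes is tied to the cutoff that is chosen.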
RNA-seq vs. qRT-PCR
We sought to determine the correlation between our RNA-seq normalization approach and qRT-PCR on the same RNA samples. We queried 2 collections of genes: 1) 10 transcripts that exhibited a broad range (> 3 orders of magnitude, i.e. more than 10 PCR cycles) of normalized read counts, 6 of which are well-studied in platelets; and, 2) 89 transcripts for GPCR signaling proteins from a commercial platform, 19 of which were detected using both RNA-seq and qRT-PCR. Figure 2 shows a very high correlation (Pearson r-value of 0.7757 at a p-value <0.0001) for transcripts detected by both methodologies, indicating that our approach of estimating transcript abundance from RNA-seq data is accurate over a wide range of transcript expression levels.
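A comparison of this kind reduces to a Pearson correlation between paired measurements of the same transcripts. The sketch below computes the coefficient directly from its definition; the paired values are hypothetical and only illustrate the input shape (for example, log2 normalized read counts against qRT-PCR cycle differences relative to a reference).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements for five transcripts.
rnaseq_log2 = [-1.0, -3.5, -6.0, -8.2, -11.0]
qpcr_log2 = [-0.5, -4.0, -5.1, -9.0, -10.2]
print(pearson_r(rnaseq_log2, qpcr_log2))
```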
RNA-seq vs. microarrays
We also compared protein-coding transcripts from RNA-seq data with previously published microarray datasets of the platelet protein-coding transcriptome[15, 17, 32]. The three microarray datasets exhibited reduced pair-wise correlation with one another, perhaps the result of a dependence on the used platform and differences in the sample sources and preparations (Figure 3A). In contrast, there is a high and significant pair-wise correlation among the RNA-seq datasets (Figure 3B). In light of these observations, it is not surprising that there was less correlation between any RNA-seq and any microarray set (Figure 3A).
Adverse impact of ribosomal-RNA depletion on the estimates of mRNA abundance
Having established the appropriateness of our normalization scheme, we sought to determine the potential impact of the depletion of ribosomal RNAs on the estimate of relative abundance of the various protein-coding transcripts. To this end, we computed the ratios of the normalized abundance of transcripts between the total and the rRNA-depleted RNA preparations. In an effort to be conservative, and based on the data in Figures 1 and 2, we only considered protein-coding transcripts with an estimated abundance that was ≥ 2⁻¹⁰ times that of β-actin, and kept only those whose absolute ratio value was ≥ 2× between the two preparations. Unexpectedly, the number of affected protein-coding genes was high, ranging from 745 (sample 4N4) to 2,341 (sample 2N2) genes (Additional file 4: Table S3). Considering the stringency of our criteria, the true number of affected protein-coding transcripts is very likely higher. These findings suggest that the ribosomal RNA depletion step adversely and extensively impacts the relative abundance of protein-coding transcripts within a sample and, by extension, the accurate estimate of the transcripts’ expression levels. The situation is further aggravated by the fact that the magnitude of this impact appears to be transcript-dependent and thus non-uniform: as can be seen from Additional file 4: Table S3, the ratio of the normalized expression between the total and rRNA-depleted preparations spans a wide spectrum of values in all four samples. Of particular note are the members of the RNA interference pathway DGCR8, DROSHA, XPO5, DICER1, EIF2C1, EIF2C2, EIF2C3, and EIF2C4 (Table 2): all of them exhibited large differences (up to 32-fold) between the total and rRNA-depleted preparations.
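The filtering logic described above can be sketched as follows. This is an illustration only: the text does not state whether the abundance cutoff was applied to one or both preparations, so the sketch assumes both, and the transcript identifiers and values are hypothetical.

```python
MIN_ABUNDANCE = 2 ** -10  # relative to the reference transcript (10 PCR cycles)
MIN_FOLD = 2.0            # minimum fold difference between preparations


def affected_transcripts(total, depleted,
                         min_abundance=MIN_ABUNDANCE, min_fold=MIN_FOLD):
    """Return {transcript: total/depleted ratio} for transcripts that pass
    the abundance cutoff and differ by at least min_fold in either direction.

    Assumption: the abundance cutoff must hold in BOTH preparations.
    """
    affected = {}
    for transcript in total.keys() & depleted.keys():
        t, d = total[transcript], depleted[transcript]
        if t < min_abundance or d < min_abundance:
            continue
        ratio = t / d
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            affected[transcript] = ratio
    return affected


# Hypothetical normalized abundances for four transcripts.
total_rna = {"T1": 0.5, "T2": 0.01, "T3": 2 ** -14, "T4": 0.1}
rrna_depleted = {"T1": 0.1, "T2": 0.012, "T3": 2 ** -10, "T4": 0.4}
print(sorted(affected_transcripts(total_rna, rrna_depleted)))
```

Treating increases and decreases symmetrically matters here, since the reported effect of depletion is transcript-dependent and runs in both directions.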
Enriched categories of platelet protein-coding transcripts
We generated the intersection of the expressed protein-coding transcripts across all four samples and ranked them according to abundance. We processed separately the total RNA and the rRNA-depleted preparation. Using GORILLA with the four ranked lists corresponding to each of the two preparations we calculated GO term enrichments with an eye towards assessing whether the platelet protein-coding transcriptome was enriched for certain biological characteristics. Figure 4 shows the top-ranking functional annotation clusters for the biological process category, the number of genes sharing each term, and the associated p-value (log10). As expected, biological process terms such as coagulation, platelet degranulation, etc. were over-represented in the platelet transcriptome and both preparations. Additional analyses of GO terms pertaining to cellular compartment and molecular function are shown in Additional file 1: Figure S1A-H.
We also queried the presence of miRNAs in the platelet transcriptome. Additional file 5: Table S4 shows the complete set of miRNAs whose expression was supported by the RNA-seq data for each of the four samples. The expression data were normalized with the help of SNORD44; SNORD44 was selected because of its abundance and observed general stability across very diverse tissues. The table reveals that the expression for hundreds of miRNAs was ≥ 32 times higher than SNORD44, suggesting that the platelet transcriptome is rich in miRNAs, a finding also reported by Landry et al. Unique to our analysis is that we distinguish between the two potential products of a miRNA precursor, namely 5p and 3p, and examine each product’s expression separately (Additional file 1: Figure S2 explains why this is important).
In light of recent work highlighting the importance of pseudogenes in regulating miRNA-mediated repression of targeted mRNAs[20, 21], we analyzed our sequenced read sets for evidence of pseudogene transcription. To this end, we used the pseudogene definitions contained in Release 63 of the ENSEMBL database: this Release lists 11,983 transcripts corresponding to 11,158 genes. We found pseudogene loci to be highly enriched across all four samples and in both the total and rRNA-depleted preparations (see Additional file 1: Table S5 for details). Notably, the observed enrichment values mirrored one another across the preparations.
We also focused on the repeat element category of characterized transcripts. In particular, we computed enrichment values for both sense and antisense transcripts for each of the 116 families of elements that are recognized by RepeatMasker and separately for each of the four samples and the three preparations (total and rRNA-depleted long RNA, short RNA) – a total of 12 sets. Additional file 1: Table S6 and Additional file 1: Table S7 show that several repeat family loci give rise to both long and short platelet RNA transcripts.
Other categories of non-coding RNAs
Recently, a novel class of long ncRNAs, the “long-intergenic non-coding RNAs,” or lincRNAs for short, has received a lot of attention[23, 36]. LincRNAs number over a thousand members, yet with the exception of a handful of reports[37–39] they remain essentially uncharacterized. Our analysis of the sequenced reads did not reveal any enrichment of the corresponding genomic loci.
Novel and uncharacterized intronic transcripts
Our work uncovered extensive evidence for the existence of transcripts that originate in the introns of known protein-coding genes. This is of particular significance considering that platelets lack a nucleus. For such an analysis it is imperative to distinguish bona fide intronic regions from well-characterized transcripts that are known to be co-located with the introns of protein coding genes. We thus worked with unspliced messenger RNA sequences after first having 'subtracted' all sense instances of the following categories of transcripts: protein-coding and non-protein-coding exons; all known repeat elements; rRNAs; snoRNAs; miRNAs; and, lincRNAs. To this end we used the annotations in Release 63 (June 30, 2011) of the ENSEMBL database. We analyzed each of the four samples and three preparations (total and rRNA-depleted long RNA, short RNA) separately. For the long RNA read sets, we considered intronic real estate if and only if platelet reads covered a minimum of 100 consecutive nucleotides and the covered region had an estimated abundance relative to ACTB of 1:1024 (which is equivalent to a conservative dynamic range of not more than 10 PCR cycles beyond ACTB). For the short RNA read sets, we only considered platelet reads mapping to intronic real estate if they were at least 30 nucleotides long and had an estimated abundance relative to SNORD44 of 1:64 (which is equivalent to a conservative dynamic range of not more than 6 PCR cycles beyond SNORD44). Given the high stringencies of length and abundance, we accepted such a region if at least one of the sequenced samples showed evidence for it. Across the four samples and two long RNA preparations, we found a total of 6,992 bona fide intronic regions that give rise to currently uncharacterized long RNA transcripts satisfying the above constraints. We also found an additional 1,236 bona fide intronic regions that give rise to currently uncharacterized short RNA transcripts satisfying the above constraints.
Notably, these two collections of intronic regions had only 18 members in common, suggesting that the two novel populations of (long and short) uncharacterized bona fide intronic transcripts originate from distinct genomic loci. Additional file 6: Table S8 lists the genomic coordinates for these two groups of intronic regions.
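The length criterion for long RNA reads (a minimum of 100 consecutive covered nucleotides) amounts to finding sufficiently long runs of non-zero coverage within an intron. The following is a minimal sketch under the assumption that per-nucleotide read depth is already available as a list; the function name and the toy depth profile are hypothetical, and the accompanying abundance filter (1:1024 relative to ACTB) is not shown.

```python
def covered_runs(depth, min_length=100):
    """Return half-open (start, end) intervals of consecutive positions with
    read depth > 0 that span at least min_length nucleotides."""
    runs = []
    start = None
    for position, d in enumerate(depth):
        if d > 0:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_length:
                runs.append((start, position))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(depth) - start >= min_length:
        runs.append((start, len(depth)))
    return runs


# Toy intron: 5 uncovered nt, 120 covered, 10 uncovered, 50 covered.
profile = [0] * 5 + [3] * 120 + [0] * 10 + [1] * 50
print(covered_runs(profile))        # only the 120-nt run qualifies
print(covered_runs(profile, 30))    # both runs qualify at a 30-nt minimum
```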
Novel and pervasive antisense transcripts
Our analysis also revealed the presence of a substantial number of long and short platelet transcripts that were antisense to known miRNAs, known protein-coding exons, and notably, to known repeat element families. For the miRNA analysis, we processed separately the four read sets from the short RNA preparation. For the protein-coding transcript analysis, we processed separately the eight read sets from the total and rRNA-depleted preparations. For the repeat element analysis, we processed all read sets separately for each of the four sequenced individuals. The following are the 10 miRNA precursors with previously unreported antisense transcripts: hsa-miR-33b, hsa-miR-101, hsa-miR-191, hsa-miR-219-2, hsa-miR-374b, hsa-miR-486, hsa-miR-625, hsa-miR-766, hsa-miR-3135b, and hsa-miR-4433. The short platelet RNAs we observed had lengths typical of a miRNA and were transcribed from the strand opposite of that of the known miRNA precursor. Each of the loci listed above generated one or two distinct antisense transcripts, presumably a mature miRNA and its “star” miRNA. There was also a high prevalence of transcripts that were antisense to known protein-coding regions of the genome. Table 3 shows the enrichment in such antisense transcripts that overlap the 5′UTRs, 3′UTRs or full-length exonic space of known protein-coding transcripts. Enrichment values are notable, independently of whether we computed them in terms of span (which ignores the number of reads sequenced from a genetic locus) or in terms of support (which takes into account the number of reads sequenced from a genetic locus). Unexpectedly, our analyses revealed notable enrichment in both long and short platelet RNAs that were antisense to several known repeat families. Table 4 shows these enrichments for the sequenced short platelet RNA-omes. Additional file 1: Table S9 shows the corresponding values for the long platelet RNA-omes and separately for the total and rRNA-depleted preparations.
We use the characterization ‘orphan’ to refer to those RNA-seq reads that could not be mapped on the human genome using our default parameter settings. To ensure that we exhausted the possibilities, and in an effort to address the potential identities of unmapped transcripts, we conducted additional read mapping with alternative computational settings and using curated datasets.
First, and in light of recent reports of extensive editing of RNA transcripts, we used the BWA algorithm with higher-than-default sensitivity settings: in particular, we permitted up to six mismatches in the context of BWA’s length-dependent scheme for allowing mismatches. We used this lenient parameter setting for both the total and rRNA-depleted preparations (8 sets of reads in total). In each case, we were able to map approximately four million additional reads (~6.5% of the original set of sequenced reads). Additional file 1: Table S10 provides relevant detailed statistics.
Since we used the full genome’s sequence to map the sequenced reads the formal possibility remains that perhaps a significant portion of the orphan reads originate from the exon-exon junctions of spliced protein-coding transcripts. Thus, our subsequent investigation used the 598,379 exons listed in Release 63 of ENSEMBL to combinatorially enumerate all possible exon-exon junctions using the known, non-overlapping exons of all 51,055 protein-coding and non-protein-coding genes contained in the Release. This gave rise to 12,382,819 junctions on which we attempted to map the orphan reads. Across all read sets that were sequenced from the total RNA preparations, an average of 185,026 reads were mapped onto the exon-exon junction set. The corresponding number for the sets obtained from the rRNA-depleted preparations was 191,736 reads. In both cases, only a very small fraction of the reads mapped to exon-exon junctions.
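The combinatorial enumeration of exon-exon junctions can be pictured as pairing every upstream exon of a gene with every downstream exon. The sketch below is a simplified illustration, not the pipeline used in this work: it assumes exons are given as non-overlapping (start, end) coordinates on the forward strand, and the `flank` parameter and helper names are hypothetical.

```python
from itertools import combinations


def enumerate_junctions(exons):
    """All (upstream, downstream) exon pairs for one gene.

    exons: non-overlapping (start, end) tuples. A gene with n exons yields
    n * (n - 1) / 2 candidate junctions.
    """
    ordered = sorted(exons)
    return list(combinations(ordered, 2))


def junction_sequence(upstream_seq, downstream_seq, flank):
    """Join the last `flank` nt of the upstream exon to the first `flank` nt
    of the downstream exon, giving a template on which reads can be mapped."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    return upstream_seq[-flank:] + downstream_seq[:flank]


# Toy gene with four exons: 4 * 3 / 2 = 6 candidate junctions.
toy_exons = [(100, 200), (300, 380), (500, 650), (900, 1000)]
print(len(enumerate_junctions(toy_exons)))
print(junction_sequence("AAAACCCC", "GGGGTTTT", 3))
```

Because the number of candidate junctions grows quadratically with the exon count of each gene, an exhaustive enumeration over a full annotation release produces a junction set far larger than the exon set itself, consistent with the counts reported above.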
Lastly, we examined the possibility that the orphan reads originate from the highly polymorphic human leukocyte antigen (HLA) region of chromosome 6. To this end, we used the 6,944 sequences contained in Release 3.5 of the IMGT/HLA database[41, 42] and searched them with BWA and standard settings. An average of 5,601 (total RNA) and 5,564 (rRNA-depleted RNA) reads were mapped to this region suggesting that transcripts from the HLA regions do not contribute in any significant manner to the platelet transcriptome.
Results have been embedded in a local mirror of the UCSC genome browser and can be examined interactively at http://cm.jefferson.edu/platelets_2012/.
The data set supporting the results of this article is available in the NCBI/GEO repository, accession number SRA062032, http://www.ncbi.nlm.nih.gov/sra. The data sets supporting the results of this article are included within the article and its additional files.
The cellular transcriptome
A prominent lesson that has emerged from the 1000 Genomes Project is the greater genetic variation in the population than previously appreciated. Transcriptomics is rapidly assuming a more prominent role in the understanding of basic molecular mechanisms accounting for variation within the normal population and inherited disease. We have sequenced RNA from the leukocyte-depleted platelets of four healthy individuals and report our findings from the analysis of the long and short RNA transcript populations. In the case of long RNAs, we carried out sequencing of both total and rRNA-depleted RNA. The generated data, accompanying genome browser, and data repository detail the totality of RNA species present in the anucleate human platelet. We are unaware of prior efforts that have provided as comprehensive a transcriptome evaluation of any human cell as offered in our report. Our approach serves as a roadmap for future transcriptome analyses and the findings have important implications for the understanding of the transcriptome and the role of platelets in health and disease.
We utilized a distinct approach to the elucidation of the platelet transcriptome that, as we discovered, exhibits an extraordinary complexity. Features of our approach include: 1) the use of the anucleate platelet that decouples the nuclear and cytoplasmic transcriptomes; 2) the use of total RNA and not poly-A enriched RNA; 3) the use of a next generation sequencing platform (AB/LT SOLiD) that generated the high read numbers needed to provide the required resolution power; 4) the explicit evaluation of the impact of the ribosomal RNA depletion step prior to sequencing; 5) an enhanced mapping protocol that ensured exhaustive mapping of the sequenced reads on the un-masked human genome and the exclusion of reads that could not be mapped uniquely; and, 6) the explicit search for the presence or absence of RNA species that either have not been previously discussed in the context of platelet biology or that are not currently annotated in the public databases. Findings from our analyses reveal a much more diverse platelet transcriptome than previously appreciated, and include pseudogenes, repeat elements, bona fide intronic transcripts, novel short and long RNAs, transcripts antisense to exons and antisense to miRNAs. Our data are publicly available and can be explored interactively through our local mirror of the UCSC genome browser at http://cm.jefferson.edu/platelets_2012/.
The platelet context
Blood platelets originate from bone marrow precursor megakaryocytes. As such, most platelet RNA results from the transcription of nuclear DNA in the megakaryocyte, and thus reflects the status of the megakaryocyte at the time of platelet release into the circulation. Notably, megakaryocytes from human bone marrow are neither routinely nor easily accessible for biological studies. Megakaryocyte gene transcription responds to numerous normal physiologic and pathologic stimuli. Additionally, anucleate platelets are known to engage in both post-transcriptional processing of RNA and translation of mRNA into protein, in response to external factors[9, 43]. Consequently, the platelet transcriptome represents a critical proxy biomarker of both megakaryocyte activity and of the hemostatic, thrombotic, and inflammatory challenges to the organism. These properties in conjunction with the rapidly emerging appreciation of the role of non-coding RNAs in post-transcriptional processing and translation make an inventory of the platelet RNA-ome both timely and important.
Compared to other RNA-evaluating technologies, the current limitations of RNA-seq in general and as applied to platelets are the expense and the need for sophisticated computational analyses that have not yet been standardized or made widely available. As experience with the method progresses and prices drop, these limitations will be offset by the advantages of superior dynamic range, the discovery of novel transcripts, and the simultaneous assessment of expression levels, sequence variants and splice variants, none of which can be achieved using conventional probe-based transcript analysis. A direct digital detection technology (referred to as “Nanostring”) offers the advantage of requiring less starting material, which can be limiting in platelet RNA studies, but this technology is only available for profiling known miRNAs or limited sets of known mRNAs. Of course, any RNA transcriptome analysis must be considered in the context of potential differences with megakaryocytes. Recently, platelet RNA-seq successfully revealed abnormal splicing events in 1) NBEAL2, thus identifying the gene responsible for the Gray Platelet Syndrome, and, 2) the RNA-binding protein RBM8A, thus uncovering the gene responsible for the TAR (=thrombocytopenia and absent radii) syndrome. Our data will serve as an early and comprehensive reference and resource for other investigators wishing to understand better the normal platelet transcriptome when searching for disease-producing transcript variants. Furthermore, it will serve as a much needed “parts list” of platelet RNAs in the context of studies of RNA-RNA and RNA-protein regulatory interactions. The absence of active transcription makes the platelet an attractive cell type for elucidating and deciphering such higher order regulatory couplings.
RNA-seq is highly sensitive and capable of detecting variability between samples caused by biological differences, technical variation, or environmental influence during sample handling. The samples in our study were processed using a methodology with excellent reproducibility that minimizes technical and environmental factors, and that was able to discover novel genetic and transcriptomic variants regulating platelet biologic function[11, 48, 49]. However, additional platelet RNA-seq data and analyses from a larger number of subjects are needed to assess the relative contribution of biological versus technical factors to the observed transcript variation.
It is difficult to compare and contrast our study with that of Rowley et al. because of key differences in design, and in the technical and analytic approach. A particular value of the Rowley study is the comparison of human and mouse platelet transcriptomes, which noted some unexpected differences. However, Rowley et al. did a single sequencing run on a pool of 2 human donors, whereas we separately sequenced and provide profiles of long total RNA, long rRNA-depleted RNA, and, short RNA from 4 subjects. The larger number of samples, an increased sequencing granularity, a normalization scheme that allows comparisons and assessment of inter-individual variation, and a wide-ranging analysis of the culled RNA-omes (both protein-coding and non-coding) represent key elements of our work. Additionally, our use of the industry-standard UCSC genome browser for visualizing our data will enable faster access and dissemination of our results.
Validity of the approach
Comparison of our data to microarray results, both ours and those in the public databases, showed RNA-seq to have significant correlation with microarray for the subset of abundant protein-coding RNAs. GO analyses indicated that the expressed mRNAs were enriched in terms such as coagulation, platelet degranulation, secretion, cytoskeletal dynamics, receptor binding and G-protein signaling. These analyses validate and support RNA-seq and our analytic approach as appropriate for assessing the platelet transcriptome.
The number of protein coding transcripts
In this work we confirm and, more importantly, extend earlier platelet transcriptome studies by us and others[10–12, 15] in unanticipated ways. Prior platelet work estimated the number of protein-coding transcripts to be between 1,500 and 9,000. These earlier efforts neither emphasized nor appreciated the notion that such a count is somewhat of a “moving target.” Our analyses of the RNA-seq data clearly demonstrate that such an estimate and the ability to do cross-sample comparisons depend upon 1) the resolution ability of the sequencing platform used, 2) the read mapping criteria (e.g., use of uniquely mapping reads), and 3) the “read count” threshold used. Within 16 PCR cycles of β-actin, we find ~9,000 mRNAs in the platelets of 4 healthy donors. Relaxed or more stringent criteria provide higher or lower estimates, respectively (Figure 1).
Ribosomal RNA depletion
Depletion of ribosomal RNA is considered a standard approach in RNA-seq studies of nucleated cells. Driving the choice is the observation that rRNA makes up ~75-80% of the total amount of cellular RNA. To the best of our knowledge, the impact of rRNA depletion has not been previously studied, certainly not in the context of platelet transcriptome analyses. Importantly, we found that rRNA depletion strongly and adversely impacts the characterization of platelet protein coding transcripts. Indeed, rRNA depletion resulted in variations in abundance estimates that confounded meaningful analyses across samples. Neither we nor others in the field have ascertained the underlying mechanism by which rRNA depletion alters relative mRNA abundances. Our finding does not appear to be a non-specific artifact: not only is the dynamic range of the observed impact very significant but the number of the affected mRNAs is large. In previous work, several authors noted that platelets dock mRNAs to ribosomes and that this process can be selective for features of specific mRNAs[50, 51], so it is conceivable that the observed impact of rRNA depletion on mRNA abundance is a platelet-specific event.
Novel antisense transcripts
Our analyses unexpectedly revealed the existence of numerous transcripts that are antisense to previously annotated genomic regions. In particular, we discovered consistent enrichment in RNAs that are antisense to the exons of known protein-coding loci across the four healthy donors. We also found enrichment in long RNAs that are antisense to known repeat families. Notably, we found even more pronounced enrichment in short RNAs that are antisense to many different repeat families. Naturally occurring antisense transcripts are important regulators of gene expression via interference with translation, RNA masking, etc., and our results suggest the possibility of important, previously unappreciated roles of antisense transcripts in platelet biology.
Many more miRNAs
We also discovered a larger number of platelet miRNAs than previously reported, and separately characterized and reported on the two potential products that can be transcribed from a microRNA precursor. Importantly, for 10 miRNA loci, we found and report evidence of transcription of short, miRNA-like-in-length RNAs that are antisense to the known miRNA or its “star” species. Not only are these miRNA antisense species not currently contained in miRBase, but to the best of our knowledge, they have not been reported previously in the platelet context.
An additional intriguing finding pertains to our identification of both short and long RNA transcripts that originate from thousands of intronic DNA genomic regions and are not currently annotated in the public repositories as known non-protein-coding transcripts from protein-coding loci, miRNAs, ribosomal RNAs, tRNAs, repeat elements, etc. In particular, the intronic loci that give rise to the sequenced long platelet RNAs are distinct from the intronic loci that give rise to the sequenced short platelet RNAs. It is important to stress that we required a minimum span of 100 nucleotides for those intronic regions that gave rise to the un-annotated long RNAs and a minimum span of 30 nucleotides for those intronic regions that gave rise to the un-annotated short RNAs. Such strict criteria suggest that the actual sources of un-annotated intronic platelet transcripts are more numerous. Although it is possible that intergenic RNAs or transcripts with retained introns are not functional in platelets, there is accumulating evidence that intronic regions likely play rather involved and functionally significant roles in a cell[53–58]. These reports, in conjunction with the data that we have generated, and together with other accounts whereby specific intronic transcripts have been associated with some diseases[45, 46] suggest that our resource will be useful reference material for platelet disorders.
Transcripts from repeat elements and pseudogenes
One of the unexpected findings that emerged from our work and analyses is the pervasive presence of long and short RNAs that are both sense and antisense to the genomic locations of many families of repeat elements, and nearly all known pseudogenes. The presence of expressed repeat elements per se is not new and has been reported previously, e.g.[59–61]. Additionally, many reports have already provided evidence of significant connections between repeat elements and cellular processes in health and disease[62–66]. However, there are several novel and interesting elements that emerged from our analyses and warranted reporting. First, we observed that there were specific categories of repeat elements that were present in our profiles. Second, the present categories seemed to have consistent enrichments across the sequenced individuals. Lastly, the consistency in the profiles among individuals - despite the absence of transcriptional activity - fuels the hypothesis that these repeats are of potential functional significance in the platelet context. However, the considerable numbers of repeat elements and the relative diversity of their categories make it difficult to conjecture what their roles may be. Indeed there is a large number of possibilities that include: the possible formation of endogenous siRNAs or small non-coding RNAs as previously described[67, 68]; the possible production of currently unsuspected miRNAs[69, 70]; the creation of substrates for miRNA targeting that could then act as “decoys,” i.e. as competing endogenous RNAs (ceRNAs) that regulate mRNAs[20–22, 71, 72]; their involvement in previously unrecognized regulatory mechanisms[73–75]; etc. It is also important to note that the observed repeat expression in platelets appears to be ‘marshalled’ and unlike the aberrant expression that has been reported in human cancers.
Our work has revealed a highly complex transcriptional landscape for the anucleate human platelet. The richness and diversity of the present RNA molecules suggests a context where platelet biology transcends protein- and miRNA-centric descriptions. By making available our findings we aim to facilitate the elucidation of previously unappreciated molecular species and molecular interactions. This will eventually permit an improved understanding of the molecular mechanisms that regulate platelet physiology and contribute to serious disorders of thrombosis, hemostasis, and inflammation.
RNA preparation and RNA-seq
The study was approved by the Institutional Review Board of Thomas Jefferson University, and informed consent was obtained from all participants. Highly purified, leukocyte-depleted platelet (LDP) preparations were obtained as previously described. RNA extraction was performed with TRIzol® (Invitrogen, Carlsbad, CA) and RNA quality was assessed by the Agilent bio-analyzer PICO chip. In some experiments, 2 μg of total RNA was depleted of large 18S and 28S rRNA as well as 5S and 5.8S using the RiboMinus Eukaryotic Kit (Invitrogen), which uses biotinylated probes designed against these rRNAs (following the manufacturer's protocol). Total RNA and rRNA-depleted RNA were fragmented using RNAse III digestion for 13 min in a 10 μl reaction containing 1 μl of 10X RNAse III buffer and 20 U of RNAse III. After incubation the RNA was purified using the RiboMinus Concentration Module (Invitrogen), and the size and yield of RNA were determined using the Agilent bio-analyzer PICO chip. Library construction, emulsion PCR, workflow analysis and sequencing runs were performed following standard AB/Life Technologies protocols. A typical sequence run generated ~100 million reads of 50 nt each for long RNA and 30 nt long for short RNA, with the “strandedness” of the read on genomic DNA preserved.
Sequenced reads were mapped onto the human genome assembly hg19 using the Burrows-Wheeler Alignment (BWA) algorithm. Reads sequenced from the short RNA preparation were pre-processed using the cutadapt utility. During mapping, all reads were quality-trimmed using each read’s associated quality values. Also, we allowed up to 2 mismatches in each read using BWA’s internal adaptive, read-length-dependent scheme. We did not allow any insertions or deletions. All reads that were mapped to the genome were post-processed, and those that landed on multiple locations (whether on the same or different chromosomes) were discarded and excluded from further consideration. The uniquely mapped reads can be examined at http://cm.jefferson.edu/platelets_2012/ by navigating to the genomic locus of interest; reads mapped to the forward strand are shown in blue, and reads mapped to the reverse strand in red.
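The post-processing criteria described above (mapped, no insertions or deletions, at most 2 mismatches, a single genomic location) can be illustrated with a short Python sketch that operates on SAM-formatted alignment lines. This is not the pipeline used in the study. The function names are hypothetical, and the sketch assumes that the aligner emits the optional tags NM (edit distance), X0 (number of best hits) and X1 (number of suboptimal hits), as BWA does in its aln/samse mode. Treating any suboptimal hit as a second location is a further, deliberately strict assumption.

```python
import re

MAX_MISMATCHES = 2  # "up to 2 mismatches in each read"

def parse_tags(fields):
    """Return the optional SAM tags (columns 12 onward) as a dict."""
    tags = {}
    for item in fields[11:]:
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags

def keep_read(sam_line):
    """True if the alignment is mapped, gap-free, within the mismatch
    limit, and reported at exactly one genomic location."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4:                     # FLAG bit 0x4: read is unmapped
        return False
    if re.search(r"[ID]", cigar):      # insertion or deletion in the CIGAR
        return False
    tags = parse_tags(fields)
    # Without indels, the edit distance NM equals the mismatch count.
    # A missing NM tag is treated as "cannot verify" and the read is dropped.
    if tags.get("NM", MAX_MISMATCHES + 1) > MAX_MISMATCHES:
        return False
    # Assumption: unique means one best hit (X0 == 1) and no suboptimal hits.
    return tags.get("X0") == 1 and tags.get("X1", 0) == 0

def strand(sam_line):
    """Return '+' or '-' from FLAG bit 0x10 (read reverse-complemented)."""
    flag = int(sam_line.split("\t")[1])
    return "-" if flag & 0x10 else "+"
```

A caller would typically stream the alignment file, skip header lines beginning with "@", and pass the remaining lines through keep_read before counting reads per strand.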
Estimating expression levels for transcripts
For each protein-coding transcript T, we determined the number of reads that uniquely mapped to T’s exons; distinct sequenced reads that mapped to the same genomic/exonic location of T were counted multiply. We defined the normalized expression (ne_T) of T by the ratio ne_T = (c_T / L_T) / (c_{β-actin} / L_{β-actin}), where c_T was the read count for T, c_{β-actin} was the read count for the β-actin transcript, and L_T and L_{β-actin} were the respective lengths of each transcript. For genes with multiple known protein-coding transcripts, the gene was assigned the ne_T value of its most abundant transcript. As described in the Results, we established that β-actin mRNA is abundantly and consistently expressed across samples. Throughout this study, we used the β-actin transcript with ENSEMBL identifier ENST00000331789 and the relationship of ne_T to that of ne_{ENST00000331789} to determine the presence or absence of protein-coding transcript T and, by extension, of the expression of the parent gene.
For non-coding transcripts, we used the same approach but instead of β-actin we used the levels of the small nucleolar RNA SNORD44 as reference. This choice was informed by the abundance and apparent stability of SNORD44’s expression across many tissues and cell lines.
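The normalization described above can be expressed in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the presence test assumes that "within 16 PCR cycles" of the reference corresponds to a normalized expression of at least 2^-16 (one idealized doubling per cycle), which is an interpretation rather than a stated rule. The same functions apply to non-coding transcripts if the SNORD44 count and length are supplied as the reference.

```python
def normalized_expression(count, length, ref_count, ref_length):
    """Length-normalized read count of a transcript divided by the
    length-normalized read count of the reference transcript."""
    if length <= 0 or ref_length <= 0 or ref_count <= 0:
        raise ValueError("lengths and reference count must be positive")
    return (count / length) / (ref_count / ref_length)

def gene_expression(transcripts, ref_count, ref_length):
    """Assign a gene the value of its most abundant transcript.
    `transcripts` is an iterable of (read_count, length) pairs."""
    return max(
        normalized_expression(c, l, ref_count, ref_length)
        for c, l in transcripts
    )

def is_present(ne, max_cycles=16):
    """Assumed threshold: call a transcript present if it lies within
    `max_cycles` doublings of the reference, i.e. ne >= 2**-max_cycles."""
    return ne >= 2.0 ** (-max_cycles)
```

Changing max_cycles mirrors the effect of relaxing or tightening the threshold: a larger value admits more low-abundance transcripts, a smaller value admits fewer.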
Quantitative Reverse Transcription PCR (qRT-PCR) of Gene Expression
One microgram total RNA was reverse transcribed and 1% of the resulting cDNA (equivalent to 10 ng starting RNA) was used in the PCR. Quantitative reverse transcriptase PCR (qRT-PCR) results using primers specific for known platelet genes and for a panel of 89 genes encoding G-protein-coupled receptors are described in the Supplement. mRNA levels were assessed by the 2^(-ΔΔCT) method normalized to β-actin.
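For readers unfamiliar with the 2^(-ΔΔCT) calculation, a minimal Python sketch follows. It assumes an amplification efficiency close to 100% (one doubling per cycle) and a calibrator sample against which the sample of interest is compared. The choice of calibrator is not specified in the text above, so the argument names and the example threshold-cycle values are purely hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """CT of the target gene minus CT of the reference gene (here, beta-actin)."""
    return ct_target - ct_reference

def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of the target in a sample relative to a calibrator,
    computed as 2 ** -(delta CT of sample - delta CT of calibrator)."""
    ddct = delta_ct(ct_target, ct_ref) - delta_ct(ct_target_cal, ct_ref_cal)
    return 2.0 ** (-ddct)

# Hypothetical values: the target crosses threshold 6 cycles after the
# reference in the sample and 8 cycles after it in the calibrator.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

With these made-up numbers ΔΔCT is -2, so the target is estimated to be four times more abundant in the sample than in the calibrator.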
Correlation between platelet RNA-seq and microarray datasets
The average log2-normalized expression of each long total RNA transcript across the 4 samples was ranked by transcript abundance and compared to published platelet transcript profiles obtained on Affymetrix GeneChip and Illumina BeadChip microarray platforms[15, 17]. A Spearman’s correlation coefficient was computed for the genes that are represented on all platforms.
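The rank-based comparison described above can be sketched in plain Python. The sketch restricts two expression tables to their shared genes, ranks each (ties receive the average rank), and computes Pearson's correlation on the ranks, which is the standard definition of Spearman's coefficient. The dictionary layout and function names are assumptions made for illustration, and a statistics library would normally be used in practice.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(rnaseq, microarray):
    """Spearman correlation over genes present in both dicts
    (gene symbol -> expression value)."""
    shared = sorted(set(rnaseq) & set(microarray))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [rnaseq[g] for g in shared]
    y = [microarray[g] for g in shared]
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the RNA-seq values and the microarray intensities do not need to be on the same scale.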
To characterize the human platelet transcriptome with regard to possible over-representation of transcripts of a specific type, enrichment analysis was performed using the coordinates of those RNA-seq reads from both long and short total platelet RNA transcriptomes that could be mapped on the genome and the genomic coordinates of categories of transcripts as these are reported in the ENSEMBL database.
Jones CI, Garner SF, Angenent W, Bernard A, Berzuini C, Burns P, Farndale RW, Hogwood J, Rankin A, Stephens JC, et al: Mapping the platelet profile for functional genomic studies and demonstration of the effect size of the GP6 locus. Journal of thrombosis and haemostasis: JTH. 2007, 5 (8): 1756-1765. 10.1111/j.1538-7836.2007.02632.x.
O'Donnell CJ, Larson MG, Feng D, Sutherland PA, Lindpaintner K, Myers RH, D'Agostino RA, Levy D, Tofler GH: Genetic and environmental contributions to platelet aggregation: the Framingham Heart Study. Circulation. 2001, 103 (25): 3051-3056. 10.1161/01.CIR.103.25.3051.
Bray PF, Mathias RA, Faraday N, Yanek LR, Fallin MD, Herrera-Galeano JE, Wilson AF, Becker LC, Becker DM: Heritability of platelet function in families with premature coronary artery disease. Journal of thrombosis and haemostasis: JTH. 2007, 5 (8): 1617-1623. 10.1111/j.1538-7836.2007.02618.x.
Johnson AD, Yanek LR, Chen MH, Faraday N, Larson MG, Tofler G, Lin SJ, Kraja AT, Province MA, Yang Q, et al: Genome-wide meta-analyses identifies seven loci associated with platelet aggregation in response to agonists. Nat Genet. 2010, 42 (7): 608-613. 10.1038/ng.604.
Gieger C, Radhakrishnan A, Cvejic A, Tang W, Porcu E, Pistis G, Serbanovic-Canic J, Elling U, Goodall AH, Labrune Y, et al: New gene functions in megakaryopoiesis and platelet formation. Nature. 2011, 480 (7376): 201-208. 10.1038/nature10659.
Weyrich AS, Schwertz H, Kraiss LW, Zimmerman GA: Protein synthesis by platelets: historical and new perspectives. Journal of thrombosis and haemostasis: JTH. 2009, 7 (2): 241-246. 10.1111/j.1538-7836.2008.03211.x.
Gnatenko DV, Perrotta PL, Bahou WF: Proteomic approaches to dissect platelet function: Half the story. Blood. 2006, 108 (13): 3983-3991. 10.1182/blood-2006-06-026518.
McRedmond JP, Park SD, Reilly DF, Coppinger JA, Maguire PB, Shields DC, Fitzgerald DJ: Integration of proteomics and genomics in platelets: a profile of platelet proteins and platelet-specific genes. Molecular & cellular proteomics: MCP. 2004, 3 (2): 133-144.
Denis MM, Tolley ND, Bunting M, Schwertz H, Jiang H, Lindemann S, Yost CC, Rubner FJ, Albertine KH, Swoboda KJ, et al: Escaping the nuclear confines: signal-dependent pre-mRNA splicing in anucleate platelets. Cell. 2005, 122 (3): 379-391. 10.1016/j.cell.2005.06.015.
Rowley JW, Oler AJ, Tolley ND, Hunter BN, Low EN, Nix DA, Yost CC, Zimmerman GA, Weyrich AS: Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes. Blood. 2011, 118 (14): e101-111. 10.1182/blood-2011-03-339705.
Nagalla S, Shaw C, Kong X, Kondkar AA, Edelstein LC, Ma L, Chen J, McKnight GS, Lopez JA, Yang L, et al: Platelet microRNA-mRNA coexpression profiles correlate with platelet reactivity. Blood. 2011, 117 (19): 5189-5197. 10.1182/blood-2010-09-299719.
Gnatenko DV, Dunn JJ, McCorkle SR, Weissmann D, Perrotta PL, Bahou WF: Transcript profiling of human platelets using microarray and serial analysis of gene expression. Blood. 2003, 101 (6): 2285-2293. 10.1182/blood-2002-09-2797.
Kim HL: Comparison of oligonucleotide-microarray and serial analysis of gene expression (SAGE) in transcript profiling analysis of megakaryocytes derived from CD34+ cells. Exp Mol Med. 2003, 35 (5): 460-466. 10.1038/emm.2003.60.
Gnatenko DV, Cupit LD, Huang EC, Dhundale A, Perrotta PL, Bahou WF: Platelets express steroidogenic 17beta-hydroxysteroid dehydrogenases. Distinct profiles predict the essential thrombocythemic phenotype. Thromb Haemost. 2005, 94 (2): 412-421.
Kondkar AA, Bray MS, Leal SM, Nagalla S, Liu DJ, Jin Y, Dong JF, Ren Q, Whiteheart SW, Shaw C, et al: VAMP8/endobrevin is overexpressed in hyperreactive human platelets: suggested role for platelet microRNA. Journal of thrombosis and haemostasis: JTH. 2010, 8 (2): 369-378. 10.1111/j.1538-7836.2009.03700.x.
Burns P, Gusnanto A, Macaulay IC, Rankin A, Tom B, Langford CF, Dudbridge F, Ouwehand WH, Watkins NA: Identification of variation in the platelet transcriptome associated with glycoprotein 6 haplotype. Platelets. 2008, 19 (4): 258-267. 10.1080/09537100801947434.
Goodall AH, Burns P, Salles I, Macaulay IC, Jones CI, Ardissino D, de Bono B, Bray SL, Deckmyn H, Dudbridge F, et al: Transcription profiling in human platelets reveals LRRFIP1 as a novel protein regulating platelet function. Blood. 2010, 116 (22): 4646-4656. 10.1182/blood-2010-04-280925.
Healy AM, Pickard MD, Pradhan AD, Wang Y, Chen Z, Croce K, Sakuma M, Shi C, Zago AC, Garasic J, et al: Platelet expression profiling and clinical validation of myeloid-related protein-14 as a novel determinant of cardiovascular events. Circulation. 2006, 113 (19): 2278-2284. 10.1161/CIRCULATIONAHA.105.607333.
Shabalina SA, Spiridonov NA: The mammalian transcriptome and the function of non-coding DNA sequences. Genome Biol. 2004, 5 (4): 105-10.1186/gb-2004-5-4-105.
Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP: A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature. 2010, 465 (7301): 1033-1038. 10.1038/nature09144.
Rigoutsos I, Furnari F: Gene-expression forum: Decoy for microRNAs. Nature. 2010, 465 (7301): 1016-1017. 10.1038/4651016a.
Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, et al: Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell. 2011, 147 (2): 344-357. 10.1016/j.cell.2011.09.029.
Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, et al: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009, 458 (7235): 223-227. 10.1038/nature07672.
Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, et al: lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011, 477 (7364): 295-300. 10.1038/nature10398.
Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP: Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011, 147 (7): 1537-1550. 10.1016/j.cell.2011.11.055.
Esteller M: Non-coding RNAs in human disease. Nat Rev Genet. 2011, 12 (12): 861-874. 10.1038/nrg3074.
Schwanekamp JA, Sartor MA, Karyala S, Halbleib D, Medvedovic M, Tomlinson CR: Genome-wide analyses show that nuclear and cytoplasmic RNA levels are differentially affected by dioxin. Biochim Biophys Acta. 2006, 1759 (8–9): 388-402.
Clawson GA, Button JD, Liao YC, Smuckler EA: Cytoplasmic RNA complementary to pre-albumin intron V. Biochem Biophys Res Commun. 1986, 136 (3): 989-994. 10.1016/0006-291X(86)90430-4.
Gatfield D, Le Martelot G, Vejnar CE, Gerlach D, Schaad O, Fleury-Olela F, Ruskeepaa AL, Oresic M, Esau CC, Zdobnov EM, et al: Integration of microRNA miR-122 in hepatic circadian gene expression. Genes Dev. 2009, 23 (11): 1313-1326. 10.1101/gad.1781009.
Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.
Srivastava S, Chen L: A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res. 2010, 38 (17): e170-10.1093/nar/gkq670.
Lood C, Amisten S, Gullstrand B, Jonsen A, Allhorn M, Truedsson L, Sturfelt G, Erlinge D, Bengtsson AA: Platelet transcriptional profile and protein expression in patients with systemic lupus erythematosus: up-regulation of the type I interferon system is strongly associated with vascular disease. Blood. 2010, 116 (11): 1951-1957. 10.1182/blood-2010-03-274605.
Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z: GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009, 10: 48-10.1186/1471-2105-10-48.
Landry P, Plante I, Ouellet DL, Perron MP, Rousseau G, Provost P: Existence of a microRNA pathway in anucleate platelets. Nat Struct Mol Biol. 2009, 16 (9): 961-966.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110 (1–4): 462-467.
Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D, Thomas K, Presser A, Bernstein BE, van Oudenaarden A, et al: Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A. 2009, 106 (28): 11667-11672. 10.1073/pnas.0904715106.
Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, et al: Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010, 464 (7291): 1071-1076. 10.1038/nature08975.
Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, et al: Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet. 2010, 42 (12): 1113-1117. 10.1038/ng.710.
Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, Goodnough LH, Helms JA, Farnham PJ, Segal E, et al: Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007, 129 (7): 1311-1323. 10.1016/j.cell.2007.05.022.
Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM: Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science. 2009, 324 (5931): 1210-1213. 10.1126/science.1170995.
Robinson J, Malik A, Parham P, Bodmer JG, Marsh SG: IMGT/HLA database–a sequence database for the human major histocompatibility complex. Tissue Antigens. 2000, 55 (3): 280-287. 10.1034/j.1399-0039.2000.550314.x.
Robinson J, Mistry K, McWilliam H, Lopez R, Parham P, Marsh SG: The IMGT/HLA database. Nucleic Acids Res. 2011, 39 (Database issue): D1171-1176.
Rondina MT, Schwertz H, Harris ES, Kraemer BF, Campbell RA, Mackman N, Grissom CK, Weyrich AS, Zimmerman GA: The septic milieu triggers expression of spliced tissue factor mRNA in human platelets. J Thromb Haemost. 2011, 9 (4): 748-758. 10.1111/j.1538-7836.2011.04208.x.
Geiss GK, Bumgarner RE, Birditt B, Dahl T, Dowidar N, Dunaway DL, Fell HP, Ferree S, George RD, Grogan T, et al: Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008, 26 (3): 317-325. 10.1038/nbt1385.
Kahr WH, Hinckley J, Li L, Schwertz H, Christensen H, Rowley JW, Pluthero FG, Urban D, Fabbro S, Nixon B, et al: Mutations in NBEAL2, encoding a BEACH protein, cause gray platelet syndrome. Nat Genet. 2011, 43 (8): 738-740. 10.1038/ng.884.
Albers CA, Paul DS, Schulze H, Freson K, Stephens JC, Smethurst PA, Jolley JD, Cvejic A, Kostadima M, Bertone P, et al: Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat Genet. 2012, 44 (4): 435-439. 10.1038/ng.1083.
Yee DL, Bergeron AL, Sun CW, Dong JF, Bray PF: Platelet hyperreactivity generalizes to multiple forms of stimulation. J Thromb Haemost. 2006, 4 (9): 2043-2050. 10.1111/j.1538-7836.2006.02089.x.
Kondkar AA, Bray MS, Leal SM, Nagalla S, Liu DJ, Jin Y, Dong JF, Ren Q, Whiteheart SW, Shaw C, et al: VAMP8/endobrevin is overexpressed in hyperreactive human platelets: suggested role for platelet microRNA. J Thromb Haemost. 2010, 8 (2): 369-378. 10.1111/j.1538-7836.2009.03700.x.
Edelstein LC, Luna EJ, Gibson IB, Bray M, Jin Y, Kondkar A, Nagalla S, Hadjout-Rabi N, Smith TC, Covarrubias D, et al: Human genome-wide association and mouse knockout approaches identify platelet supervillin as an inhibitor of thrombus formation under shear stress. Circulation. 2012, 125 (22): 2762-2771. 10.1161/CIRCULATIONAHA.112.091462.
Plante I, Ple H, Landry P, Gunaratne PH, Provost P: Modulation of microRNA Activity by Semi-microRNAs. Frontiers in genetics. 2012, 3: 99-
Zimmerman GA, Weyrich AS: Signal-dependent protein synthesis by activated platelets: new pathways to altered phenotype and function. Arterioscler Thromb Vasc Biol. 2008, 28 (3): s17-24. 10.1161/ATVBAHA.107.160218.
Li K, Ramchandran R: Natural antisense transcript: a concomitant engagement with protein-coding transcript. Oncotarget. 2010, 1 (6): 447-452.
Bell TJ, Miyashiro KY, Sul J-Y, McCullough R, Buckley PT, Jochems J, Meaney DF, Haydon P, Cantor C, Parsons TD, et al: Cytoplasmic BK(Ca) channel intron-containing mRNAs contribute to the intrinsic excitability of hippocampal neurons. Proc Natl Acad Sci U S A. 2008, 105 (6): 1901-1906. 10.1073/pnas.0711796105.
Buckley PT, Lee MT, Sul J-Y, Miyashiro KY, Bell TJ, Fisher SA, Kim J, Eberwine J: Cytoplasmic intron sequence-retaining transcripts can be dendritically targeted via ID element retrotransposons. Neuron. 2011, 69 (5): 877-884. 10.1016/j.neuron.2011.02.028.
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al: Landscape of transcription in human cells. Nature. 2012, 489 (7414): 101-108. 10.1038/nature11233.
Li Y, Bor Y-C, Misawa Y, Xue Y, Rekosh D, Hammarskjöld ML: An intron with a constitutive transport element is retained in a Tap messenger RNA. Nature. 2006, 443 (7108): 234-237. 10.1038/nature05107.
Louro R, El-Jundi T, Nakaya HI, Reis EM, Verjovski-Almeida S: Conserved tissue expression signatures of intronic noncoding RNAs transcribed from human and mouse loci. Genomics. 2008, 92 (1): 18-25. 10.1016/j.ygeno.2008.03.013.
Tsirigos A, Rigoutsos I: Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs. Nucleic Acids Res. 2008, 36 (10): 3484-10.1093/nar/gkn155.
Belancio VP, Roy-Engel AM, Pochampally RR: Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 2010, 38 (12): 3909-3922. 10.1093/nar/gkq132.
Faulkner GJ, Kimura Y, Daub CO, Plessy C, Irvine KM, Schroder K, Cloonan N, Lassmann T, Waki K, Hornig N, et al: The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009, 41 (5): 563-571. 10.1038/ng.368.
Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, Kimura-Yoshida C, Matsuo I, Sumiyama K, Saitou N, et al: Possible involvement of SINEs in mammalian-specific brain formation. Proc Natl Acad Sci U S A. 2008, 105 (11): 4220-4225. 10.1073/pnas.0709398105.
Cruickshanks HA, Tufarelli C: Isolation of cancer-specific chimeric transcripts induced by hypomethylation of the LINE-1 antisense promoter. Genomics. 2009, 94 (6): 397-406. 10.1016/j.ygeno.2009.08.013.
Montoya-Durango DE, Ramos KS: L1 retrotransposon and retinoblastoma: molecular linkages between epigenetics and cancer. Curr Mol Med. 2010, 10 (5): 511-521. 10.2174/156652410791608234.
Ohnishi Y, Totoki Y, Toyoda A, Watanabe T, Yamamoto Y, Tokunaga K, Sakaki Y, Sasaki H, Hohjoh H: Active role of small non-coding RNAs derived from SINE/B1 retrotransposon during early mouse development. Mol Biol Rep. 2011, 36: 903-9.
Ramos KS, He Q, Kalbfleisch T, Montoya-Durango DE, Teneng I, Stribinskis V, Brun M: Computational and biological inference of gene regulatory networks of the LINE-1 retrotransposon. Genomics. 2007, 90 (2): 176-185. 10.1016/j.ygeno.2007.04.003.
Ting DT, Lipson D, Paul S, Brannigan BW, Akhavanfard S, Coffman EJ, Contino G, Deshpande V, Iafrate AJ, Letovsky S, et al: Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers. Science (New York, NY). 2011, 331: 593-6. 10.1126/science.1200801.
Smalheiser NR, Lugli G, Thimmapuram J, Cook EH, Larson J: Endogenous siRNAs and noncoding RNA-derived small RNAs are expressed in adult mouse hippocampus and are up-regulated in olfactory discrimination training. RNA. 2011, 17 (1): 166-181. 10.1261/rna.2123811.
Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, et al: Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature. 2008, 453 (7194): 539-543. 10.1038/nature06908.
Yuan Z, Sun X, Liu H, Xie J: MicroRNA genes derived from repetitive elements and expanded by segmental duplication events in mammalian genomes. PLoS One. 2011, 6 (3): e17666-10.1371/journal.pone.0017666.
Gu TJ, Yi X, Zhao XW, Zhao Y, Yin JQ: Alu-directed transcriptional regulation of some novel miRNAs. BMC Genomics. 2009, 10: 563-10.1186/1471-2164-10-563.
Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I: A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell. 2011, 147 (2): 358-369. 10.1016/j.cell.2011.09.028.
Smalheiser NR, Torvik VI: Alu elements within human mRNAs are probable microRNA targets. Trends in genetics: TIG. 2006, 22 (10): 532-536. 10.1016/j.tig.2006.08.007.
Modarresi F, Faghihi MA, Lopez-Toledano MA, Fatemi RP, Magistri M, Brothers SP, van der Brug MP, Wahlestedt C: Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nat Biotechnol. 2012, 30 (5): 453-459. 10.1038/nbt.2158.
Muro EM, Andrade-Navarro MA: Pseudogenes as an alternative source of natural antisense transcripts. BMC Evol Biol. 2010, 10: 338-10.1186/1471-2148-10-338.
Lin R, Roychowdhury-Saha M, Black C, Watt AT, Marcusson EG, Freier SM, Edgington TS: Control of RNA processing by a large non-coding RNA over-expressed in carcinomas. FEBS Lett. 2011, 585 (4): 671-676. 10.1016/j.febslet.2011.01.030.
Martin M: Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011, 17 (1): 10-12. Next Generation Sequencing Data Analysis
Wong L, Lee K, Russell I, Chen C: Endogenous Controls for Real-Time Quantitation of miRNA Using TaqMan® MicroRNA Assays. Edited by: Biosystems A. 2007, Foster City: Applied Biosystems
Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2^(-ΔΔCT) method. Methods. 2001, 25 (4): 402-408. 10.1006/meth.2001.1262.
We wish to thank Ms. Lin Ma and Dr. Xianguo Kong for excellent technical assistance. This work was supported in part by the Cardeza Foundation for Hematological Research (Philadelphia, PA; PFB, LCE, SEM), the Kimmel Cancer Center (Philadelphia, PA; PF, KD, JK and AE), and by grants from the NIH-NCI Cancer Center (P30-CA-56036 to PF, KD and AE), NIH-NHLBI (R01-HL-102482 to PB), NIH-NIAID (2U19AI056363-06/2030984 to IR), the William M. Keck Foundation (IR) and TJU funds (IR, YJ, EL, PL, H-WC). The authors declare no conflicts of interest.
The authors declare that they have no competing interests.
PFB, SEM, PF, and IR conceived of the study, and participated in its design and coordination. IR, PFB, PF, SEM, and LCE wrote the manuscript. LCE participated in the design and coordination, and acquired and analyzed the data. IR, H-WC, EL, PL, AE, and YJ acquired and analyzed the data. SN, KD, and JK participated in the design and acquired data. All authors read and approved the final manuscript.
Paul F Bray, Steven E McKenzie, Paolo Fortina and Isidore Rigoutsos contributed equally to this work.