piRNA-like small RNAs mark extended 3’UTRs present in germ and somatic cells
© Yamtich et al. 2015
Received: 23 December 2014
Accepted: 29 May 2015
Published: 16 June 2015
Piwi-interacting RNAs (piRNAs) are a class of small RNAs; distinct types of piRNAs are expressed in the mammalian testis at different stages of development. The function of piRNAs expressed in the adult testis is not well established. We conducted a detailed characterization of piRNAs aligning at or near the 3’ UTRs of protein-coding genes in a deep dataset of small RNAs from adult mouse testis.
We identified 2710 piRNA clusters associated with 3’ UTRs, including 1600 that overlapped genes not previously associated with piRNAs. 35 % of the clusters extend beyond the annotated transcript; we find that these clusters correspond to, and are likely derived from, novel polyadenylated mRNA isoforms that contain previously unannotated extended 3’UTRs. Extended 3’ UTRs, and small RNAs derived from them, are also present in somatic tissues; a subset of these somatic 3’UTR small RNA clusters are absent in mice lacking MIWI2, indicating a role for MIWI2 in the metabolism of somatic small RNAs.
The finding that piRNAs are processed from extended 3’ UTRs suggests a role for piRNAs in the remodeling of 3’ UTRs. The presence of both clusters and extended 3’UTRs in somatic cells, with evidence for involvement of MIWI2, indicates that this pathway is more broadly distributed than currently appreciated.
Among the numerous types of small RNAs that have been investigated in recent years, piRNAs have been of interest primarily because they seem to be essential for maintenance of germ cells. They range in size from 24–32 nt, complex with members of the PIWI clade of Argonaute proteins, and are enriched for a 5’ uridine [1–6]. piRNAs are present in a broad spectrum of metazoan species, and in the mouse they associate with three Piwi family proteins: MILI, MIWI, and MIWI2 [1–3, 7, 8]. MIWI2 is expressed at high levels in the prepachytene phase of spermatogenesis, where it complexes with piRNAs derived mostly from retrotransposon sequences and is involved in epigenetic suppression of retrotransposon activity [7, 9]. MIWI is expressed in the pachytene stage of meiosis and is required for spermatogenesis. MIWI-associated piRNAs are of a different type and have been implicated in regulation of mRNA and retrotransposon transcripts in differentiating spermatocytes [10–12], but may have other functions as well. When pachytene piRNA sequences are aligned to the genome, they tend to cluster into large regions, reflecting their derivation from longer single-stranded primary RNA pol II transcripts [1, 2, 4, 5, 13]. These piRNA clusters are found at syntenic regions in mouse, rat, and human , although orthologous regions in mouse and human produce piRNAs with different sequences .
Transposon-derived piRNAs are produced via a ping-pong amplification loop in which primary piRNAs antisense to transposon mRNA sequences direct the cleavage of the transposon mRNA, leading to the production of secondary sense piRNAs which then direct the cleavage of the complementary sequence from the primary piRNA transcript. This results in sense/antisense piRNAs characterized by a 10-bp 5’ overlap, and secondary piRNAs with an A at position ten . In prepachytene mouse testis, these sense/antisense piRNAs tend to be distributed along the length of a piRNA cluster .
Less is known about the functions of pachytene piRNAs. They are derived from single-stranded precursors, are not enriched for transposable element sequences, and do not show evidence of ping-pong amplification, suggesting that they are not involved in transposon silencing [13, 14]. They cofractionate with polysomes, and so may be involved in translational control of gene expression . A subset of pachytene piRNA clusters are associated with spliced mRNAs, particularly their 3’ untranslated regions (3’ UTRs) [15–17]. Although it is unclear how certain 3’ UTRs are selected for piRNA production, selection is not based simply on mRNA levels or the expression of specific Piwi proteins . Genes from which piRNAs are derived have more isoforms and antisense transcripts, suggesting a connection between piRNA production, antisense transcription, and alternative splicing . However, it is still not clear if the 3’ UTR is processed into piRNAs, or if they are instead derived from an independent transcript. To address this question, we conducted a detailed analysis of 3’ UTR piRNAs in adult mouse testis, and from somatic tissues. A deep dataset of small RNAs from adult mouse testis identifies many new 3’ UTR piRNA clusters, some of which are also present in somatic cells and are derived from rare and previously unannotated extended 3’ UTRs. Analysis of somatic piRNAs from MIWI2 null mice demonstrates that a subset of 3’UTR clusters is dependent on MIWI2, an unexpected finding that supports a somatic role for MIWI2. Additionally, we found small regions of sequence homology between intergenic and 3’ UTR piRNA clusters, suggesting a mechanism by which intergenic piRNA production could select 3’ UTRs of specific genes for processing.
Deep sequencing of small RNAs in the adult mouse testis
Comparison of small RNA sequences from adult mouse testis
Type A spermatogonia (GSM610965)
52,079 (4 %)
Pachytene spermatocytes (GSM610966)
378,724 (40 %)
Round spermatids (GSM610967)
419,213 (48 %)
MILI IP (GSM475280)
640,898 (20 %)
MIWI IP (GSM475279)
623,800 (22 %)
Total RNA (GSM475281)
600,957 (22 %)
We next aligned our unique small RNA sequences to the mouse genome allowing up to one mismatch (Additional file 1: Table S1). Approximately 7.5 million sequences aligned uniquely to the genome (88 % of aligned reads, Additional file 1: Table S1). These small RNA sequences had characteristics typical of piRNAs, such as a length distribution with a single peak centered at 29 nt (Fig. 1c) and a 5’ uridine bias (68 %). They aligned primarily in non-repetitive intergenic regions (63 %, Fig. 1d) corresponding to previously described piRNA clusters , with the remaining sequences aligning to repetitive and genic (mRNA and intron) regions (20 % and 17 %, respectively, Fig. 1d). Of the genic sequences, most aligned to regions corresponding to processed mRNA (Fig. 1d). Due to the overall similarity with published piRNA datasets, the piRNA-like sequence length, and the 5’ uridine bias, it is likely we have identified many new murine pachytene piRNA sequences.
A subset of piRNAs align in clusters at 3’ UTRs
We compared the genes having 3’ UTR piRNA clusters identified in our study with genes identified by reanalysis of previously published piRNA datasets [16, 17]. Our 2710 clusters significantly overlap with clusters obtained from previously reported piRNA datasets, containing 92 % of the 626 3’ UTR piRNA-producing genes identified in the dataset of Robine et al. , and 47 % of 1957 identified in the data of Gan et al. . 401 genes were shared by the three datasets, while our dataset contributed an additional 1600 genes. Previous studies identified genes with 3’ UTR piRNA clusters as being involved in nucleic acid metabolism, zinc ion binding, transcription, and regulation-related processes [16, 17]. Our findings are consistent with this, as the top functional categories identified by Ingenuity Pathway Analysis (IPA) are gene expression, post-translational modification, and RNA post-transcriptional modification (Additional file 1: Table S2). Our deeper dataset allowed IPA to identify enrichment for additional categories, including genes involved in infectious disease, the cell cycle, and DNA replication, recombination, and repair (Additional file 1: Table S2).
Under relaxed alignment conditions, 481 clusters (~18 %) contain antisense reads with the 10-bp overlap typical of ping-pong amplification (example in Additional file 1: Figure S2a). This overlap occurred at 1067 sites and involved 76,022 RNA sequences, 13,885 of them antisense to the 3’ UTR cluster. The antisense reads had a decreased 5’ uridine bias (61 %), a bias for an adenine at position 10 (43 %), and were localized in discrete sites generally containing one 10-bp sense/antisense overlap and covering less than 10 % of the cluster region (Additional file 1: Figure S2c-d). The best alignment for 2125 (29 %) of these reads was antisense to the same 3’ UTR sites (Additional file 1: Figure S2b). The remaining 5170 reads (71 %) aligned optimally to piRNA clusters distinct from the 3’ UTR clusters where they were discovered, with 5054 sequences aligning to intergenic piRNA clusters (Additional file 1: Figure S2b). This suggests the possibility of an interaction between 3’UTRs and intergenic piRNA clusters that have regions of sequence similarity, although the rarity of these RNAs does not support any extensive role for such a mechanism.
piRNA clusters mark extended 3’UTRs
The putative extended 3’ UTR regions are unannotated in the gene annotation (refSeq) used for the analysis. However by using an RNA-Seq dataset from adult mouse testis  we were able to detect significant expression of 781 (83 %) of these regions (Fig. 3b, expression >0.2 RPKM, which is greater than expression from 95 % of randomly selected regions). Most (95 %) of the unannotated portions of the piRNA cluster precursors were expressed at a lower level than the corresponding annotated mRNA (Fig. 3c). This finding is consistent with the hypothesis that these piRNA precursors are rare 3’ UTR isoforms rather than independently transcribed piRNA precursors.
Since RNA-Seq cannot demonstrate that reads from the extended 3’UTR clusters are derived from the same transcript as the annotated mRNA, we analyzed one of the highly expressed unannotated 3’ UTR piRNA precursors (Pdpr) by Northern blotting of testis RNA (Fig. 3d); we also tested three somatic tissues because it has been suggested that piRNAs may be expressed in somatic tissues . Northern blotting detected two brain transcripts, with lengths corresponding to the annotated mRNA and to the annotated mRNA plus the extended 3’ UTR (Fig. 3e, exon probe panel). We also carried out RT-PCR on 10 transcripts associated with piRNA clusters extending beyond the annotated 3’UTR. In 9 of the 10, RT-PCR detected transcripts extending from the annotated portion of the mRNA to the unannotated extended 3’UTR (an example is shown in Fig. 3f; the summary for all PCRs is in Fig. 3g). Taken together, these data suggest that many piRNAs aligning to regions beyond annotated 3’ UTRs are processed from polyadenylated 3’ UTRs that are not present in current genome annotations, i.e., the presence of an extended 3’UTR cluster predicts the existence of a longer alternative 3’UTR. For convenience, henceforth we will refer to these UTRs as testis xUTRs (extended UTRs).
xUTRs in somatic tissues
Because we noted evidence of 3’ extended transcripts in brain (Fig. 3e-g), we sought evidence of xUTRs in somatic RNA-Seq datasets from the ENCODE project. This analysis strongly supports the expression of xUTRs in a variety of tissues (Additional file 1: Figure S3). Most (79-93 %, depending on the tissue) testis xUTRs are also present in somatic tissues; the unannotated regions have median expression levels 0.18-0.33 fold that of the annotated expressed mRNA (Additional file 1: Figure S3). The highest expression of xUTRs was observed in cerebellum (Additional file 1: Figure S3). These RNA-Seq findings are consistent with results of RT-PCR, which detects transcripts extending from annotated portions of the mRNA to the unannotated piRNA precursor in brain, spleen, and liver (Fig. 3f-g). This suggests that precursor transcripts for the xUTRs are polyadenylated, less abundant than the canonical transcript, and in most cases not germline-specific.
Our identification of somatic 3’UTR-associated small RNA clusters led us to ask whether there are tissue-specific somatic 3’UTR piRNA clusters that are not found in the testis (see Methods for details). We identified 488 genes enriched for 3’ UTR small RNAs in the liver, and 345 in the spleen. Similar to the somatic small RNAs aligning to xUTR defined in the testis (above), small RNAs in these somatic clusters were not piRNA-like, but had a peak length distribution of 23 nt (Additional file 1: Figure S4a) and enrichment of A and U at the 5’ base (Additional file 1: Figure S4b). Many highly expressed genes in liver and spleen lacked these RNAs, suggesting that small RNAs identified in these clusters are not likely to be merely degradation products of highly expressed genes (Additional file 1: Figures S4c, d). GO term analysis indicates that most genes from each of the three tissues are associated with terms that are found only for that tissue, i.e., each tissue has a distinct set of functions that is associated with extended 3’UTRs and small RNAs (Additional file 1: Figure S4e and Additional Table S4).
A subset of somatic 3’UTR small RNA clusters are MIWI2-dependent
To compare the properties of the MIWI2-independent and MIWI2-dependent somatic xUTR clusters, we first removed clusters that had small RNA density in the lowest quartile in the WT dataset and no reads in the Miwi2 −/− dataset (the reads to the left of the grey bar at the bottom of Figs. 5a,b). This filter, which removes the most weakly supported clusters, retained 188 (39 %) and 157 (46 %) MIWI2-dependent 3’ UTR clusters in liver and spleen, respectively (Figs. 5a, b). MIWI2-dependent and independent clusters did not show differences in length distribution, 5’ base composition (Additional file 1: Figures S5a, b), or expression between genes (Additional file 1: Figure S5c, based on RNA-Seq data). Finally, we examined the expression of the coding region and 3’ UTR in a subset of genes with MIWI2 dependent and independent clusters of small RNAs in WT and MIWI2 KO mice. There were no significant and consistent differences in expression in WT and MIWI2 KO liver and spleen between these two groups (Additional file 1: Figure S6). Although we can find no evidence that that knockout affects the expression of genes with MIWI2-dependent somatic xUTRs, the finding that some somatic xUTR clusters are dependent on MIWI2 provides in vivo evidence for a somatic activity of MIWI2.
We analyzed RNAs uniquely aligning to 3’ UTRs in a deep dataset of small RNAs from the adult mouse testis, identifying approximately 1600 new genes with 3’ UTRs that may be piRNA precursors, as well as nearly 1000 clusters with piRNAs that align to previously unannotated extended 3’ UTRs. piRNA-like small RNAs aligning to many 3’UTRs are also present in somatic tissues, where they appear to mark extended 3’UTRs; a subset of these somatic clusters is missing from somatic tissues of Miwi2 −/− mice. This novel evidence that MIWI2 has a role in the somatic production or maintenance of small RNAs presents interesting problems for understanding of its functions. While these studies indicate a possible function for piRNA-like RNAs in modulating 3’UTR length, the significance of this modulation is uncertain, however it may be related to recent evidence that pachytene piRNAs, including the subset derived from the 3’ UTRs of mRNAs, can contribute to degradation of mRNAs and retrotransposon transcripts during spermatogenesis [10–12].
Although our dataset overlapped with 3’ UTR piRNA clusters identified by reanalysis of previously published piRNA datasets [16, 17], due to the general lack of saturation in these datasets there were also many 3’UTR clusters unique to each dataset. By combining the three datasets, we annotated 3771 genes as producing 3’ UTR piRNA clusters in the adult mouse testis. Of these genes, 3721 were eligible for function and pathway analysis with IPA, which identified transcription and modification of protein as the top functions enriched in the combined gene set (Additional file 1: Table S4). These are consistent with the functional categories identified in previous reports [16, 17]. However, by analyzing the combined datasets, certain canonical pathways gained significance. One example is the FLT3 Signaling in Hematopoietic Progenitor Cells pathway (24 of 72 genes in the pathway had 3’ UTR piRNA clusters, p = 9.47 e-6), a pathway that stimulates the proliferation of stem cells and progenitor cells .
The evidence we have found is consistent with the interpretation that piRNAs and smaller somatic RNAs are derived from extended 3’UTRs, implying that their production may be connected in some way to regulation of 3’UTR length. The 3’ UTR is critical for regulating mRNA stability, localization, and translation (reviewed in ). Part of this regulation is influenced by 3’ UTR length, which has been shown to vary with cellular proliferation and reprogramming [23, 24]. Shorter 3’ UTR isoforms are generally expressed in cells with higher proliferation, such as iPS cells, cancer cells, and the testis [24–26], with a shift toward longer 3’ UTR expression as cells differentiate during embryogenesis . mRNAs with these shorter 3’ UTR isoforms have increased stability and protein expression , suggesting that shortening of 3’ UTRs in testis, perhaps involving the production of piRNAs, could lead to increased expression of the associated genes. The evidence that extended 3’UTRs, and small RNAs derived from them, are also present in somatic tissues implies that this mechanism may be broadly active, but play a much more prominent role in differentiation of spermatocytes. Recent work has indicated that pachytene testis piRNAs are involved in degradation of retrotransposon transcripts and mRNAs [10–12]; it remains to be established how these findings relate to those described here. We do not posit a causal relationship between the small RNAs we describe and the shortening of 3’UTRs, because our evidence does not directly demonstrate such a relationship.
Perhaps the most intriguing finding of this study is that small RNAs aligning to extended 3’UTRs are also found in somatic tissues, and that a subset of these somatic 3’UTR clusters is absent in mice lacking a functional Miwi2 (Piwil4) gene. The small RNAs aligning to extended 3’UTRs in somatic tissues differ from canonical piRNAs: they are shorter, lack the predominant 5’ uracil, and much less abundant than their testis counterparts. We assessed the role of MIWI2 in these clusters because we and others have found piRNAs in somatic tissues and cancer cells [19, 27], and we find low levels of Miwi2 mRNA in somatic tissues (Fig. 5a). No phenotype except male sterility has previously been noted in Miwi2 −/− mice [9, 20, 27, 28], although we have found that knockdown of Miwi2 expression results in differentiation of mouse erythroleukemia (MEL) cells . The absence of somatic xUTR clusters is in effect a phenotype in Miwi2 −/− mice, but any consequences of this absence must be subtle: we find no apparent change in expression of the genes associated with these UTRs in Miwi2 −/− mice, and no other effects have been reported. MIWI2 has been shown to associate with piRNAs in the prenatal testis, and to mediate transcriptional silencing and methylation of retrotransposons at this stage of development [7, 9]. If it is involved in the generation or maintenance of small RNAs derived from 3’UTRs in somatic cells, this would imply some very different function, perhaps related to the recently demonstrated role of MIWI degradation of RNA transcripts [10–12].
Using a very deep dataset of piRNAs from mature mouse testis, we show that clusters of piRNAs mark extended 3’UTRs that have not been previously annotated. These clusters are present on many mRNAs expressed in the testis, but also in somatic cells. A subset of somatic 3’UTR-derived small RNA clusters are dependent on MIWI2. MIWI2 is a Piwi protein that has been shown to associate with piRNAs in the prenatal testis, and to mediate transcriptional silencing and methylation of retrotransposons at this stage of development. Its potential involvement in the generation or maintenance of small RNAs derived from 3’UTRs in somatic cells would imply some very different function. These findings extend the biological scope of MIWI2 function, and also indicate the existence of a class of extended 3’UTRs that are processed into small RNAs.
Generation and sequencing of small RNA libraries
For testis piRNAs, 10 μg of total RNA from adult mouse testes was electrophoresed on an 18 % denaturing polyacrylamide gel with NEB miRNA markers, the gel stained with SYBR Gold, and the readily visible band migrating at 29 nt (range ~27-31 nt) excised. RNA was eluted from the gel fragment and purified by standard methods. Illumina libraries were constructed from RNA specimens using the Illumina Small RNA Library kit, following the manufacturer’s protocol. Libraries were sequenced on an Illumina GAII apparatus to collect reads of 36 nucleotides. Raw data was processed with the Illumina pipeline v1.3.2. Data are deposited into the GEO database with accession number GSE26251.
For deep sequencing of somatic small RNAs, liver and spleen were harvested from WT and Miwi2 −/− mice  and flash frozen in liquid nitrogen (all animal studies were carried out in accordance with CHORI’s Institutional Animal Care and Use Committee regulations). Frozen tissue was homogenized using a Tissue Tearor homogenizer (Biospec Products) and small RNAs (10 – 200 bases) were purified using RNAzol RT (Molecular Research Center, Inc.). Purified RNAs were run on an 18 % denaturing PAGE, the region spanning ~22-35 nt was excised, and RNAs were purified for deep sequencing. RNA libraries were produced as above, with the exception that the Illumina small RNA 3’ adapter v1.5 was used; this adapter is modified by 5’ adenylation. Data are deposited into the GEO database with accession number GSE47093.
Data processing of small RNAs
Adapter sequences were removed from the 3’ ends of raw reads; up to 25 % mismatch was allowed between the adaptor and read sequences. Exact duplicate sequences were collapsed (only a single representative read was retained). Known contaminants and small RNAs were removed by alignment with Bowtie , allowing for up to 3 mismatches. These contaminants and small RNAs included 3’ and 5’ adapter sequences; rRNA (GenBank), tRNA [30–34], and miRNA [35–37] downloaded from the UCSC Genome Browser database ; miRNAs from miRBase Release 17 [35, 36, 39, 40]; and snoRNA, snRNA and miRNA downloaded from Ensembl release 65 .
Comparing small RNAs to published datasets
Published small RNA sequences were downloaded from the NCBI Gene Expression Omnibus (GEO) , including MILI and MIWI immunoprecipitated and total small RNA from adult mouse testis (GSM475279-GSM475281; ) and small RNA sequences from pachytene spermatocytes, round spermatids, and type A spermatogonia (GSM610965-GSM610967;). Similarity indices, the number of sequences in the intersection of the two datasets divided by the number of sequences in the union, were calculated as in . Sequence data from these published sets were reprocessed following the same methods we used to analyze our dataset.
Small RNA alignment and annotation
Cleaned RNAs were aligned to the NCBI37/mm9 assembly of the mouse genome using Bowtie  allowing up to one mismatch and no multiple alignments. Sequences were classified as falling within a genomic feature if at least 50 % of the read length overlapped that feature (BEDTools, ). Annotation tracks were downloaded from the UCSC Genome Browser database (, November 2010), including those for repetitive elements (RepeatMasker ), mRNA (RefSeq exon [45, 46], and introns (RefSeq intron [45, 46]. RNAs were classified as being intergenic if they did not intersect any of the above tracks. Small RNAs that overlapped multiple tracks were classified exclusively in the order of repetitive element > mRNA > intron > intergenic.
Generation of piRNA clusters
In the testis dataset, we used the feature density estimator F-Seq  to identify genomic regions significantly enriched for aligned piRNAs, using a feature length of 1000 and a threshold of 15 standard deviations. To generate 3’ UTR piRNA clusters, we extended the significantly enriched regions by 1 kb if there were 2 piRNA per 1 kb extension and merged overlapping regions. The boundaries of the clusters were then shrunk to the locations of actual aligned piRNAs, and clusters were selected for further analysis if they overlapped a RefSeq 3’ UTR (excluding non-coding genes) by any amount. Unidirectional clusters with over 60 % of reads aligning to a give strand were then selected, and the gene with the greatest 3’ UTR overlap in the sense direction was assigned as the source. The 5’ ends of the clusters were trimmed to match the annotated 3’ UTR, and the boundaries of the clusters were then trimmed to the location of sense piRNAs. If the 3’ end overlapped an adjacent gene, the overlapping portion was removed. However, 3’ UTR clusters that completely overlapped another gene were retained.
To generate other piRNA clusters, we started with the F-Seq defined regions above. We then expanded these regions by 500 bp if there was 1 piRNA in a 500 bp window. As intergenic piRNA clusters are known to span regions with repetitive elements, which would not contain our uniquely aligned reads, we then merged the clusters with the annotated RepeatMasker track elements within 50 bp. We then expanded the clusters by 500 bp windows if they contained 1 piRNA and checked the clusters that had expanded by more than 20 % of their length. We separated clusters if two clusters with reads aligning to different strands had merged. We then shrunk the cluster boundaries to the locations of aligned piRNAs. Clusters were annotated based on the presence of 50 % or more of their contained sequences within a given annotation track (coding exon, UTR, intron). Clusters with fewer than 50 % of sequences aligning to a genic track were classified as intergenic.
In the liver and spleen small RNA datasets, we identified 3’ UTR clusters by first identifying genes where the density of sense small RNAs in the 3’ UTR was at least two fold that of the density in the coding region. We then selected 3’ UTRs with at least 3 small RNAs and a density of over 7.5 reads per kilobase 3’ UTR per million reads aligned. We set the cluster boundaries as the region of the largest 3’ UTR isoform and extended the 3’ boundary if there was at least 1 sense small RNA per 2 kb.
3’ UTR cluster comparison and functional analysis
Genes with 3’ UTR piRNA clusters identified in our testis dataset, Robine et al. , and Gan et al. , were analyzed with IPA (Ingenuity Systems, www.ingenuity.com). IPA core analysis was conducted using the default parameters. The Functional Analysis identified the biological functions that were most significant to the combined data set. A right-tailed Fisher’s exact test was used to calculate a p-value determining the probability that each biological function assigned to that data set is due to chance alone.
When analyzing the spleen and liver datasets, enrichment of specific gene ontology terms for genes with clusters of small RNAs overlapping their 3’ UTRs was determined using the Functional Annotation tool of the DAVID Bioinformatics Resources [48, 49].
Antisense reads in 3’ UTR clusters
To analyze antisense reads in 3’ UTR piRNA clusters, we first aligned the small RNA sequences to the genome allowing up to two mismatches and up to 25 alignments. Sites of 10-bp overlap were defined as regions within a 3’ UTR cluster where the first 10 bases of sequences aligned to the plus and minus strands overlapped each other. To determine the most likely genomic source of sequences aligning antisense within the 3’ UTR clusters, these sequences were extracted and re-aligned to the genome under strict conditions (up to one mismatch, unique alignments only). These positions were then annotated based on their presence in an original 3’ UTR cluster, an intergenic piRNA cluster (see above), a different piRNA cluster, or any other location.
Extended 3’ UTR cluster selection and expression
3’ UTR clusters were selected that extended beyond the annotated 3’ UTR by more than 30 % of the length of the cluster. Clusters that completely overlapped an annotated gene, in the sense or antisense direction, were removed from this analysis. For the expression analysis, only the portion of the cluster that did not overlap the annotated 3’ UTR was considered.
To calculate the expression of the unannotated portion of the extended 3’ UTR cluster precursors, we used mouse testis RNA-Seq data  downloaded from the NCBI Sequence Read Archive (ww.ncbi.nlm.nih.gov/sra, accession number SRR036361). We aligned these reads to the mouse genome (assembly NCBI37/mm9) using Bowtie  allowing up to one mismatch and no multiple alignments. We used BEDTools  to isolate aligned reads that intersected the regions of interest in the sense direction and calculated the expression of these regions by dividing the number of reads in a region by the length of the region and the number of aligned reads and multiplying by 109. We then used BEDTools  to generate a set of shuffled intergenic regions of the same size as the unannotated portion of the extended 3’ UTRs, maintaining the same chromosome coverage. Using these regions as background for RNA-Seq expression, we chose a cut-off for positive expression at which 95 % of random intergenic regions are expressed at a lower level.
Expression of the RefSeq mRNA for genes with extended 3’ UTR clusters was calculated as for the annotated portion of the clusters. To calculate expression of the unannotated portion of extended 3’ UTR piRNA precursors in somatic tissues, we first downloaded RNA-Seq data for spleen, lung, liver, kidney, heart, cortex, cerebellum and bone marrow from the ENCODE project (Ren LICR-m group, , DCC_Accession wgEncodeEM001706, wgEncodeEM001709-1715). We aligned the sequences and calculated the tissue-specific expression as above.
Gene expression was calculated in spleen and liver in the same fashion. RNA-Seq data for spleen and liver were downloaded from the ENCODE project (Ren LICR-m group, , DCC_Accession wgEncodeEM001709 and wgEncodeEM001714, respectively). The density of aligned reads was calculated for mRNA (RefSeq exon [45, 46] as above.
Northern blot analysis
Mouse testis, brain, spleen and liver were harvested and flash-frozen in liquid nitrogen. Tissues were homogenized in TRIzol Reagent (Invitrogen) using a Tissue Tearor homogenizer (Biospec Products) and total RNA was purified according to the manufacturer’s instructions. mRNA was purified from total RNA using a polyA Spin mRNA Isolation Kit (New England Biolabs) according to manufacturer’s instructions. Northern blotting used the Ambion NorthernMax kit (Invitrogen) following the manufacturer’s instructions. 1 % agarose gels were run with 1 ug mRNA per well and transferred to Ambion BrightStar-Plus membrane (Invitrogen) using downward capillary transfer. RNA was then cross-linked to the membrane using a UV Stratalinker 1800 (Stratagene) and stored at −20 °C. Cross-linked membranes were hybridized to radiolabeled single-stranded DNA probe overnight at 42 °C, washed, and exposed to a phosphorscreen for one week. Phosphorscreens were scanned using a Storm 840 phosphorimager (Molecular Dynamics), and images were analyzed using ImageQuant 5.0.
Template for the Northern blot probe was created by first purifying liver DNA using TRIzol Reagent (Invitrogen) according to the manufacturer’s instructions. Liver DNA was then used as a template for PCR amplification of the probe sequences for Pdpr, using primers NblotEx-F and NblotEx-R for the exon-specific probe and NblotUTR-F and NblotUTR-R for the extended 3’ UTR-specific probe (Additional file 3: Table S7). PCR products were purified using a QIAquick PCR purification kit (Qiagen) according to the manufacturer’s instructions.
Primer extension was then used to make radiolabeled single-stranded DNA probes. Radiolabeled probes were purified using SigmaSpin™ Sequencing Reaction Clean-up, Post-reaction Clean-up Columns (Sigma-Aldrich) according to the manufacturer’s instructions.
RT-PCR of extended 3’ UTR clusters
Total RNA was purified as in the Northern blot analysis, and contaminating genomic DNA was removed with the RNase-Free DNase Set (Qiagen) according to the manufacturer’s instructions. RNA was then cleaned with the RNeasy Mini Kit (Qiagen) RNA cleanup protocol. cDNA was synthesized using the ProtoScript™ M-MuLV Taq RT-PCR Kit (New England BioLabs) according to the manufacturer’s instructions and used as template in the RT-PCR reactions. See Additional file 3: Table S7 for primer sequences, annealing temperatures and extension times. RT-PCR products were run on 1 % agarose gels stained with ethidium bromide and scored as either expressing or not expressing the transcript.
Some 3’ UTR small RNA clusters are tissue-specific
Since our somatic small RNA datasets were so much smaller than the testis dataset, we used a different approach for identifying 3’ UTRs associated with small RNAs. In the liver and spleen small RNA datasets, we identified 3’ UTR clusters by first identifying RefSeq genes where the density of sense small RNAs in the 3’ UTR was at least two fold that of the density in the coding region. We then selected 3’ UTRs with at least 3 small RNAs and a density of over 7.5 reads per kilobase 3’ UTR per million reads aligned. We set the cluster boundaries as the region of the largest 3’ UTR isoform and extended the 3’ boundary if there was at least 1 sense small RNA per 2 kb.
Determining MIWI2 dependent and independent somatic clusters
To determine which somatic 3’ UTR smRNA clusters were MIWI2 dependent, we first calculated the density of small RNAs in each 3’ UTR cluster region using datasets from WT and Miwi2 −/− mice as above. If a cluster region had small RNAs in both the WT and Miwi2 −/− datasets, it was determined to be MIWI2 independent. Clusters without any small RNAs in the Miwi2 −/− dataset, and in the bottom quartile of density in the WT dataset, were filtered because they were too small to determine MIWI2 dependence. The remaining clusters, in the top quartiles of small RNA expression in the WT dataset and lacking small RNA alignments from the Miwi2 −/− dataset, were called as MIWI2 dependent clusters.
For quantitative RT-PCR of somatic expression of Miwi2 and of mRNA derived from genes with somatic 3’ UTR small RNA clusters, cDNA was purified as for “RT-PCR of extended 3’ UTR clusters” above. PCR reactions were carried out using FastStart Universal SYBR Green Master (Roche) on an ABI 7900 Real-Time PCR machine according to the manufacturer instructions. All reactions were run in duplicate (quadruplicate for Miwi2 expression), and the mean CT was used for downstream analysis. Expression values were calculated using a standard ladder of pooled cDNA, and normalized for GAPDH expression in that cDNA prep. Values shown are the average and standard deviation obtained from two different mice.
Availability of supporting data
The data sets supporting the results of this article are available in the NCBI GEO repository, with accession numbers GSE26251 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26251) and GSE47093 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47093).
We thank James Jacobs for materials from the Miwi2 −/− mice, and Cath Suter for helpful discussions. The work was supported by grant HL084474 (DB). JY was supported by the T32 Training Grant HL098057. JD was a Scholar of the California Institute of Regenerative Medicine. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303.
- Aravin A, Gaidatzis D, Pfeffer S, Lagos-Quintana M, Landgraf P, Iovino N, et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature. 2006;442(7099):203–7.PubMedGoogle Scholar
- Girard A, Sachidanandam R, Hannon GJ, Carmell MA. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature. 2006;442(7099):199–202.PubMedGoogle Scholar
- Grivna ST, Beyret E, Wang Z, Lin H. A novel class of small RNAs in mouse spermatogenic cells. Genes Dev. 2006;20(13):1709–14.PubMed CentralPubMedView ArticleGoogle Scholar
- Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP, et al. Characterization of the piRNA complex from rat testes. Science. 2006;313(5785):363–7.PubMedView ArticleGoogle Scholar
- Brennecke J, Aravin AA, Stark A, Dus M, Kellis M, Sachidanandam R, et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell. 2007;128(6):1089–103.PubMedView ArticleGoogle Scholar
- Gunawardane LS, Saito K, Nishida KM, Miyoshi K, Kawamura Y, Nagami T, et al. A slicer-mediated mechanism for repeat-associated siRNA 5’ end formation in Drosophila. Science. 2007;315(5818):1587–90.PubMedView ArticleGoogle Scholar
- Aravin AA, Sachidanandam R, Bourc’his D, Schaefer C, Pezic D, Toth KF, et al. A piRNA pathway primed by individual transposons is linked to de novo DNA methylation in mice. Mol Cell. 2008;31(6):785–99.PubMed CentralPubMedView ArticleGoogle Scholar
- Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N, et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature. 2008;455(7217):1193–7.PubMedView ArticleGoogle Scholar
- Kuramochi-Miyagawa S, Watanabe T, Gotoh K, Totoki Y, Toyoda A, Ikawa M, et al. DNA methylation of retrotransposon genes is regulated by Piwi family members MILI and MIWI2 in murine fetal testes. Genes Dev. 2008;22(7):908–17.PubMed CentralPubMedView ArticleGoogle Scholar
- Gou L-T, Dai P, Yang J-H, Xue Y, Hu Y-P, Zhou Y, et al. Pachytene piRNAs instruct massive mRNA elimination during late spermiogenesis. Cell Res. 2014;24(6):680–700.PubMed CentralPubMedView ArticleGoogle Scholar
- Watanabe T, Cheng E-c, Zhong M, Lin H. Retrotransposons and pseudogenes regulate mRNAs and lncRNAs via the piRNA pathway in the germline. Genome Res. 2015;25(3)):368–80.PubMedView ArticleGoogle Scholar
- Zhang P, Kang JY, Gou LT, Wang J, Xue Y, Skogerboe G, et al. MIWI and piRNA-mediated cleavage of messenger RNAs in mouse testes. Cell Res. 2015;25(2):193–207.PubMedView ArticleGoogle Scholar
- Li XZ, Roy CK, Dong X, Bolcun-Filas E, Wang J, Han BW, et al. An ancient transcription factor initiates the burst of pirna production during early meiosis in mouse testes. Mol Cell. 2013;50(1):67–81.PubMed CentralPubMedView ArticleGoogle Scholar
- Vourekas A, Zheng Q, Alexiou P, Maragkakis M, Kirino Y, Gregory BD, et al. Mili and Miwi target RNA repertoire reveals piRNA biogenesis and function of Miwi in spermiogenesis. Nat Struct Mol Biol. 2012;19(8):773–81.PubMed CentralPubMedView ArticleGoogle Scholar
- Aravin AA, Hannon GJ, Brennecke J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science. 2007;318(5851):761–4.PubMedView ArticleGoogle Scholar
- Robine N, Lau NC, Balla S, Jin Z, Okamura K, Kuramochi-Miyagawa S, et al. A broadly conserved pathway generates 3’UTR-directed primary piRNAs. Curr Biol. 2009;19(24):2066–76.PubMed CentralPubMedView ArticleGoogle Scholar
- Gan H, Lin X, Zhang Z, Zhang W, Liao S, Wang L, et al. piRNA profiling during specific stages of mouse spermatogenesis. RNA. 2011;17(7):1191–203.PubMed CentralPubMedView ArticleGoogle Scholar
- Cui P, Lin Q, Zhang L, Ding F, Xin C, Zhang D, et al. The disequilibrium of nucleosomes distribution along chromosomes plays a functional and evolutionarily role in regulating gene expression. PLoS One. 2011;6(8), e23219.PubMed CentralPubMedView ArticleGoogle Scholar
- Yan Z, Hu HY, Jiang X, Maierhofer V, Neb E, He L, et al. Widespread expression of piRNA-like molecules in somatic tissues. Nucleic Acids Res. 2011;39(15):6596–607.PubMed CentralPubMedView ArticleGoogle Scholar
- Carmell MA, Girard A, van de Kant HJ, Bourc’his D, Bestor TH, de Rooij DG, et al. MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germline. Dev Cell. 2007;12(4):503–14.PubMedView ArticleGoogle Scholar
- Juliano C, Wang J, Lin H. Uniting germline and stem cells: the function of Piwi proteins and the piRNA pathway in diverse organisms. Annu Rev Genet. 2011;45:447–69.PubMedView ArticleGoogle Scholar
- Di Giammartino DC, Nishida K, Manley James L. Mechanisms and consequences of alternative polyadenylation. Mol Cell. 2011;43(6):853–66.PubMed CentralPubMedView ArticleGoogle Scholar
- Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. Proliferating cells express mRNAs with shortened 3’ untranslated regions and fewer microRNA target sites. Science. 2008;320(5883):1643–7.PubMed CentralPubMedView ArticleGoogle Scholar
- Ji Z, Tian B. Reprogramming of 3’ untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS One. 2009;4(12), e8419.PubMed CentralPubMedView ArticleGoogle Scholar
- Mayr C, Bartel DP. Widespread shortening of 3’UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009;138(4):673–84.PubMed CentralPubMedView ArticleGoogle Scholar
- Smibert P, Miura P, Westholm JO, Shenker S, May G, Duff MO, et al. Global patterns of tissue-specific alternative polyadenylation in Drosophila. Cell Rep. 2012;1(3):277–89.PubMed CentralPubMedView ArticleGoogle Scholar
- Jacobs JE, Wagner M, Dhahbi J, Boffelli D, Martin DI. Deficiency of MIWI2 (Piwil4) induces mouse erythroleukemia cell differentiation, but has no effect on hematopoiesis in vivo. PLoS One. 2013;8(12):e82573.PubMed CentralPubMedView ArticleGoogle Scholar
- Nolde MJ, Cheng EC, Guo S, Lin H. Piwi genes are dispensable for normal hematopoiesis in mice. PLoS One. 2013;8(8):e71950.PubMed CentralPubMedView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Bio. 2009;10(3):R25.View ArticleGoogle Scholar
- Pavesi A, Conterio F, Bolchi A, Dieci G, Ottonello S. Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions. Nucleic Acids Res. 1994;22(7):1247–56.PubMed CentralPubMedView ArticleGoogle Scholar
- Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994;22(11):2079–88.PubMed CentralPubMedView ArticleGoogle Scholar
- Fichant GA, Burks C. Identifying potential tRNA genes in genomic DNA sequences. J Mol Biol. 1991;220(3):659–71.PubMedView ArticleGoogle Scholar
- Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.PubMed CentralPubMedView ArticleGoogle Scholar
- Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37(Database issue):D93–7.PubMed CentralPubMedView ArticleGoogle Scholar
- Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34(34(Database issue)):D140–4.PubMed CentralPubMedView ArticleGoogle Scholar
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36(Database issue):D154–8.PubMed CentralPubMedGoogle Scholar
- Weber MJ. New human and mouse microRNA genes found by homology search. FEBS J. 2005;272(1):59–73.PubMedView ArticleGoogle Scholar
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, et al. The UCSC genome browser database: update 2011. Nucleic Acids Res. 2011;39(Database issue):D876–82.PubMed CentralPubMedView ArticleGoogle Scholar
- Griffiths-Jones S. The microRNA registry. Nucleic Acids Res. 2004;32(Database issue):D109–11.PubMed CentralPubMedView ArticleGoogle Scholar
- Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152–7.PubMed CentralPubMedView ArticleGoogle Scholar
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, et al. Ensembl 2012. Nucleic Acids Res. 2012;40(Database issue):D84–90.PubMed CentralPubMedView ArticleGoogle Scholar
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011;39(Database issue):D1005–10.PubMed CentralPubMedView ArticleGoogle Scholar
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.PubMed CentralPubMedView ArticleGoogle Scholar
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4):462–7.PubMedView ArticleGoogle Scholar
- Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.PubMed CentralPubMedView ArticleGoogle Scholar
- Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005;33(Database issue):D501–4.PubMed CentralPubMedView ArticleGoogle Scholar
- Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008;24(21):2537–8.PubMed CentralPubMedView ArticleGoogle Scholar
- da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.View ArticleGoogle Scholar
- da Huang W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.PubMed CentralView ArticleGoogle Scholar
- Myers RM, Stamatoyannopoulos J, Snyder M, Dunham I, Hardison RC, Bernstein BE, et al. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9(4):e1001046.View ArticleGoogle Scholar
- Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6.PubMed CentralPubMedView ArticleGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.