Genome-wide characterization of human L1 antisense promoter-driven transcripts
© The Author(s). 2016
Received: 6 January 2016
Accepted: 26 May 2016
Published: 14 June 2016
Long INterspersed Element-1 (LINE-1 or L1) is the only autonomously active transposable element in the human genome. L1 sequences comprise approximately 17 % of the human genome, but only the evolutionarily recent, human-specific subfamily is retrotransposition competent. The L1 promoter is bidirectional: it contains a sense promoter that drives transcription of the RNA encoding the two proteins required for retrotransposition, and an antisense promoter. The L1 antisense promoter can drive transcription of chimeric transcripts: 5’ L1 antisense sequences spliced to the exons of neighboring genes.
The impact of L1 antisense promoter activity on cellular transcriptomes is poorly understood. To investigate this, we analyzed GenBank ESTs for messenger RNAs that initiate in the L1 antisense promoter. We identified 988 putative L1 antisense chimeric transcripts, 911 of which have not been previously reported. These appear to be alternative genic transcripts, sense-oriented with respect to gene and initiating near, but typically downstream of, the gene transcriptional start site. In multiple cell lines, L1 antisense promoters display enrichment for YY1 transcription factor and histone modifications associated with active promoters. Global run-on sequencing data support the activity of the L1 antisense promoter. We independently detected 124 L1 antisense chimeric transcripts using long read Pacific Biosciences RNA-seq data. Furthermore, we validated four chimeric transcripts by quantitative RT-PCR and Sanger sequencing and demonstrated that they are readily detectable in many normal human tissues.
We present a comprehensive characterization of human L1 antisense promoter-driven transcripts and provide substantial evidence that they are transcribed in a variety of human cell-types. Our findings reveal a new wide-reaching aspect of L1 biology by identifying antisense transcripts affecting as many as 4 % of all human genes.
Our genome is replete with L1 retrotransposon-derived sequences that can affect the transcriptome [1–3]. Genomic L1 sequences propagate through RNA intermediates. Their lifecycle begins with transcription from the 5’ L1 promoter; this is followed by reverse transcription of the L1 RNA and insertion of the L1 cDNA sequence into the genome [1, 2]. There are two broad classes of L1s. First is the less numerous class of full-length (~6 kb) L1 retrotransposon sequences with intact internal promoters, of which there are approximately 7,000 copies in the human genome . A subset of these with intact coding sequences for open reading frame 1 protein (ORF1p) and ORF2p is potentially retrotransposition competent, and these elements are mostly specific to our species (L1HS, or L1PA1 elements) . Second is the much more numerous and heterogeneous class of mutated or truncated L1, which includes both species-specific and ancient insertions. These insertions included 5’ truncations with or without internal rearrangements at the time of their insertion. Depending on their length and mutation load these sequences may have no promoter activity or protein-coding capability [1, 2, 6].
It is estimated that there are approximately 90 full-length, retrotransposition competent L1HS in the human genome with intact internal promoters and open-reading frames . Additionally, 29 of the 362 full-length older L1PA2 are potentially active . The estimates are based on the length of the L1 insertions, the integrity of their open reading frame sequences, and results of in vitro retrotransposition assays that may use an ectopic promoter to drive L1 expression. However, sequence requirements for L1 promoter activity have not been well defined.
Activity of the L1 antisense promoter (ASP) was first demonstrated by Speek, who described the discovery of four chimeric L1 ASP transcripts composed of an L1-derived 5’-UTR and spliced exons from neighboring single-copy genes. Speek and colleagues also showed that L1 ASP activity can drive tissue-specific transcription of chimeric transcripts in a few instances [8, 9]. Subsequently, much work focused on the MET oncogene locus and the associated L1-MET chimeric transcript. The L1-MET chimeric transcript initiates within the second intron of the MET gene and downstream of the translational start site. Overexpression of the L1-MET chimeric transcript causes decreased full-length MET protein levels and MET-dependent signaling, perhaps through transcriptional interference. Similarly, transcriptional derepression of the L1 chimeric transcript LCT13 was linked to silencing of its cognate transcript TFPI-2, a tumor suppressor in a variety of human malignancies.
In another approach to identify chimeric L1 ASP transcripts, Cruickshanks and Tufarelli applied L1 chimera display to identify eighteen novel chimeric L1 ASP transcripts, some of which were selectively detected in breast and colon cancer specimens but not in matched normal tissues . These investigators also showed that DNA methylation limits the activity of L1 ASP in normal tissues and that 5-aza-cytidine treatment of established cancer cell lines causes expression of chimeric L1 ASP transcripts . Other studies have found evidence of antisense expression in more ancient L1 elements. Macia et al. found that more ancient primate L1 elements including L1PA2-10 were capable of antisense transcription . Faulkner et al. identified transcription start sites of cap-selected RNAs and found that L1 fragments displayed antisense expression at the 3’ end of the L1 element .
The human-specific subfamily L1HS and the primate-specific subfamilies L1PA2-8 were found to contain an open reading frame termed ORF0 that is transcribed downstream of the L1 ASP on the antisense strand. ORF0 is typically translated as a short peptide that localizes to promyelocytic leukemia-adjacent nuclear bodies in the nucleus. There are ~3200 genic loci encoding L1 ORF0, which is typically encoded completely within the L1 ASP sequence. A small fraction, 57 of ~3200 L1 ORF0 loci, encodes a peptide in which ORF0 was observed to be fused to a gene exon.
Although the existence of L1 ASP transcripts is well-documented, many questions remain. The extent to which L1 chimeric antisense transcripts impact the human transcriptome and what other genes may be under the influence of L1 ASP is unknown. The importance of these L1 ASP transcripts is further highlighted by the recent identification of ORF0, which suggests chimeric L1 gene fusion transcripts might be expressed. Herein, we report the identification and comprehensive characterization of 988 putative L1 ASP transcripts.
Identification of many novel L1 ASP transcripts
Properties of L1 ASP transcripts
Next, we reviewed the primary tissues (1531 ESTs) and cell lines (355 ESTs) that were sources of ESTs, and categorized them as cancerous (578 ESTs) or normal (1308 ESTs) (Additional file 1: Table S1). Some ESTs could not be characterized due to a lack of supporting information. Interestingly, 69.4 % of ESTs supporting an L1 ASP transcript originated from non-cancerous tissues (Fig. 2b). For ESTs derived from cancerous tissue, the most prevalent sources were embryonic carcinoma cells (NT2 testis embryonal carcinoma cell line), head/neck tissue (tongue tumor), and fibrosarcoma cells (HT1080 fibrosarcoma cell line) (Fig. 2c). L1 ASP transcripts in normal tissue were primarily identified from brain or testis tissues. Our analysis supports the notion that L1 ASP transcripts are readily detectable in both normal and diseased states. However, due to source tissue biases in the EST database, we cannot conclude that L1 ASP activity is higher overall in the brain or testis. Finally, we conducted a gene ontology (GO) analysis of the 988 cognate genes that contained putative L1 ASP transcripts. The GO analysis showed overrepresentation of genes involved in diverse cellular processes including vesicle-mediated transport, intracellular protein transport, mitosis, morphogenesis, and protein modifications (Additional file 3: Table S2).
To further validate L1 ASP transcripts we examined publicly available long-read RNA-seq data from human embryonic stem cells (H1-ESCs) sequenced using a Pacific Biosciences (PacBio) instrument . The RNA-seq reads generated by PacBio are long (averaging 2–3 kb length), but are lower quality. Therefore, Au K. et al. error corrected the long-read PacBio RNA-seq reads using short read Illumina RNA-seq data . Subsequently, the error corrected long RNA-seq data was used for isoform detection by Au K. et al. to identify expressed transcripts in H1-ESCs . We used the publicly available transcript isoform prediction for H1-ESCs to validate L1 ASP transcripts identified by our EST screen. We were able to validate 124/988 or 12.6 % of the L1 ASP transcripts reported here as expressed in H1-ESCs (Fig. 3b, Additional file 1: Table S1). Thus, using low coverage RNA-seq data from a single cell-line we were able to independently validate 12.6 % of the L1 ASP transcripts we identified.
Next we sought to address the coding potential of L1 ASP transcripts, because the majority were sense to the cognate gene and could potentially contain open reading frames (ORFs). We applied TransDecoder software to identify putative peptides of at least 100 amino acids (aa) encoded by the ESTs. We also required that the predicted putative peptide begin with a start codon encoding methionine, and found that 546/2015 or 27.1 % of the ESTs had the potential to encode putative peptides (Fig. 3b). The identified putative peptides ranged from 100 to 268 aa, with an average putative peptide length of 140 aa (Additional file 4: Table S3, Fig. 3b). Of the identified putative peptides, 42 contained fragments of the recently identified ORF0 open reading frame protein. However, because we only included spliced ESTs and ORF0 is contained within the unspliced 5’ UTR of L1HS, we did not identify a full-length ORF0. It is also important to note that, in the absence of additional experimental data, these are only peptide predictions. In summary, L1 ASP transcripts are predicted to contain putative peptides that extend from the L1 ASP into cognate genes.
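The peptide filter described above (at least 100 aa, beginning with methionine) can be illustrated with a minimal sketch. This is not TransDecoder, which applies additional scoring; it is a simplified stand-in that scans only the three forward frames of an EST. The function name `longest_met_orf` and the choice to accept ORFs that run off the 3’ end of a partial EST are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_met_orf(seq, min_aa=100):
    """Return (start, end, n_aa) of the longest ATG-initiated ORF, or None.

    Scans the three forward frames only (an assumption; the source does not
    state strand settings). An ORF still open at the 3' end of the sequence
    is accepted, since ESTs are often partial. Coordinates are 0-based,
    half-open; n_aa counts the initiating methionine but not the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        last_codon_end = frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            last_codon_end = i + 3
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa and (best is None or n_aa > best[2]):
                    best = (start, i + 3, n_aa)
                start = None
        if start is not None:  # ORF runs off the 3' end of the EST
            n_aa = (last_codon_end - start) // 3
            if n_aa >= min_aa and (best is None or n_aa > best[2]):
                best = (start, last_codon_end, n_aa)
    return best
```

Applying such a filter to each EST and counting the sequences that return a hit gives a fraction comparable in kind to the 546/2015 reported here, although the exact number depends on the tool and its settings.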
Characterization of the L1 ASP
We further interrogated ESTs from recent primate L1PA2-8 subfamilies and the human L1HS subfamily for transcription factors that might drive transcription from the ASP. YY1 is the best characterized cis-regulatory transcription factor proposed to be required for L1 transcription from the sense promoter. YY1 was previously reported to bind between 13 and 21 bp of the L1HS 5’ UTR on the antisense strand [23–26]. Interestingly, YY1 is also an important regulator of bidirectional transcription at the promoters of multiple single-copy genes [27, 28]. To examine the possibility that YY1 could play a role in the L1 ASP, we sought to better characterize YY1 binding to the L1HS 5’ UTR. First, we applied JASPAR Scan to examine the 5’ UTR of the L1HS consensus sequence for YY1 transcription factor-binding sites using a relative score of at least 90 %. We identified five putative binding sites based on YY1 position weight matrices, including two that overlapped the proposed binding site (13 to 21 bp) and three additional binding sites (Additional file 5: Table S4). Next, we examined YY1 transcription factor binding to the L1HS consensus sequence using ChIP-seq data available for YY1 from the ENCODE project. For all of the ENCODE cell lines, we identified two peaks that displayed enrichment for YY1: the first, Peak-1, corresponded to the known YY1 binding site at ~20 bp; the second, Peak-2, was at ~450 bp in the L1HS 5’ UTR (Fig. 4b). Peak-2 overlaps closely with an identified putative YY1 binding site on the positive strand at 448 to 453 bp, which immediately precedes the primary L1 ASP at ~400 bp (Fig. 4a-b). The secondary binding site of YY1 near the primary L1 ASP suggests that YY1 may play a role in regulating L1 antisense transcription.
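To make the 90 % relative-score threshold concrete, the sketch below scans a sequence with a position weight matrix and reports each window's score rescaled between the matrix's minimum and maximum attainable scores. This rescaling is the usual definition of a relative score and is assumed here to match the online tool. The three-column matrix is a toy example, not the YY1 matrices (MA0095.1, MA0095.2); the function name is hypothetical; only the forward strand is scanned.

```python
def relative_score_hits(seq, pwm, threshold=0.90):
    """Return [(position, relative_score)] for windows at or above threshold.

    pwm is a list of dicts (one per motif column) mapping base -> score.
    relative_score = (score - min_score) / (max_score - min_score).
    """
    width = len(pwm)
    max_score = sum(max(col.values()) for col in pwm)
    min_score = sum(min(col.values()) for col in pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        rel = (score - min_score) / (max_score - min_score)
        if rel >= threshold:
            hits.append((i, rel))
    return hits

# Toy 3-column matrix favouring "ACG" (illustrative values only).
toy_pwm = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
]
hits = relative_score_hits("TTACGTT", toy_pwm)
```

A full scan would also score the reverse complement of each window, which is how a site on the antisense strand, such as the one at 13 to 21 bp, would be reported.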
To test whether TSS locations of L1 ASP transcripts were bound by YY1 and marked by histone modifications, we examined ENCODE ChIP-seq alignments to the human genome (build hg19). We reviewed ChIP data for YY1 and for the H3K4me2 and H3K4me3 histone modifications in the K562, H1-ESC, and HeLa cell lines. The histone modifications H3K4me2 and H3K4me3 are typically associated with euchromatic, active or poised promoters [31–34]. We found that the ESTs supporting L1 ASP transcripts displayed modest enrichment for YY1 at the TSS in ENCODE (Fig. 4c, Additional file 2: Figure S3). Peak-2 for YY1, identified in Fig. 4b, lay close to the TSS of the ESTs supporting L1 ASP transcripts. Immediately downstream of the TSS, based on transcription orientation, we identified modest enrichment for histone marks H3K4me2 and H3K4me3 in K562, H1-ESC, and HeLa cell lines (Fig. 4c, Additional file 2: Figures S4A-B). We next examined global run-on sequencing (GRO-seq) data, a high-throughput nuclear run-on assay that can identify 5’-capped RNAs [35–37]. We found that GRO-seq reads mapped to the TSS of the majority of L1 antisense ESTs in K562, MCF7, and HeLa cell lines (Fig. 4c, Additional file 2: Figures S4C-D). Together, these data support the conclusion that YY1 binds to the L1 ASP in addition to the previously characterized binding site near the sense promoter. Additionally, analysis of histone modifications associated with active or poised promoters, together with GRO-seq data, suggests that the L1 ASPs we identified are likely to be actively transcribed in a variety of cell types.
Validation of L1 ASP transcripts
L1 ASP transcripts overlapping L1 exonized transcripts
The method to identify L1 ASP transcripts implemented a criterion to remove ESTs where an independent EST suggested L1 exonization (e.g., inclusion of an intronic L1 on the opposite strand within a normal gene transcript). We examined this category in more detail and report these putative transcripts separately (Additional file 8: Table S7). Many of these events cannot be unambiguously assigned as an L1 antisense transcript due to the presence of an EST also supporting L1 exonization. However, we observed transcripts where the majority of ESTs supported L1 ASP driven transcription rather than L1 exonization (including MAPK10, Additional file 2: Figure S5A). We also observed events where ESTs supporting L1 ASP driven transcripts and ESTs supporting L1 exonization were non-overlapping (including SCAMP1, Additional file 2: Figure S5B). Therefore, in some cases L1s might contribute both to L1 ASP driven transcripts and L1 exonized transcripts.
Identification of mouse L1 ASP transcripts
In the mouse genome LINE-1 elements are also active, yet they are quite different from the human L1HS subfamily at the sequence level. Nevertheless, active mouse LINE-1 elements contain an L1 ASP also capable of yielding fusion transcripts. Unlike the human L1HS subfamily, which contains the L1 ASP in the 5’ UTR, the mouse ASP is within the first open reading frame (ORF1p). We applied our bioinformatic approach to identify spliced ESTs consistent with an L1 antisense transcript in the mouse genome. The results for the mouse are an underestimate because, unlike human ESTs, many of the mouse ESTs in the spliced alignment database did not contain transcript orientation. However, despite our strict filtering we identified L1 ASP transcripts for 174 cognate genes that were supported by 307 ESTs (Additional file 9: Table S8). Of the 174 identified L1 ASP transcripts, 23 corresponded to the mouse-specific LINE-1 retrotransposon subfamilies (L1Mus1-4). Surprisingly, we again identified a subset of evolutionarily ancient L1 subfamilies that were transcribing an L1 antisense transcript, which we also identified in human. The L1 ASP transcript SHISA5 was identified within both the human and mouse genomes, transcribed antisense to a shared mammalian L1M5 (Additional file 2: Figure S6).
Here, we present one of the most comprehensive studies of L1 ASP transcripts in the human genome. We describe in total 988 chimeric transcripts of which 911 are novel, where the L1 ASP drives the expression of a transcript that is spliced to a gene exon; and they are collectively supported by 2015 ESTs. Because we required evidence of splicing at the alignment level, our identified L1 antisense ESTs correspond to processed transcripts. Thus, the number of putative transcripts is almost certainly an underestimate. This is supported by the fact that other published studies using complementary approaches identified 44 additional putative L1 ASP transcripts we did not recover.
The L1 sense promoter has higher transcriptional activity in many cancers, and increased L1 ORF1 expression and protein abundance is often observed [41, 42]. High levels of L1 somatic retrotransposition are also readily detectable in a variety of cancers and during the early stages of tumorigenesis [43–49]. We identified 20 L1 ASP driven transcripts that affect cancer genes, including previously reported L1-MET . While we expected to identify an increased number of L1 ASP transcripts from cancer specimens, the majority of ESTs found in this survey were identified in normal tissues. This is unlikely due to a bias in the EST database, which contains a preponderance of cDNAs from cancerous tissue and diseased states. Our analysis revealed that the brain, testis, placenta, embryonic tissues, and lungs were the most abundant contributors of ESTs supporting L1 antisense transcription.
About 27 % of the L1 ASP transcripts we describe occur in the sense orientation to the cognate gene and are predicted to produce a peptide of at least 100 amino acids in length. Whether L1 ASP transcripts yield translated proteins remains an open question. However, recent characterization of ORF0 and 57 instances of ORF0 gene exon fusions supports that a subset of transcripts identified here are likely translated . Interestingly, some L1 ASP transcripts seem to match an annotated gene transcript; in those cases the gene TSS starts in the L1 ASP, an example being UVRAG.
In contrast, about 73 % of the L1 ASP transcripts are not predicted to encode a putative peptide by our metric. The absence of a predicted putative peptide of 100 amino acids has previously been used to define a transcript as a long non-coding (lnc) RNA [50–52]. However, although atypical, some proteins with fewer than 100 amino acids are encoded in mammalian genomes. An important example of a short peptide that would not be identified by this analysis is ORF0, which is only 71 amino acids long. Hence, our putative peptide prediction does not preclude the possibility that additional L1 ASP transcripts are protein coding. In addition, because our identification is based on ESTs, which are not typically full-length transcripts, our current list might underestimate the number of L1 ASP transcripts with predicted peptides.
We also describe L1 ASP transcripts incorporating gene sequences in the antisense orientation. Such transcripts are rare, comprising less than 8 % of our total, but may be biologically important. Where expressed, these have the potential to produce double-stranded RNAs because they encode a transcript with reverse complementarity to a portion of the cognate mRNA (for example, RABL2B in Additional file 1: Table S1). The dsRNA may impact epigenetic regulation at the locus and the stability of the mRNA .
Nearly all of the L1HS and L1PA2-8 antisense ESTs, subfamilies with a homologous 5’ UTR, display ASP activity within the first 600 bp of the L1 5’ UTR, whereas more ancient primate and mammalian L1s display the majority of ASP activity at the 3’ end of the element near the end of ORF2. This observation explains a previous result indicating that L1 fragments display ASP activity at the 3’ end of the L1 element. In addition, this result clarifies how the majority of ancient L1s, which are typically 5’ truncated, can contribute the majority of L1 ASP activity. We identified substantial evidence that L1 ASPs are transcriptionally active. There was binding of regulatory elements, including the transcription factor YY1, at the L1 ASP. YY1 displayed a double-peak binding distribution within the L1 promoter. YY1 peak-1 in Fig. 4c corresponds to the position of the L1 sense promoter and has been described previously. YY1 peak-2 is newly identified and seems to overlap the position of the L1 ASP; however, binding of YY1 to peak-2 does not necessarily indicate a functional role. Unlike YY1, the GRO-seq and histone mark ChIP-seq profiles display a single-peak distribution within the L1 promoter (Fig. 4c). The single GRO-seq peak overlaps YY1 peak-2 at the site that is likely the TSS for the L1 ASP. The predicted TSS position is also marked downstream by histone modifications H3K4me2 and H3K4me3, which is characteristic of active promoters. The GRO-seq profiles are consistent with 5’ capped RNAs initiating from this L1 ASP TSS. Whether YY1 peak-2 is required for L1 ASP transcription at the identified TSS warrants further investigation. Independent validation of 124/988 L1 ASP transcripts using PacBio long-read RNA-seq suggested expression of a large subset of identified transcripts in embryonic stem cells. Together, several lines of evidence indicate these transcription start sites function as active promoters.
The L1 ASP was likely active at multiple points in mammalian evolution. While ancient subfamilies are no longer competent for retrotransposition they are contributing to the transcriptome through promoter activity. Similarly, in the mouse, L1_Mus1-4 subfamilies also contribute to new L1 ASP transcripts. The mouse element contains the ASP in ORF1p . While the sequence differs dramatically between human and mouse L1s, the functional conservation of ASP activity indicates selective pressure to preserve this feature. It is interesting to speculate as to whether the L1 ASP activity benefits the host or the L1 element fitness. There is some evidence that the L1 ASP transcripts might produce siRNAs to repress the L1 sense transcript . Contradicting this view is new evidence that ORF0 transcribed from the ASP is correlated with the transposition competent sense transcript . Lastly, there are more instances being identified of transposons being exapted for normal cellular functions by the host . The fact that some ancient mammalian L1 elements are conserved in diverse mammals from human to mouse provides support to the hypothesis that L1 ASP transcripts may be exapted for functional roles.
Perhaps one of the most interesting and easily testable ramifications stemming from identification of such a large number of L1 ASP transcripts is that some are likely to be polymorphic in the human population. Some of the transcripts identified here are likely to be absent in at least some individuals in the human population. By extension, there are likely polymorphic L1 transposition events present in the population but absent from the reference genome that would also likely contribute to L1 ASP transcripts. It would be of interest to determine allele frequency for a handful of chosen L1 ASP transcripts. For instance, is the putative L1-MET chimeric transcript polymorphic, and does it segregate with any observable phenotype? The expanded repertoire of L1 ASP transcripts described herein could exert numerous effects on gene regulation that remain to be investigated.
Computational detection of putative L1 ASP transcripts
We identified candidate L1 ASP transcripts by applying a series of intersections on genomic intervals of three annotations obtained using the UCSC table browser tool in Browser Extensible Data (BED) format. First, we downloaded the locations of L1 elements by extracting the coordinates of all LINE-1 family repeats from the RepeatMasker (hg19) annotation. Second, we downloaded the annotated exon coordinates of RefSeq genes (hg19). Third, we downloaded the coordinates of UCSC human spliced ESTs, a table containing the alignment coordinates of ESTs displaying evidence of splicing, as a block feature (start to end) and as coordinates of spliced EST exons (downloaded July 2015). Importantly, the strand field in the UCSC human spliced ESTs table reflects the alignment direction (e.g., plus is 5’ to 3’) of the EST cDNA to the genome by BLAT and does not reflect the transcriptional direction. To convert the strand in the UCSC human spliced ESTs database to reflect transcriptional direction, we used the linked UCSC estOrientInfo table. If the value of the intronOrientation field in the associated table is positive, the transcriptional direction matches the alignment direction; if the intronOrientation field is negative, the transcriptional direction is opposite to the alignment direction. Sometimes the transcriptional direction of an EST cannot be determined because the intronOrientation field is zero; these ambiguous cases were removed from subsequent analyses. For mouse we used a similar set of annotations: RepeatMasker (mm10), RefSeq genes (mm10), and the UCSC mouse spliced ESTs (mm10, downloaded July 2015). For the mouse annotation, a larger fraction of cases was ambiguous and removed because the intronOrientation field was zero and transcriptional direction could not be determined.
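The strand conversion rule described above can be written as a small function. This is a minimal sketch of the stated logic only; the function name and the use of `None` to flag ambiguous ESTs are assumptions for illustration, and parsing of the UCSC tables themselves is not shown.

```python
def transcription_strand(alignment_strand, intron_orientation):
    """Infer transcriptional strand from alignment strand and intronOrientation.

    alignment_strand: "+" or "-" as given in the spliced EST table.
    intron_orientation: integer from the linked estOrientInfo table.
    Returns "+", "-", or None when the direction is ambiguous (value of zero).
    """
    if alignment_strand not in ("+", "-"):
        raise ValueError("alignment_strand must be '+' or '-'")
    if intron_orientation == 0:
        return None  # ambiguous: excluded from downstream analysis
    if intron_orientation > 0:
        return alignment_strand  # transcription matches alignment direction
    return "-" if alignment_strand == "+" else "+"  # opposite direction
```

ESTs for which the function returns `None` correspond to the ambiguous cases that were dropped, which is why the mouse set, with more zero values, yields fewer usable ESTs.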
Next, we defined criteria to identify putative L1 ASP transcripts. First, the 5’ TSS of the EST was required to originate from within an annotated L1 element, and the transcriptional direction of the EST was required to be antisense to the L1. Second, an exon coordinate of the EST was required to overlap with an annotated exon coordinate of a gene. Third, we excluded L1 ASP transcripts where an independent EST supported L1 exonization (e.g., inclusion of an intronic antisense L1 in a gene transcript of the structure exon to L1 to exon). We selected ESTs that met these criteria and represented L1 ASP transcripts using sequential intersections performed with BEDtools. To verify the accuracy of the selection method, we manually inspected putative L1 ASP transcripts using the UCSC genome browser. Manual inspection identified a subset of ESTs that carried a higher degree of uncertainty due to the presence of multiple annotated alignments. To distinguish this category, we recorded in Additional file 1: Table S1 whether the UCSC human spliced ESTs table contained more than one alignment of the EST.
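The first two selection criteria can be illustrated on plain interval records. The sketch below is not the BEDtools workflow used in the study; it is a pure-Python stand-in under stated assumptions: coordinates are 0-based and half-open as in BED, the EST strand has already been converted to transcriptional direction, and the record layout and function names are hypothetical. The third criterion (excluding loci where an independent EST supports L1 exonization) is not implemented.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def est_tss(est):
    """5' end of the EST given its transcriptional strand."""
    exons = sorted(est["exons"])
    return exons[0][0] if est["strand"] == "+" else exons[-1][1] - 1

def is_l1_asp_candidate(est, l1_elements, gene_exons):
    """Apply criteria 1 and 2: TSS inside an antisense L1, exon hits a gene exon."""
    tss = est_tss(est)
    starts_in_antisense_l1 = any(
        l1["chrom"] == est["chrom"]
        and l1["start"] <= tss < l1["end"]
        and l1["strand"] != est["strand"]  # EST is antisense to the L1
        for l1 in l1_elements
    )
    hits_gene_exon = any(
        ex["chrom"] == est["chrom"] and overlaps(s, e, ex["start"], ex["end"])
        for ex in gene_exons
        for s, e in est["exons"]
    )
    return starts_in_antisense_l1 and hits_gene_exon

# Hypothetical records for illustration only.
l1s = [{"chrom": "chr1", "start": 1000, "end": 1600, "strand": "-"}]
exons = [{"chrom": "chr1", "start": 5050, "end": 5200}]
est = {"chrom": "chr1", "strand": "+", "exons": [(1200, 1300), (5000, 5100)]}
```

With these toy records, `is_l1_asp_candidate(est, l1s, exons)` is true because the EST starts inside a minus-strand L1 while being transcribed on the plus strand, and its second exon overlaps the gene exon.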
Computational validation with PacBio RNA-seq
The long-read PacBio RNA-seq data from human embryonic stem cells (H1-ESCs) was annotated as transcript isoform predictions by Au, K. et al. . We downloaded H1-ESCs transcript isoform predictions in GTF format and converted the data to BED format. We applied the same intersections as described above for the EST screen using BEDtools. We characterized the filtered set of PacBio transcript isoforms for those also identified by the EST screen that supported L1 ASP transcripts.
Annotation of ESTs that support L1 ASP transcripts
Additional information on tissue of origin, cell line of origin, and normal/cancer status was obtained by extracting and parsing associated information on ESTs downloaded from the Batch Entrez portal (http://www.ncbi.nlm.nih.gov/sites/batchentrez), which was then added to Additional file 1: Table S1. The ESTs supporting putative L1 ASP transcripts may be represented multiple times in Additional file 1: Table S1 because they overlap with more than one gene; however, further data analysis was only conducted on the unique set of ESTs identified. We manually inspected 554 of the identified ESTs using the UCSC genome browser and verified they all matched the above criteria for an L1 ASP transcript in at least one location. The EST sequences of all identified putative L1 ASP transcripts were also downloaded in FASTA format using the Batch Entrez portal. The sequences were examined for the presence of open reading frames (ORFs) of at least 100 amino acids using the TransDecoder module (http://transdecoder.github.io/) of Trinotate Transcript Annotation, which is a part of the Trinity package. We also required that predicted ORFs start with a start codon encoding methionine to be considered a putative peptide. The genes associated with putative L1 ASP transcripts were used for gene ontology analysis using the PANTHER online statistical overrepresentation test and the PANTHER GO slim biological process (with redundant GO categories removed). The raw p-values for the full results reported by PANTHER were corrected using the Benjamini-Hochberg false discovery rate correction in the R statistical language.
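For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal sketch follows. The study performed this correction in R; the Python function below is an illustrative re-implementation of the standard step-up adjustment, not the code used in the analysis, and the example p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank by ascending p-value), then
    a running minimum is taken from the largest rank downward so that the
    adjusted values are monotone and never exceed 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented raw p-values for four hypothetical GO categories.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

GO categories whose adjusted value falls below the chosen false discovery rate would then be reported as overrepresented.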
Characterization of the L1 ASP
We examined L1 ASP transcripts in four categories: human-specific L1HS, primate-specific L1PA2-8, ancient primate L1s, and ancient mammalian L1s. The EST sequences were aligned to full-length consensus sequences of L1s in RepBase and those reported by Khan et al. using the LAST aligner (http://last.cbrc.jp/). For each EST the position of the TSS was computed as a percentage alignment position with respect to the full-length consensus. YY1 transcription factor-binding sites in the 5’ UTR were identified by extracting the 5’ UTR sequence from the L1HS consensus FASTA. The online tool JASPAR Scan was used to identify YY1 binding sites corresponding to two PWMs for YY1 (MA0095.1 and MA0095.2) against the first one kb 5’ UTR of the L1HS consensus using a relative score threshold of 90 %. To create YY1 transcription factor profiles, cell lines in the ENCODE project for which YY1 ChIP-seq was performed were selected (for the full list of publicly available data, see Additional file 10: Table S9). The YY1 ChIP-seq and input control reads were aligned to the L1HS consensus sequence using bowtie1, which is well suited to short reads (<100 bp). The log2FC enrichment of YY1 was calculated per base pair of the L1HS consensus using the read coverage per million mapping reads (RPM) of YY1 ChIP and input control, for which the normalizing factor, the total number of mapping reads in the library, was determined by separate alignment to the human genome (build hg19). The raw per-base-pair log2FC YY1 enrichment signal over the L1HS consensus was smoothed by applying LOESS smoothing with parameter α = 0.1.
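The per-base enrichment calculation can be sketched as follows. This is an illustration of the RPM normalization and log2 ratio only; the pseudocount used to avoid division by zero is an assumption (the text does not state how zero-coverage positions were handled), the function name is hypothetical, and the LOESS smoothing step is not shown.

```python
import math

def log2fc_per_base(chip_cov, input_cov, chip_total, input_total, pseudocount=1.0):
    """Per-base log2 fold change of ChIP over input coverage.

    chip_cov, input_cov: raw read coverage at each consensus position.
    chip_total, input_total: total mapped reads per library, here taken from
    a separate genome alignment as described in the text.
    pseudocount: added to raw coverage before scaling (an assumption).
    """
    if len(chip_cov) != len(input_cov):
        raise ValueError("coverage vectors must have equal length")
    result = []
    for chip, ctrl in zip(chip_cov, input_cov):
        chip_rpm = (chip + pseudocount) * 1e6 / chip_total
        ctrl_rpm = (ctrl + pseudocount) * 1e6 / input_total
        result.append(math.log2(chip_rpm / ctrl_rpm))
    return result
```

Normalizing by library size before taking the ratio matters because ChIP and input libraries usually differ in depth; without it, a deeper ChIP library would appear enriched everywhere.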
To build TSS profiles of L1 ASP transcripts, we obtained the TSS coordinates of L1PA2-8 and L1HS antisense ESTs separately for plus-strand and minus-strand ESTs. We downloaded alignment files from the ENCODE data repositories in BAM format (genome build hg19, see Additional file 10: Table S9 for a complete list) for the K562, H1-ESC, and HeLa cell lines, comprising ChIP-seq data for the H3K4me2 and H3K4me3 histone marks and associated input controls. For K562 cells, we also downloaded alignments for YY1 ChIP-seq data and input controls. The average ChIP enrichment at EST TSSs for plus- and minus-strand L1 ASP transcripts was calculated using the Python package Metaseq with a genomic window of ±1000 bp around the TSS and a 100 bp bin size to calculate depth. The results for the plus and minus strands were merged, where −1000 bp represented upstream of the TSS and +1000 bp downstream of the TSS. The output of Metaseq was the input-subtracted ChIP enrichment normalized as RPMs. The GRO-seq data were analyzed in a similar manner (see Additional file 10: Table S9 for a complete list); however, alignments to the human genome (build hg19) were conducted using BWA-MEM. We used Samtools to separate the plus-strand and minus-strand GRO-seq reads and analyzed these reads against plus-strand or minus-strand EST TSSs using Metaseq. The results for the plus- and minus-strand GRO-seq profiles were merged to obtain normalized RPM enrichment.
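The merging of plus- and minus-strand profiles relies on expressing each read position relative to the TSS in the direction of transcription. The sketch below shows that idea with plain lists. It does not use Metaseq or its API; the function name and inputs are hypothetical, reads on a single chromosome are assumed, and RPM normalization and input subtraction are omitted.

```python
def tss_profile(tss_list, read_positions, window=1000, bin_size=100):
    """Count reads in strand-aware bins around a set of TSSs.

    tss_list: iterable of (position, strand) with strand "+" or "-".
    read_positions: iterable of read coordinates on the same chromosome.
    Bin 0 covers -window to -window + bin_size (upstream); the last bin ends
    at +window (downstream). Minus-strand offsets are mirrored so that
    upstream and downstream share the same meaning on both strands.
    """
    n_bins = 2 * window // bin_size
    counts = [0] * n_bins
    reads = list(read_positions)
    for tss, strand in tss_list:
        for pos in reads:
            offset = pos - tss if strand == "+" else tss - pos
            if -window <= offset < window:
                counts[(offset + window) // bin_size] += 1
    return counts

# Two hypothetical TSSs on opposite strands and two reads.
profile = tss_profile([(5000, "+"), (9000, "-")], [5050, 8950])
```

Both reads lie 50 bp downstream of their respective TSS in the direction of transcription, so they fall in the same bin once the minus-strand coordinates are mirrored.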
For each specimen, we procured a single tissue fragment, measuring ~ 1.0 × 0.5 × 0.5 cm, of selected grossly unremarkable organs. All tissues were stored at −80°C.
Validation of selected chimeric L1 ASP transcripts
We selected L1-UVRAG and L1-KIAA1324L chimeric L1 ASP transcripts for validation. Using semi-quantitative RT-PCR, we amplified a region across the putative chimeric L1 ASP transcript that spans the L1 promoter and the nearby exon. Half of the reaction (10 μl) was resolved on a 2 % agarose gel to confirm the presence of a single PCR product. The remainder of the reaction was cloned using a TOPO TA Cloning Kit according to the manufacturer’s specifications (Thermo Fisher Scientific, Inc.; Wilmington, DE) and analyzed by Sanger sequencing.
RNA isolation and cDNA synthesis
We initially obtained very small amounts of RNA (<20 ng total) from fibrous organs, such as skin and muscle. Therefore, we developed an in-house RNA isolation method by modifying the protocol of the RNeasy Plus Mini Kit (Qiagen Sciences, Inc.; Germantown, MD). First, a razor and forceps were used to finely mince one small tissue fragment, weighing up to 50 mg, on dry ice. The minced tissue fragments were suspended in 900 μl of PBS, to which 100 μl of collagenase/dispase solution (stock at 10 mg/ml) was added. The solution was incubated at 37 °C for 1 h. Then, the lysate was centrifuged for 3 min at full speed using a bench-top centrifuge. The pellet was collected and mixed with 10 μl of β-mercaptoethanol and 1 ml of TRIzol reagent. The specimen was homogenized on ice using a hand-held homogenizer for 5 cycles, each consisting of continuous homogenization for 1 min followed by rest for 30 s. The specimen was incubated for 5 min at room temperature. Proteinase K was then added to a final concentration of 250 μg/ml (approximately 12 μl of 20 mg/ml stock), and the specimen was incubated at 56 °C for 10 min. The lysate was pipetted directly into a QIAshredder spin column and centrifuged for 2 min at full speed. The supernatant was collected into a gDNA Eliminator spin column; the specimen was centrifuged for 30 s at 8,000 x g, and the flow-through was saved. The RNA precipitation was initiated by adding 0.25 ml of chloroform to the supernatant, and the specimen was shaken vigorously for 30 s. The specimen was incubated at room temperature for 10 min, followed by centrifugation for 5 min at 12,000 x g at 4 °C. The aqueous layer was separated, without touching the middle layer (interface), and then mixed with 2 μl of Pellet Paint NF Co-Precipitant reagent. One volume (usually ~200 μl) of ice-cold 70 % isopropanol was then mixed into the specimen, and the precipitate was allowed to form for 10 min at room temperature.
The RNA was pelleted by centrifugation at maximum speed for 15 min, after which the supernatant was removed carefully. The pellet was washed in 500 μl of 70 % ethanol, as much of the overlying supernatant as possible was removed, and the pellet was then air dried at room temperature for approximately 5 min. Finally, the pellet was resuspended in 30–50 μl of RNase-free deionized water, and the RNA was quantified on a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, Inc.; Wilmington, DE). cDNA was synthesized using poly-T primers according to the manufacturer’s specifications (Roche Diagnostics; Basel, Switzerland).
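The Proteinase K volume quoted in the protocol above follows from the standard dilution relation C1 × V1 = C2 × V2. The short Python sketch below works through that arithmetic. The function name is hypothetical, and the lysate volume of roughly 1 ml is an assumption based on the 1 ml of TRIzol added earlier, not a figure stated in the protocol.

```python
def stock_volume_ul(final_conc_ug_per_ml, final_volume_ml, stock_conc_mg_per_ml):
    """Volume of stock (in microliters) needed to reach a final concentration.

    Uses C1 * V1 = C2 * V2 and ignores the small volume contributed by the
    stock itself, which is a simplification.
    """
    stock_conc_ug_per_ml = stock_conc_mg_per_ml * 1000.0  # mg/ml to ug/ml
    volume_ml = final_conc_ug_per_ml * final_volume_ml / stock_conc_ug_per_ml
    return volume_ml * 1000.0  # ml to ul


# Assumed ~1 ml lysate: 250 ug/ml final from a 20 mg/ml stock
proteinase_k_ul = stock_volume_ul(250, 1.0, 20)
```

Under these assumptions the result is about 12.5 μl, which is consistent with the roughly 12 μl reported for a lysate slightly smaller than 1 ml.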
Quantitative real-time PCR
We estimated the relative abundance of targeted RNAs from the Cq (threshold cycle; quantification cycle) values obtained on a StepOnePlus™ Real-Time PCR System (Thermo Fisher Scientific, Inc.; Wilmington, DE). Each experiment was performed in technical triplicate. PCR products were quantified with FastStart Universal SYBR Green (Roche Diagnostics; Basel, Switzerland). The relative abundance of each experimental RNA (specifically, the arithmetic mean of its Cq values) was normalized to that of an internal control RNA (GAPDH) and to relative PCR efficiencies, and is presented graphically as mean Delta Cq (ΔCq) ± standard deviation. The PCR primers used in this report are listed in Additional file 11: Table S10.
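A minimal Python sketch of the ΔCq normalization described above is given here for orientation. It assumes equal PCR efficiencies for the target and the control and combines the two standard deviations in quadrature. Both choices, like the function name, are assumptions of this example and are not details stated in the text.

```python
from math import sqrt
from statistics import mean, stdev


def delta_cq(target_cqs, reference_cqs):
    """Return (mean delta Cq, standard deviation) for one sample.

    target_cqs: Cq values of the technical replicates for the target RNA.
    reference_cqs: Cq values of the replicates for the control (e.g. GAPDH).
    The standard deviation is propagated as sqrt(sd_target^2 + sd_ref^2),
    which is an assumed convention, not one given in the source.
    """
    d = mean(target_cqs) - mean(reference_cqs)
    sd = sqrt(stdev(target_cqs) ** 2 + stdev(reference_cqs) ** 2)
    return d, sd
```

A larger ΔCq means the target crosses the threshold later than the control, that is, the target is less abundant relative to GAPDH.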
We identify 988 putative L1 antisense chimeric transcripts, the vast majority of which are novel. We independently verify some L1 antisense chimeric transcripts using both bioinformatics analysis and empirical evidence. Interestingly, some L1 antisense chimeric transcripts are associated with evolutionarily ancient L1 subfamilies, suggesting exaptation of these evolutionarily-older L1 sequences. We conclude that L1 antisense promoters contribute to the transcription of up to 4% of all human genes and may potentially have wide-ranging effects in health and disease.
AS, Antisense; ASP, Antisense promoter; CAGE-seq, Cap analysis of gene expression sequencing; ChIP, Chromatin immunoprecipitation; EST, Expressed sequence tag; GO, Gene ontology; GRO-seq, Global run-on sequencing; L1, Long Interspersed Element-1; LINE-1, Long Interspersed Element-1; lncRNA, Long noncoding RNA; ORF, Open reading frame; TSS, Transcription start site; UTR, Untranslated region
We would like to acknowledge John M. Sedivy for helpful discussions of our manuscript. We would also like to thank Andrew Leith for assistance in preparing the manuscript.
This work was supported in part by the following grants from the National Institutes of Health: K25 AG028753, K25 AG028753-03S1, and R56 AG050582-01 to NN. It was also supported by National Institute on Aging grant F31AG050365 and NIH Institutional Research Training Grant T32 GM007601 to SWC.
Availability of data and material
A description of publicly available data used in this study is provided in Table S9. ESTs identified by this study can be downloaded using the NCBI batch search (http://www.ncbi.nlm.nih.gov/sites/batchentrez) of the EST IDs in Table S1. The ENCODE data are available through the Gene Expression Omnibus (GEO) repository under accession numbers GSE2961 and GSE32465. The GRO-seq data are available through the GEO repository under accession numbers GSE60454, GSE62046, and GSE41324. The PacBio RNA-seq isoform detection data are available through the GEO repository under accession number GSE51861. All subsequent analyses are provided as additional files.
NN and NR conceived the study. NN, NR, and SWC wrote the paper. SWC performed the bioinformatic analysis. KHB contributed to the design of additional experiments. NT, GM, and TCC conducted experimental validation experiments. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Collection of human tissues in this study was approved by the Autopsy Service at the Johns Hopkins Hospital in compliance with institutional guidelines.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Burns KH, Boeke JD. Human transposon tectonics. Cell. 2012;149:740–52.
- Huang CR, Burns KH, Boeke JD. Active transposition in genomes. Annu Rev Genet. 2012;46:651–75.
- Wheelan SJ, Aizawa Y, Han JS, Boeke JD. Gene-breaking: a new paradigm for human retrotransposon-mediated gene evolution. Genome Res. 2005;15:1073–8.
- Khan H, Smit A, Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006;16:78–87.
- Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, Kazazian HH Jr. Hot L1s account for the bulk of retrotransposition in the human population. Proc Natl Acad Sci U S A. 2003;100:5280–5.
- Salem AH, Myers JS, Otieno AC, Watkins WS, Jorde LB, Batzer MA. LINE-1 preTa elements in the human genome. J Mol Biol. 2003;326:1127–46.
- Speek M. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol Cell Biol. 2001;21:1973–85.
- Matlik K, Redik K, Speek M. L1 antisense promoter drives tissue-specific transcription of human genes. J Biomed Biotechnol. 2006;2006:71753.
- Nigumann P, Redik K, Matlik K, Speek M. Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics. 2002;79:628–34.
- Roman-Gomez J, Jimenez-Velasco A, Agirre X, Cervantes F, Sanchez J, Garate L, Barrios M, Castillejo JA, Navarro G, Colomer D, et al. Promoter hypomethylation of the LINE-1 retrotransposable elements activates sense/antisense transcription and marks the progression of chronic myeloid leukemia. Oncogene. 2005;24:7213–23.
- Weber B, Kimhi S, Howard G, Eden A, Lyko F. Demethylation of a LINE-1 antisense promoter in the cMet locus impairs Met signalling through induction of illegitimate transcription. Oncogene. 2010;29:5775–84.
- Cruickshanks HA, Vafadar-Isfahani N, Dunican DS, Lee A, Sproul D, Lund JN, Meehan RR, Tufarelli C. Expression of a large LINE-1-driven antisense RNA is linked to epigenetic silencing of the metastasis suppressor gene TFPI-2 in cancer. Nucleic Acids Res. 2013;41:6857–69.
- Cruickshanks HA, Tufarelli C. Isolation of cancer-specific chimeric transcripts induced by hypomethylation of the LINE-1 antisense promoter. Genomics. 2009;94:397–406.
- Macia A, Munoz-Lopez M, Cortes JL, Hastings RK, Morell S, Lucena-Aguilar G, Marchal JA, Badge RM, Garcia-Perez JL. Epigenetic control of retrotransposon expression in human embryonic stem cells. Mol Cell Biol. 2011;31:300–16.
- Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009;41:563–71.
- Denli AM, Narvaiza I, Kerman BE, Pena M, Benner C, Marchetto MC, Diedrich JK, Aslanian A, Ma J, Moresco JJ, et al. Primate-Specific ORF0 Contributes to Retrotransposon-Mediated Diversity. Cell. 2015;163:583–93.
- Kaer K, Branovets J, Hallikma A, Nigumann P, Speek M. Intronic L1 retrotransposons and nested genes cause transcriptional interference by inducing intron retention, exonization and cryptic polyadenylation. PLoS One. 2011;6:e26099.
- Zemojtel T, Penzkofer T, Schultz J, Dandekar T, Badge R, Vingron M. Exonization of active mouse L1s: a driver of transcriptome evolution? BMC Genomics. 2007;8:392.
- Rebollo R, Farivar S, Mager DL. C-GATE - catalogue of genes affected by transposable elements. Mob DNA. 2012;3:9.
- Au KF, Sebastiano V, Afshar PT, Durruthy JD, Lee L, Williams BA, van Bakel H, Schadt EE, Reijo-Pera RA, Underwood JG, Wong WH. Characterization of the human ESC transcriptome by hybrid sequencing. Proc Natl Acad Sci U S A. 2013;110:E4821–4830.
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.
- Lee J, Cordaux R, Han K, Wang J, Hedges DJ, Liang P, Batzer MA. Different evolutionary fates of recently integrated human and chimpanzee LINE-1 retrotransposons. Gene. 2007;390:18–27.
- Athanikar JN, Badge RM, Moran JV. A YY1-binding site is required for accurate human LINE-1 transcription initiation. Nucleic Acids Res. 2004;32:3846–55.
- Becker KG, Swergold G, Ozato K, Thayer RE. Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum Mol Genet. 1993;2:1697–702.
- Minakami R, Kurose K, Etoh K, Furuhata Y, Hattori M, Sakaki Y. Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucleic Acids Res. 1992;20:3139–45.
- Kurose K, Hata K, Hattori M, Sakaki Y. RNA polymerase III dependence of the human L1 promoter and possible participation of the RNA polymerase II factor YY1 in the RNA polymerase III transcription system. Nucleic Acids Res. 1995;23:3704–9.
- Gaston K, Fried M. YY1 is involved in the regulation of the bi-directional promoter of the Surf-1 and Surf-2 genes. FEBS Lett. 1994;347:289–94.
- Cole EG, Gaston K. A functional YY1 binding site is necessary and sufficient to activate Surf-1 promoter activity in response to serum growth factors. Nucleic Acids Res. 1997;25:3705–11.
- Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen C-y, Chou A, Ienasescu H, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2013;42(Database issue):D142–7.
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
- Ng HH, Robert F, Young RA, Struhl K. Targeted recruitment of Set1 histone methylase by elongating Pol II provides a localized mark and memory of recent transcriptional activity. Mol Cell. 2003;11:709–19.
- Santos-Rosa H, Schneider R, Bannister AJ, Sherriff J, Bernstein BE, Emre NC, Schreiber SL, Mellor J, Kouzarides T. Active genes are tri-methylated at K4 of histone H3. Nature. 2002;419:407–11.
- Orford K, Kharchenko P, Lai W, Dao MC, Worhunsky DJ, Ferro A, Janzen V, Park PJ, Scadden DT. Differential H3K4 methylation identifies developmentally poised hematopoietic genes. Dev Cell. 2008;14:798–809.
- Koch CM, Andrews RM, Flicek P, Dillon SC, Karaoz U, Clelland GK, Wilcox S, Beare DM, Fowler JC, Couttet P, et al. The landscape of histone modifications across 1 % of the human genome in five human cell lines. Genome Res. 2007;17:691–707.
- Danko CG, Hyland SL, Core LJ, Martins AL, Waters CT, Lee HW, Cheung VG, Kraus WL, Lis JT, Siepel A. Identification of active transcriptional regulatory elements from GRO-seq data. Nat Methods. 2015;12:433–8.
- Andersson R, Refsing Andersen P, Valen E, Core LJ, Bornholdt J, Boyd M, Heick Jensen T, Sandelin A. Nuclear stability and transcriptional directionality separate functionally distinct RNA species. Nat Commun. 2014;5:5336.
- Danko CG, Hah N, Luo X, Martins AL, Core L, Lis JT, Siepel A, Kraus WL. Signaling pathways differentially affect RNA polymerase II initiation, pausing, and elongation rate in cells. Mol Cell. 2013;50:212–22.
- Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR, Wooster R. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355–8.
- Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2011;39:D945–950.
- Li J, Kannan M, Trivett AL, Liao H, Wu X, Akagi K, Symer DE. An antisense promoter in mouse L1 retrotransposon open reading frame-1 initiates expression of diverse fusion transcripts and limits retrotransposition. Nucleic Acids Res. 2014;42:4546–62.
- Criscione SW, Zhang Y, Thompson W, Sedivy JM, Neretti N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics. 2014;15:583.
- Rodić N, Sharma R, Sharma R, Zampella J, Dai L, Taylor MS, Hruban RH, Iacobuzio-Donahue CA, Maitra A, Torbenson MS, et al. Long interspersed element-1 protein expression is a hallmark of many human cancers. Am J Pathol. 2014;184:1280–6.
- Ewing AD, Gacita A, Wood LD, Ma F, Xing D, Kim MS, Manda SS, Abril G, Pereira G, Makohon-Moore A, et al. Widespread somatic L1 retrotransposition occurs early during gastrointestinal cancer evolution. Genome Res. 2015;25(10):1536–45.
- Helman E, Lawrence MS, Stewart C, Sougnez C, Getz G, Meyerson M. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 2014;24:1053–63.
- Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ 3rd, Lohr JG, Harris CC, Ding L, Wilson RK, et al. Landscape of somatic retrotransposition in human cancers. Science. 2012;337:967–71.
- Rodic N, Steranka JP, Makohon-Moore A, Moyer A, Shen P, Sharma R, Kohutek ZA, Huang CR, Ahn D, Mita P, et al. Retrotransposon insertions in the clonal evolution of pancreatic ductal adenocarcinoma. Nat Med. 2015;21(9):1060–4.
- Shukla R, Upton KR, Munoz-Lopez M, Gerhardt DJ, Fisher ME, Nguyen T, Brennan PM, Baillie JK, Collino A, Ghisletti S, et al. Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell. 2013;153:101–11.
- Solyom S, Ewing AD, Rahrmann EP, Doucet T, Nelson HH, Burns MB, Harris RS, Sigmon DF, Casella A, Erlanger B, et al. Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res. 2012;22:2328–38.
- Tubio JM, Li Y, Ju YS, Martincorena I, Cooke SL, Tojo M, Gundem G, Pipinikas CP, Zamora J, Raine K, et al. Mobile DNA in cancer. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science. 2014;345:1251343.
- Frith MC, Bailey TL, Kasukawa T, Mignone F, Kummerfeld SK, Madera M, Sunkara S, Furuno M, Bult CJ, Quackenbush J, et al. Discrimination of non-protein-coding transcripts from protein-coding mRNA. RNA Biol. 2006;3:40–8.
- Clamp M, Fry B, Kamal M, Xie X, Cuff J, Lin MF, Kellis M, Lindblad-Toh K, Lander ES. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci U S A. 2007;104:19428–33.
- Dinger ME, Pang KC, Mercer TR, Mattick JS. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol. 2008;4:e1000176.
- Frith MC, Forrest AR, Nourbakhsh E, Pang KC, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bailey TL, Grimmond SM. The abundance of short proteins in the mammalian proteome. PLoS Genet. 2006;2:e52.
- Pelechano V, Steinmetz LM. Gene regulation by antisense transcription. Nat Rev Genet. 2013;14:880–93.
- Yang N, Kazazian HH Jr. L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol. 2006;13:763–71.
- Goodier JL, Kazazian HH Jr. Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell. 2008;135:23–35.
- Smit A, Hubley R, Green P. RepeatMasker Open-4.0. http://www.repeatmasker.org
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
- Mi H, Muruganujan A, Casagrande JT, Thomas PD. Large-scale gene function analysis with the PANTHER classification system. Nat Protoc. 2013;8:1551–66.
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289–300.
- Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21:487–93.
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
- Dale RK, Matzat LH, Lei EP. metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA. Nucleic Acids Res. 2014;42:9158–70.
- Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–95.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
- Egyhazi S, Bjohle J, Skoog L, Huang F, Borg AL, Frostvik Stolt M, Hagerstrom T, Ringborg U, Bergh J. Proteinase K added to the extraction procedure markedly increases RNA yield from primary breast tumors for use in microarray studies. Clin Chem. 2004;50:975–6.