Systematic investigation of insertional and deletional RNA-DNA differences in the human transcriptome
© Chen and Bundschuh; licensee BioMed Central Ltd. 2012
Received: 27 July 2012
Accepted: 7 November 2012
Published: 13 November 2012
The genomic information that is transcribed into primary RNA can be altered by RNA editing at the transcriptional or post-transcriptional level, which provides an effective way to create transcript diversity in an organism. Alteration can occur through substitutional RNA editing or via the insertion or deletion of nucleotides relative to the original template. Taking advantage of recent high-throughput sequencing technology combined with bioinformatics tools, several groups have recently studied genome-wide substitutional RNA editing profiles in human. However, while insertional/deletional (indel) RNA editing is well known in several lower species, only very scarce evidence supports the existence of insertional editing events in higher organisms such as human, and no previous work has specifically focused on indel differences between RNAs and their matching DNA in human. Here, we provide the first study to examine the possibility of genome-wide indel RNA-DNA differences in one human individual, NA12878, whose RNA and matching genome have been deeply sequenced.
We apply different computational tools that are capable of identifying indel differences between RNA reads and the matching reference genome, and we initially find hundreds of such indel candidates. However, with careful further analysis and filtering, we conclude that all candidates are false positives created by splice junctions, paralog sequences, diploid alleles, and known genomic indel variations.
Overall, our study suggests that indel RNA editing events are unlikely to exist broadly in the human transcriptome and emphasizes the necessity of a robust computational filter pipeline to obtain high confidence RNA-DNA difference results when analyzing high throughput sequencing data as suggested in the recent genome-wide RNA editing studies.
Keywords: Indel RNA-DNA differences; RNA-seq data analysis; Computational filtering
RNA is an important biomolecule that is deeply involved in almost all aspects of molecular biology, such as protein production, gene regulation, and viral replication. In order to perform such a variety of functions, the primary RNA transcripts need to be extensively processed. By changing the genomically encoded sequence at the transcriptional or post-transcriptional level, RNA editing provides an effective way to create transcript and protein diversity with limited primary RNA transcripts in an organism[2–4]. Alteration can occur through the insertion or deletion of nucleotides relative to the original template (insertional/deletional or “indel” RNA editing), or via substitutional RNA editing, in which one nucleotide is replaced by or changed to another.
The most common type of known RNA editing in metazoans involves conversion of adenosine to inosine (A-to-I editing), which is mediated by adenosine deaminase acting on RNA (ADAR) enzymes[6–8]. Inosine preferentially base pairs with cytidine, and is therefore functionally equivalent to guanosine. Thus, A-to-I editing in mRNA can alter the genetic information stored in the primary sequence, leading to changes in protein-coding sequences and mRNA stability and splicing. A large number of A-to-I editing events have been identified in the human transcriptome by genome-wide bioinformatics and high throughput sequencing studies[9–14]. While both coding and non-coding sequences undergo A-to-I editing, it has been found that editing occurs mainly in repetitive sequences which are located within 5’ or 3’ untranslated regions (UTRs) or introns[9–14].
Taking advantage of whole-genome and transcriptome deep-sequencing technologies, recent studies have extensively investigated all potential types of substitutional RNA editing in the human transcriptome, using bioinformatics tools that are capable of identifying mismatches between RNA reads and the matching reference genome[15–17]. While the validity of some of these results is currently under debate[18–21], these studies revealed a large number of substitutional RNA editing candidate sites, including many A-to-I editing events. They show that combining high-throughput sequencing and bioinformatics can identify RNA editing events at single-nucleotide resolution across the whole transcriptome.
Insertional and deletional editing events have been discovered in various species[2, 4], such as U insertions and deletions in kinetoplastids, G and A insertions in paramyxoviruses, and various types in Myxomycota. Most of these events are found in mRNA sequences, with functions such as creating new start and stop codons by uridine insertions in kinetoplastids, creating new open reading frames by nucleotide insertions in kinetoplastid and Physarum mitochondria, and frameshifting between alternative ORFs in paramyxoviruses. In higher organisms, no indel editing events had been identified until Zougman et al. recently reported two insertional RNA editing events in human: according to their data, in the 5’UTRs of the linker histone H1 mRNA and of the high-mobility group (HMG) mRNA a single uridine is inserted between an A and a G in each case, creating new translation start sites and producing N-terminally extended proteins. However, to our knowledge no follow-up work has been done on these editing sites, and no additional indel editing events have been reported in human since.
In this study, we explore the possibility of indel RNA editing events across the transcriptome in human by systematically examining the variations between RNA-seq reads and their matching genome. Specifically, we examine the possibility of genome-wide indel RNA-DNA differences in one human individual, NA12878. We apply different computational tools that use gapped alignments to identify indel differences between RNA reads and the matching genome. While hundreds of such indel candidates are revealed after initial selection, further analysis and filtering indicate that all of them are false positives which result from incorrect alignments including splice junctions, paralog sequences, diploid alleles, and known genomic indel variations. The results from our study suggest that indel RNA editing events are unlikely to exist widely in the human transcriptome and emphasize the importance of thorough filtering in genome-wide studies of RNA editing.
Mapping of RNA-seq reads yields hundreds of candidate indel RNA-DNA differences
The initial step in the detection of RNA-DNA differences is the accurate mapping of RNA-seq reads to their matching reference genome. Failure to assign a read to its correct original location may lead to spurious alignments that can be misinterpreted as editing events. Since a large number of genomic variations, including single-nucleotide differences and indels, exist among different individuals, it is crucial to compare RNA and DNA sequences from the same genetic background (to verify the importance of using the same background, we also performed our analysis using the hg19 reference genome and found, as expected, a large number of genomic variations reported as false positives). Based on the reference genome (NCBI build 36) and incorporating genomic variations and structural variations identified by the 1000 Genomes pilot project, the Gerstein lab has recently created a version of the diploid genome sequence for the NA12878 individual from the lymphoblastoid cell line GM12878. Matching deeply sequenced RNA-seq data sets for the same cell line are also available. We use this assembled genome to identify possible insertional and deletional RNA-DNA differences in NA12878 by directly aligning RNA-seq reads against their matching genome. Since the assembled genome is diploid, containing maternal and paternal haplotypes with small sequence variations, we first map RNA-seq reads to the maternal genome, list all potential candidates, and then remove the candidates resulting from maternal-paternal genome variations (see below). Details of the RNA-seq data and the diploid GM12878 genome used in this study are given in the Methods section.
The rapid emergence of high-throughput sequencing techniques has led to the development of a variety of short-read mappers based on different alignment strategies. Since our goal is to identify indels among all RNA-DNA difference events, the basic requirement for a mapping tool is that it allows indels when aligning short RNA reads to the reference. Having evaluated most of the currently available mapping tools, we find that BFAST (Blat-like Fast Accurate Search Tool) is one of the most suitable tools for our indel analysis. In contrast to some other algorithms that speed up the mapping process by ignoring errors and indels, BFAST is highly sensitive to errors, single-nucleotide polymorphisms (SNPs) and especially indels, while still mapping reads considerably fast. Since bias inherent to a mapping algorithm may affect the results, we also use a second tool, bowtie2, a fast and accurate mapper that allows gapped alignment, and compare the results.
For the initial BFAST mapping (alignment settings are described in detail in the Methods section), 79,833,200 of the 113,902,864 reads could be mapped over their entire length to the assembled maternal haploid genome of GM12878. For bowtie2 (using the default settings suggested in the manual), a total of 40,862,987 reads could be mapped to the assembled maternal genome over their entire length. This mapping ratio (about 36% of reads, compared with about 70% for BFAST) is markedly lower, which is probably due to the higher stringency of bowtie2 when mapping a read to the reference genome.
The mapping outputs of BFAST and bowtie2 are SAM (Sequence Alignment/Map) files, which were processed with the SAMtools software package, a package originally designed to identify genomic variations. We conduct initial indel variant calling with the mpileup algorithm implemented in SAMtools, using the default settings for calling genomic SNPs except that we do not require “heterozygotes” to reach 50% read support, since editing could occur at lower frequency. In order to minimize the influence of sequencing and reverse transcription errors, candidates are required to pass quality control thresholds for base calling quality, read mapping quality (reliability of the alignment across the genome), read coverage, variant/reference quality, and indel type and size (see details in the Methods section).
After these initial selections, 685 candidates remain in the BFAST results and 250 candidates remain in the bowtie2 results; 110 of these are shared between the two mapping approaches. That bowtie2 yields far fewer candidates than BFAST is probably due to its lower read mapping ratio mentioned above. Many of the candidates found by bowtie2 but not by BFAST lie at the edge of the filtering thresholds: small differences in the way the two mappers report quality measures and variants of aligned reads can place a candidate just above the threshold in one method and just below it in the other. As described below, all these questionable candidates are removed by the additional false-positive filtering steps, and the overlap between the remaining BFAST-based and bowtie2-based candidates is much larger.
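SAMtools derives indel evidence from the CIGAR string of each SAM record. As a minimal sketch (plain Python, not SAMtools code), the functions below list the insertions and deletions encoded in a CIGAR string and check whether an indel lies close to either read end, in the spirit of the 2-bp end rule listed under the initial filters in Methods. The function names, the coordinate conventions in the comments, and the choice to treat "within 2 bp" as "at most 2 bases" are illustrative assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar):
    """List the insertions and deletions encoded in a CIGAR string.

    pos is the 1-based leftmost reference position (SAM POS field).
    Returns (ref_pos, read_offset, kind, length) tuples, kind being "I" or "D".
    For an insertion, ref_pos is the reference base preceding the inserted bases;
    for a deletion, it is the first deleted reference base. read_offset is the
    0-based offset in the read sequence where the event begins.
    """
    events = []
    ref, read = pos, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned bases consume read and reference
            ref += n
            read += n
        elif op == "I":      # inserted bases consume the read only
            events.append((ref - 1, read, "I", n))
            read += n
        elif op == "D":      # deleted bases consume the reference only
            events.append((ref, read, "D", n))
            ref += n
        elif op == "N":      # skipped region (intron): reference only, not an indel
            ref += n
        elif op == "S":      # soft-clipped bases remain in the read sequence
            read += n
        # H and P consume neither the read sequence nor the reference
    return events


def near_read_end(read_offset, kind, length, read_len, margin=2):
    """True if at most `margin` read bases separate the indel from either end."""
    inserted = length if kind == "I" else 0
    from_start = read_offset
    from_end = read_len - read_offset - inserted
    return min(from_start, from_end) <= margin
```

For example, a 31-base read with POS 100 and CIGAR `10M1I20M` carries a one-base insertion after reference position 109, ten bases from the read start, so `near_read_end(10, "I", 1, 31)` is false and the read would be kept by such a rule.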
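With the counts above, the two call sets contain 685 + 250 - 110 = 825 distinct candidates. The text does not state how shared candidates were matched; one consideration is that two aligners may place the same indel at slightly different coordinates, for example at either end of a short repeat. The sketch below, which assumes candidates are (chromosome, position) tuples and uses a hypothetical `window` parameter, counts a candidate as shared if a partner lies within that many bases.

```python
from bisect import bisect_left


def shared_candidates(first, second, window=0):
    """Return candidates in `first` with a partner in `second` within `window` bp.

    Candidates are (chromosome, position) tuples; window=0 means exact matches.
    """
    by_chrom = {}
    for chrom, pos in second:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    shared = set()
    for chrom, pos in first:
        positions = by_chrom.get(chrom, [])
        # First position >= pos - window; it is a hit if it is also <= pos + window.
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            shared.add((chrom, pos))
    return shared
```

A window of zero reproduces exact matching; a small positive window trades a little specificity for robustness to aligner-specific indel placement.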
Careful filtering reveals that all indel editing candidates are false positives
After the initial selections, the list of indel variations called by SAMtools may still contain a large number of false positives that are unrelated to indel RNA editing. These false positives may be a result of known genomic variations, different alleles from diploid genomes, and misalignment of reads due to, e.g., splice junctions and paralog sequences in the genome. We thus apply a series of stringent filters to remove false positives from our candidate lists.
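A series of false-positive filters can be organised as an ordered cascade that also records how many candidates each step removes. The following is a minimal sketch under stated assumptions: the candidate fields (`junction_dist`, `multi_hit`), the 10-bp junction distance, and the small example data are hypothetical and only stand in for the splice-junction, paralog, diploid-allele and known-variant checks named above.

```python
def apply_filters(candidates, filters):
    """Apply false-positive filters in order and report how many each removes.

    filters: list of (name, is_false_positive) pairs, where is_false_positive
    takes a candidate and returns True if the candidate should be discarded.
    """
    remaining = list(candidates)
    removed = []
    for name, is_false_positive in filters:
        kept = [c for c in remaining if not is_false_positive(c)]
        removed.append((name, len(remaining) - len(kept)))
        remaining = kept
    return remaining, removed


# Hypothetical annotations attached to each candidate, for illustration only.
known_indels = {("chr1", 100, "I")}  # e.g. dbSNP indels or maternal/paternal differences
candidates = [
    {"chrom": "chr1", "pos": 100, "kind": "I", "junction_dist": 50, "multi_hit": False},
    {"chrom": "chr1", "pos": 200, "kind": "D", "junction_dist": 3, "multi_hit": False},
    {"chrom": "chr2", "pos": 5, "kind": "I", "junction_dist": 80, "multi_hit": True},
    {"chrom": "chr3", "pos": 9, "kind": "D", "junction_dist": 90, "multi_hit": False},
]
filters = [
    ("known genomic indel", lambda c: (c["chrom"], c["pos"], c["kind"]) in known_indels),
    ("near splice junction", lambda c: c["junction_dist"] <= 10),
    ("paralogous sequence", lambda c: c["multi_hit"]),
]
remaining, removed = apply_filters(candidates, filters)
```

Because each filter only sees the survivors of the previous ones, the per-filter counts depend on the order in which the filters are applied, while the final set of remaining candidates does not.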
Sensitivity of computational pipeline
To establish that our results indeed imply that indel editing is rare, rather than reflecting an inability of our pipeline to find indel differences that are present, we tested the sensitivity of our computational pipeline by examining how many of the known genomic indels in the NA12878 diploid genome it finds before further filtering. We first aligned the NA12878 maternal genome sequences against the paternal genome sequences to locate all short indels, and then ranked these sites by RNA-seq read coverage. Of the 100 most highly expressed genomic indel sites (with read coverages down to 5 reads), 44 are found when using the maternal genome for indel calling (excluding sites with homopolymer runs longer than 5 bp, which are more likely to result from sequencing errors than from true indel differences). If indel calls from alignments to the maternal and the paternal genome are combined, we find nearly 90% of the covered genomic indels. We thus conclude that the lack of indel editing sites in our study is not due to a lack of sensitivity of the pipeline.
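The sensitivity estimate above amounts to ranking known indel sites by coverage and asking what fraction of the top-ranked sites the pipeline recovers (44 of 100, i.e. 0.44, with the maternal genome alone). A minimal sketch, assuming each site is a (site_id, read_coverage, homopolymer_run_length) tuple and assuming the homopolymer exclusion is applied before ranking (the text is ambiguous on this point); the function name is hypothetical.

```python
def pipeline_sensitivity(sites, called, top_n=100, max_homopolymer=5):
    """Fraction of the top_n best-covered known indel sites that were called.

    sites:  (site_id, read_coverage, homopolymer_run_length) tuples
    called: set of site_ids reported by the indel-calling pipeline
    """
    # Exclude homopolymer-prone sites first, then rank by RNA-seq coverage.
    eligible = [s for s in sites if s[2] <= max_homopolymer]
    eligible.sort(key=lambda s: s[1], reverse=True)
    top = eligible[:top_n]
    if not top:
        return 0.0
    found = sum(1 for site_id, _, _ in top if site_id in called)
    return found / len(top)
```

Combining the maternal-genome and paternal-genome calls corresponds to passing the union of both call sets, e.g. `maternal_calls | paternal_calls`, as `called`.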
Additional RNA-seq datasets yield consistent results
Summary of the indel candidate analysis for additional RNA-seq datasets
Known genomic variations
In this work, we provide the first systematic study of the possibility of genome-wide indel RNA-DNA differences in one human individual, NA12878, whose RNA and matching genome have been deeply sequenced. We applied different computational tools that are capable of identifying indel differences between RNA reads and the matching reference genome. After initial selection using SAMtools, we found hundreds of such indel candidates. However, with careful further analysis and filtering, we found that all of them are false positives arising from splice junctions, paralog sequences, different alleles of the diploid genome, and known genomic indel variations from the SNP database. We thus conclude that there is no evidence for widespread insertional or deletional RNA editing in the human transcriptome.
However, it should be noted that the RNA-seq data sets we used are from a particular lymphoblastoid cell line; it is thus in principle still possible that widespread indel RNA editing events are cell type specific and that we missed them by focusing on this cell line. Moreover, our stringent requirement for detecting such events (at least 2 RNA-seq reads with high base quality and mapping quality supporting editing) may have missed sites that are edited at very low frequency.
It is interesting to relate our findings to the recent discussion on substitutional RNA editing initiated by Li et al. Several technical comments on that study[19–21] pointed out that the mismatches of RNA-seq reads to the reference genome are almost exclusively at the ends of sequencing reads. The response by Li et al. proposes that one reason for this bias is the co-occurrence of substitutional RNA-DNA difference sites with insertional/deletional RNA-DNA difference sites. Our results indicate that such widespread indel RNA-DNA differences are unlikely to exist. Rather, our finding of false positives resulting from splice junctions, which often combine apparent mismatches and indels, may explain both the coexistence of mismatches and indels and their occurrence at the ends of reads. Thus, our observation further questions the proposal that indel RNA-DNA differences explain the end effect of mismatches.
The absence of indel RNA editing in our study also has to be discussed in light of the previous study suggesting two potential insertional RNA editing sites in human. This apparent discrepancy led us to specifically revisit the two insertional RNA editing sites identified in that study. That work suggested that in the 5’UTRs of linker histone H1 and high-mobility group (HMG) mRNA, a single uridine is inserted between an A and a G in each case, creating new translation start sites and producing N-terminally extended proteins. Further examination of that study and our own analysis allow us to propose several possible reasons for this discrepancy.
First, as mentioned above, the editing events may not occur in the specific cell line we investigated. Moreover, that study showed that in certain cell types the abundance of the “edited” form of the proteins is much lower than that of the normal form; it is thus possible that the coverage of the RNA-seq data we used is insufficient to detect, under our filtering criteria, editing events that occur at low frequency. In fact, in both the BFAST and the bowtie2 results, only one RNA-seq read can be reliably aligned to the “AG” position in the 5’ UTR of H1.0 mRNA, and it carries no insertion. We note that this is not due to a lack of a polyA tail on the histone H1.0 mRNA when preparing the sequencing libraries, since the synthesis of histone H1.0 is not cell cycle-regulated and its mRNA is polyadenylated[36, 37]. For the other case, HMGN1, around 10 reads can be reliably mapped and none of them contains the insertion. This indicates that the read coverage at these two sites may not be sufficient to identify the “edited” version of the RNA (according to the original study, 11 of 301 EST sequences support the “edited” version for H1.0, while for HMGN1 only one EST sequence does).
This is very similar to the pattern observed in our “splice junction” false positives, in which “indels” occur close to the end of the alignment and coexist with mismatches. This observation suggests that the apparent insertion may have resulted from rare and so far unknown splicing events. Again, we note that histone H1.0 belongs to the replication-independent histone mRNAs[36, 37] and thus could in principle be spliced, even though no such splice variant has been documented so far. Moreover, the original study indicated that the “edited” form of the H1.0 protein colocalizes with splicing speckles, which may suggest a connection to splicing. Because the original study did not directly sequence the DNA and corresponding RNA surrounding the “editing sites”, careful examination shows that splicing can also explain all the additional experimental observations in that study, i.e., the extended protein form, restriction enzyme digestion, etc. Therefore, it is possible that the only two “insertional RNA editing sites” reported so far in fact result from novel splicing events, which would require experimental verification.
In this study, we systematically examined the possibility of genome-wide indel RNA-DNA differences in one human individual, NA12878, by aligning several RNA-seq datasets to the corresponding assembled diploid genome from the same cell line. The initial selection revealed a number of indel candidates; however, subsequent analysis showed that all of them are unrelated to RNA editing. Overall, our study suggests that the previously proposed insertional RNA editing events are unlikely to exist in the human transcriptome, and that obtaining high-confidence RNA-DNA difference results from high-throughput sequencing data requires a robust computational filtering pipeline.
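How much the two-read requirement limits detection of low-frequency editing can be estimated with a simple binomial model, assuming each covering read independently carries the edit with probability p. As an illustrative value, the sketch uses 11/301 (about 3.7%), the fraction of H1.0 ESTs reported to support the edited form, as discussed below; the function names are hypothetical. Under this model, a site covered by a single read can never yield two supporting reads, and a 95% chance of observing at least two requires well over a hundred covering reads. An editing level of about 3.7% would also fall below the 5% indel-read fraction required by the coverage filter in Methods.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance that >= k of n reads show the edit."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def coverage_needed(p, k=2, target=0.95, max_n=100_000):
    """Smallest read coverage n with P(X >= k) >= target, or None if above max_n."""
    for n in range(k, max_n + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    return None


# Illustrative editing level: the H1.0 "edited" EST fraction (11 of 301).
editing_level = 11 / 301
```

The model ignores mapping and base-quality losses, so real detection requires even higher coverage than it suggests.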
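The splice-junction pattern described here can be made concrete: when a read crosses an unannotated junction, its short overhang is forced onto contiguous genomic sequence, and the aligner may place a gap plus mismatches in that overhang. The function below is a hypothetical heuristic, not the authors' filter, that flags an indel lying within `end_window` bases of a read end when mismatches also occur between the indel and that end; both thresholds are illustrative assumptions, and offsets are 0-based positions in the read.

```python
def looks_like_splice_artifact(indel_offset, mismatch_offsets, read_len,
                               end_window=10, min_mismatches=1):
    """Flag an indel close to a read end that is accompanied by mismatches
    between the indel and that same end (offsets are 0-based in the read)."""
    if indel_offset < end_window:
        tail = [m for m in mismatch_offsets if m < indel_offset]
    elif indel_offset >= read_len - end_window:
        tail = [m for m in mismatch_offsets if m > indel_offset]
    else:
        return False  # indel lies in the read interior
    return len(tail) >= min_mismatches
```

A flagged candidate would still need to be checked against annotated or newly discovered splice junctions before being discarded.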
Reference genome and RNA-seq reads (Data sources)
First round RNA-seq data used in this study
SRA accession number
Run used in this study
SRR002055, SRR002063, SRR005091, SRR005096
SRR002052, SRR002054, SRR002060
Second round RNA-seq data used in this study
SRR306998, SRR306999, SRR307000, SRR307001, SRR307002, SRR307003, SRR307004
Mapping RNA-seq reads to the corresponding reference genome
Index sets in BFAST used in this study
Most of the parameters in the alignment process were set to their default values. A single index lookup is ignored if it returns more than K=8 candidate alignment locations (CALs); the maximum number of CALs for a read was M=1280. Local alignments were performed for each CAL using default settings, and nucleotide substitutions, insertions and deletions were identified in the gapped alignment. Alignments were prioritized by alignment score, and only the highest-scoring alignment for each mapped read was output. The mapping output was set to SAM format.
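The two CAL limits can be illustrated with a short sketch. This is not BFAST code: K=8 and M=1280 are the values given above, while the merging of duplicate locations across lookups and the assumption that a read exceeding M CALs is left unaligned are labelled assumptions; a CAL is represented here by any hashable location such as a hypothetical (contig, position, strand) tuple.

```python
def collect_cals(lookup_hits, max_per_lookup=8, max_per_read=1280):
    """Merge candidate alignment locations (CALs) from several index lookups.

    lookup_hits: one list of CALs per index lookup; a CAL is any hashable
    location, e.g. (contig, position, strand).
    Returns the sorted unique CALs, or None if the read is skipped.
    """
    cals = set()
    for hits in lookup_hits:
        if len(hits) > max_per_lookup:
            continue  # K rule: a lookup with too many hits is ignored
        cals.update(hits)
    if len(cals) > max_per_read:
        return None  # assumed M rule: too many CALs, the read is not aligned
    return sorted(cals)
```

Ignoring over-populated lookups keeps highly repetitive seeds from flooding the local-alignment step, which is where most of the mapping time would otherwise go.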
Post-processing of mapping output and variant calling
To identify RNA editing sites, the output RNA-seq alignment files in SAM format were processed for variant calling with the publicly available, open-source SAMtools software package (http://samtools.sourceforge.net/), version 0.1.17.
Using SAMtools, the output SAM files were first converted to their binary versions (BAM files), which were then sorted and indexed for rapid lookup. The sorted BAM files were further processed in the variant calling step: the “mpileup” function in SAMtools took the indexed reference sequences and the position-sorted BAM alignment files and generated files with read information at sites where mismatches and indels relative to the reference sequence were detected. Then only information on indel differences was kept, while reads that contained only mismatches were discarded. The output file of this step served as the starting dataset for the indel RNA editing analysis.
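The text does not say exactly how indel information was extracted from the mpileup output. Assuming the plain-text pileup format, in which the fifth column encodes each read's base and indels appear as `+<len><seq>` or `-<len><seq>`, the sketch below counts indel alleles per site and keeps only sites with at least one indel. The function names are hypothetical.

```python
import re
from collections import Counter

INDEL_TOKEN = re.compile(r"([+-])(\d+)")


def count_indels(read_bases):
    """Count indel alleles in the read-bases column of a text pileup line.

    Insertions appear as +<len><seq> and deletions as -<len><seq>; upper- and
    lower-case letters (forward and reverse strand) are merged.
    """
    counts = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2  # read start: skip '^' and its mapping-quality character
        elif ch in "+-":
            m = INDEL_TOKEN.match(read_bases, i)
            size = int(m.group(2))
            seq = read_bases[m.end():m.end() + size].upper()
            counts[ch + seq] += 1
            i = m.end() + size
        else:
            i += 1  # match, mismatch, '*', '$' and other single characters
    return counts


def indel_sites(pileup_lines):
    """Yield (chrom, pos, depth, indel_counts) for pileup lines with any indel."""
    for line in pileup_lines:
        chrom, pos, _ref, depth, bases = line.rstrip("\n").split("\t")[:5]
        counts = count_indels(bases)
        if counts:
            yield chrom, int(pos), int(depth), counts
```

Skipping the character after `^` matters because a mapping quality can be encoded as `+` or `-`, which would otherwise be misread as the start of an indel.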
Initial filtering of indel variants
Base quality filter: remove bases at the indel site with a sequencing quality score below 20.
Mapping quality filter: remove reads with a mapping quality score below 20; discard a read if the indel position is within 2 bp of its 5' end or 3' end; discard an indel-containing read if it has more than 3 mismatches.
Coverage depth filter: remove candidates with fewer than 2 non-duplicated indel-containing reads; remove candidates with fewer than 5 reads in total; remove candidates in which indel-containing reads make up less than 5% of all covering reads.
Variant quality: remove candidates with QUAL Phred-score of variant calling below 0.01.
Indel type and size filter: remove variant sites that display more than one non-reference allele, as well as variant sites that contain any uncertain bases (“N”); keep only candidates that differ from the genomic DNA by a single nucleotide (i.e., an indel size of one); remove variant sites that display homopolymer runs of more than 5 identical nucleotides.
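The initial filters above can be collected into one predicate. This is a minimal sketch, assuming hypothetical record types: `ReadEvidence` summarises one indel-containing read, `Candidate` summarises a site, alleles are written as a sign plus inserted/deleted bases (e.g. "+A"), and `context` is a reference sequence window used for the homopolymer check. Boundary handling (for example, keeping reads whose indel is more than 2 bases from either end) is an assumption where the text does not specify it; the numeric thresholds are those listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReadEvidence:
    base_quality: int     # Phred quality of the base(s) at the indel site
    mapping_quality: int  # MAPQ of the read
    dist_to_end: int      # bases between the indel and the nearer read end
    mismatches: int       # mismatches elsewhere in the read
    duplicate: bool       # flagged as a PCR/optical duplicate


@dataclass
class Candidate:
    alt_alleles: Tuple[str, ...]     # non-reference alleles, e.g. ("+A",)
    indel_reads: List[ReadEvidence]  # reads carrying the indel
    total_reads: int                 # all reads covering the site
    qual: float                      # QUAL value from variant calling
    context: str                     # reference sequence around the site


def read_passes(r):
    return (r.base_quality >= 20 and r.mapping_quality >= 20
            and r.dist_to_end > 2 and r.mismatches <= 3 and not r.duplicate)


def longest_homopolymer(seq):
    best, run, prev = 0, 0, None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best, prev = max(best, run), base
    return best


def passes_initial_filters(c):
    if len(c.alt_alleles) != 1:
        return False                       # more than one non-reference allele
    allele = c.alt_alleles[0]
    if "N" in allele.upper() or "N" in c.context.upper():
        return False                       # uncertain bases
    if len(allele) != 2:
        return False                       # sign plus one base: indel size must be one
    support = sum(1 for r in c.indel_reads if read_passes(r))
    if support < 2 or c.total_reads < 5:
        return False                       # coverage depth filter
    if support / c.total_reads < 0.05:
        return False                       # indel-read fraction filter
    if c.qual < 0.01:
        return False                       # variant quality threshold as stated
    return longest_homopolymer(c.context) <= 5  # homopolymer filter
```

Counting support only from reads that pass the per-read checks means a site backed solely by low-MAPQ or duplicate reads fails the depth filter even if its raw read count is high.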
BLAST: Basic local alignment search tool
BFAST: Blat-like fast accurate search tool
NCBI: National center for biotechnology information
SRA: Sequence read archive
This material is based upon work supported by the National Science Foundation under Grant DMR-0706002. This work was also supported by The Ohio State University Comprehensive Cancer Center’s (OSUCCC) Pelotonia Fellowship Program (to CC). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect those of the Pelotonia Fellowship Program.
- Alberts B: Molecular biology of the cell. 2008, New York: Garland Science, 5
- Gott JM, Emeson RB: Functions and mechanisms of RNA editing. Annu Rev Genet. 2000, 34: 499-531. 10.1146/annurev.genet.34.1.499.
- Gott JM: RNA editing. 2007, San Diego, Calif: Academic Press/Elsevier
- Knoop V: When you can't trust the DNA: RNA editing changes transcript sequences. Cell Mol Life Sci. 2011, 68 (4): 567-586. 10.1007/s00018-010-0538-9.
- Sommer B, Kohler M, Sprengel R, Seeburg PH: RNA editing in brain controls a determinant of ion flow in glutamate-gated channels. Cell. 1991, 67 (1): 11-19. 10.1016/0092-8674(91)90568-J.
- Bass BL: RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem. 2002, 71: 817-846. 10.1146/annurev.biochem.71.110601.135501.
- Bass BL, Weintraub H: An unwinding activity that covalently modifies its double-stranded-Rna substrate. Cell. 1988, 55 (6): 1089-1098. 10.1016/0092-8674(88)90253-X.
- Nishikura K: Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem. 2010, 79: 321-349. 10.1146/annurev-biochem-060208-105251.
- Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, et al: Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol. 2004, 22 (8): 1001-1005. 10.1038/nbt996.
- Athanasiadis A, Rich A, Maas S: Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2004, 2 (12): e391. 10.1371/journal.pbio.0020391.
- Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A: Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res. 2004, 14 (9): 1719-1725. 10.1101/gr.2855504.
- Blow M, Futreal PA, Wooster R, Stratton MR: A survey of RNA editing in human brain. Genome Res. 2004, 14 (12): 2379-2387. 10.1101/gr.2951204.
- Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM: Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science. 2009, 324 (5931): 1210-1213. 10.1126/science.1170995.
- Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X: Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 2012, 22 (1): 142-150. 10.1101/gr.124107.111.
- Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG: Widespread RNA and DNA Sequence Differences in the Human Transcriptome. Science. 2011, 333 (6038): 53-58. 10.1126/science.1207018.
- Peng Z, Cheng Y, Tan BC, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X, et al: Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol. 2012, 30 (3): 253-260. 10.1038/nbt.2122.
- Ramaswami G, Lin W, Piskol R, Tan MH, Davis C, Li JB: Accurate identification of human Alu and non-Alu RNA editing sites. Nat Methods. 2012, 9 (6): 579-581. 10.1038/nmeth.1982.
- Schrider DR, Gout JF, Hahn MW: Very Few RNA and DNA Sequence Differences in the Human Transcriptome. PLoS One. 2011, 6: 10-
- Lin W, Piskol R, Tan MH, Li JB: Comment on "Widespread RNA and DNA sequence differences in the human transcriptome". Science. 2012, 335 (6074): 1302; author reply 1302.
- Pickrell JK, Gilad Y, Pritchard JK: Comment on "Widespread RNA and DNA sequence differences in the human transcriptome". Science. 2012, 335 (6074): 1302; author reply 1302.
- Kleinman CL, Majewski J: Comment on "Widespread RNA and DNA sequence differences in the human transcriptome". Science. 2012, 335 (6074): 1302; author reply 1302.
- Benne R, Van den Burg J, Brakenhoff JP, Sloof P, Van Boom JH, Tromp MC: Major transcript of the frameshifted coxII gene from trypanosome mitochondria contains four nucleotides that are not encoded in the DNA. Cell. 1986, 46 (6): 819-826. 10.1016/0092-8674(86)90063-2.
- Thomas SM, Lamb RA, Paterson RG: Two mRNAs that differ by two nontemplated nucleotides encode the amino coterminal proteins P and V of the paramyxovirus SV5. Cell. 1988, 54 (6): 891-902. 10.1016/S0092-8674(88)91285-8.
- Mahendran R, Spottswood MR, Miller DL: RNA editing by cytidine insertion in mitochondria of Physarum polycephalum. Nature. 1991, 349 (6308): 434-438. 10.1038/349434a0.
- Stuart K, Allen TE, Heidmann S, Seiwert SD: RNA editing in kinetoplastid protozoa. Microbiol Mol Biol Rev. 1997, 61 (1): 105-120.
- Benne R: RNA editing in trypanosomes. Eur J Biochem. 1994, 221 (1): 9-23. 10.1111/j.1432-1033.1994.tb18710.x.
- Miller D, Mahendran R, Spottswood M, Costandy H, Wang S, Ling ML, Yang N: Insertional editing in mitochondria of Physarum. Semin Cell Biol. 1993, 4 (4): 261-266. 10.1006/scel.1993.1031.
- Zougman A, Ziolkowski P, Mann M, Wisniewski JR: Evidence for insertional RNA editing in humans. Curr Biol. 2008, 18 (22): 1760-1765. 10.1016/j.cub.2008.09.059.
- A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. 10.1038/nature09534.
- Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011, 7: 522.
- Homer N, Merriman B, Nelson SF: BFAST: an alignment tool for large scale genome resequencing. PLoS One. 2009, 4 (11): e7767. 10.1371/journal.pone.0007767.
- Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012, 9 (4): 357-359. 10.1038/nmeth.1923.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Li MY, Wang IX, Cheung VG: Response to Comments on "Widespread RNA and DNA Sequence Differences in the Human Transcriptome". Science. 2012, 335: 6074-
- Marzluff WF, Wagner EJ, Duronio RJ: Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat Rev Genet. 2008, 9 (11): 843-854. 10.1038/nrg2438.
- Marzluff WF: Metazoan replication-dependent histone mRNAs: a distinct set of RNA polymerase II transcripts. Curr Opin Cell Biol. 2005, 17 (3): 274-280. 10.1016/j.ceb.2005.04.010.
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447 (7146): 799-816. 10.1038/nature05874.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.