Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome
© Stevenson et al.; licensee BioMed Central Ltd. 2013
Received: 14 February 2013
Accepted: 5 August 2013
Published: 7 August 2013
RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE.
We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles.
After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes.
Keywords: Next-generation sequencing; Mapping bias; Drosophila melanogaster; Drosophila simulans; DGRP; Allelic imbalance; Genomics; Gene expression; Illumina
During the last five years, massively parallel sequencing of cDNA libraries synthesized from RNA samples (known as “RNA-seq”) has largely replaced the use of microarrays for comparative studies of gene expression (e.g. [1–3]). Advantages of RNA-seq over microarrays include a greater dynamic range and the ability to survey expression in new strains and species without the set-up costs of microarrays and without complications from hybridization differences among genotypes [4, 5]. In addition, because RNA-seq provides full sequence information for the transcriptome, it is better suited for discovering novel transcripts and splice isoforms and for quantifying allelic abundance in heterozygous and mixed genotype samples than microarrays. Measures of allele-specific expression (ASE) are particularly important for studying the regulation of gene expression because they can be used to distinguish cis- and trans-regulatory changes [6, 7] and to detect genomic imprinting [8, 9].
To quantify transcript abundance using RNA-seq, each short sequence read (hereafter simply called a “read”) is compared to an annotated reference genome. Assignment of a read to a specific gene is made by finding the region of the genome with the highest sequence similarity, and the number of reads aligning to a gene is used as a proxy for its relative expression level. Mapping reads to specific genes is relatively straightforward with the bioinformatics tools available today [10–13], but using these tools to distinguish between reads derived from alternative alleles of the same gene remains challenging. This challenge was most clearly demonstrated by Degner et al., who simulated reads from a heterozygous human genotype and assigned them to specific alleles after mapping to a reference human genome. Reads perfectly matching the reference genome were assigned to the reference allele, whereas reads containing mismatches to the reference genome were assigned to the alternative allele. Despite simulating an equal number of reads from each allele, a bias was observed causing reads to be assigned more often to the reference allele than the alternative allele. Controlling for sites known to be polymorphic in humans prior to aligning the simulated reads produced symmetrical measures of relative ASE, showing that the differentiating sites themselves caused this bias.
Recently, two alternative strategies for aligning reads have been shown to eliminate the systematic bias in measures of relative ASE favoring the reference allele. In the first, RNA-seq reads are aligned separately to maternal and paternal genomes. These allele-specific genomes can be generated either by sequencing inbred lines with the maternal and paternal genotypes [8, 15–17] or by inferring the maternal and paternal haplotypes using phased genotype information such as that available for humans from the 1000 Genomes Project Consortium [18, 19]. However, researchers interested in measuring relative ASE in organisms for which parent-specific genomes cannot be readily obtained will struggle to use this approach. The second strategy is to consider all possible phasings of variants that can occur in the same sequence read and either supplement the reference genome with these haplotypes or use this information during alignment with a polymorphism-aware aligner, such as GSNAP [21, 22]. This is a viable strategy for both model and non-model species, but will likely be most effective for intraspecific studies of species like humans with relatively low levels of polymorphism because the number of possible haplotypes increases exponentially with the number of polymorphic sites.
To better understand the source(s) of biased measures of relative ASE, we identified properties of sites showing inaccurate measures of relative ASE using simulated Drosophila sequencing data with known values of relative allelic abundance. Simulated datasets contained either ~10-fold or ~100-fold more differentiating sites than the human genotypes used to validate other methods for measuring relative ASE [14, 18, 20]. We also examined the impact of these factors on measures of relative ASE derived from real sequencing data. Reads from simulated and real sequencing data were aligned to a single reference genome, varying the number of mismatches allowed, as well as aligned to separate maternal and paternal genomes with no mismatches allowed. We found that limiting analysis of relative ASE to regions of the genome with no more differentiating sites than the number of mismatches allowed eliminated the systematic bias toward the reference allele and produced measures of ASE similar to those inferred from aligning reads separately to the maternal and paternal genomes. Excluding differentiating sites contained within reads that cannot be aligned uniquely or that overlap an insertion or deletion (indel) further improved measures of relative allelic abundance.
Results and discussion
The systematic bias in measures of ASE correlates with the density of differentiating sites
As described above, Degner et al. found that allele-specific reads mapped preferentially to the reference allele when using a single reference genome to quantify ASE. The alignment parameters they used allowed two or fewer bases within each read to differ from the reference genome. Reads perfectly matching the reference genome were assigned to the reference allele, while reads with at least one difference from the reference genome were assigned to an alternative allele. We hypothesized that the inability to map reads with more differences from the reference genome than mismatches allowed underestimated the abundance of the alternative allele and caused measures of ASE to be biased toward the reference allele.
To decrease the impact of neighboring differentiating sites on allelic assignment, we allowed two or three mismatches when aligning our simulated reads to the reference genome. We found that increasing the number of mismatches improved measures of allelic abundance: 80.2% and 91.9% of differentiating sites were inferred to be equally abundant when two and three mismatches, respectively, were allowed. A bias toward the reference allele was still observed, but only for sites where the number of neighboring differentiating sites was greater than or equal to the number of mismatches allowed during the alignment step (Figure 2B,C). Increasing the number of mismatches allowed reduced the bias toward the reference allele, but increased the percentage of reads that failed to map uniquely: allowing one, two, and three mismatches, 2.2%, 2.5%, and 2.9% of all reads failed to map uniquely, respectively.
For comparison, we aligned the simulated reads independently to the reference and alternative genomes with the same parameters used when aligning reads to the single reference genome except that zero mismatches were allowed. This is analogous to aligning reads to the maternal and paternal genomes, which is a strategy that has previously been shown to produce unbiased measures of relative ASE [15, 18–20, 25]. We found that 99.0% of differentiating sites showed equal representation of the two alleles, with the rest showing no systematic bias toward either allele (Figure 2D). Only 1.9% of all reads were excluded because they failed to map uniquely to at least one genome.
Read length and the amount of sequence divergence can also affect allelic bias
Given the observed impact of neighboring differentiating sites on allelic assignments, we hypothesized that longer reads might produce less accurate measurements of allele-specific abundance because they should overlap more neighboring differentiating sites. To test this hypothesis, we repeated our simulation with 50-base reads, determining the maximum number of sites that differed between the two alleles among all possible 50-base reads overlapping each differentiating site. We found that 40.6%, 73.0%, and 88.9% of differentiating sites showed equal representation of the two alleles when aligned to a single reference genome with one, two, or three mismatches allowed, respectively (Figure 2E-G). However, this difference disappeared when the number of mismatches allowed for the 50-base reads was increased so that the proportion of mismatched bases per read was similar to that allowed for the 36-base reads (four of 50 bases, or 8.0%, versus three of 36 bases, or 8.3%): 91.9% and 92.1% of differentiating sites showed equal allelic abundance for 36- and 50-base reads when three and four mismatches, respectively, were allowed (Additional file 3). By contrast, 98.8% of differentiating sites showed equal representation when reads were aligned to the maternal and paternal genomes with zero mismatches allowed (Figure 2H).
Increased sequence divergence is also expected to affect measures of relative allelic abundance because it should increase the average number of neighboring differentiating sites within each read. To test this hypothesis, we simulated 36-base reads from two different Drosophila species (D. melanogaster and D. simulans) and analyzed them as described above, using the D. melanogaster exome as the single reference genome. Sequences from 60,040 orthologous exons with 1,130,435 differentiating sites were used for this simulation, which is an order of magnitude more differentiating sites than between the two strains of D. melanogaster analyzed. As predicted, we found that the bias toward the reference allele was higher for the interspecific comparison than for the intraspecific comparison when reads were aligned to a single reference genome (Figure 2, compare I-K with A-C). When aligning reads to both parental genomes, however, sequence divergence had a negligible impact: the intra- and interspecific datasets produced nearly identical results (Figure 2, compare L with D).
Allele-specific differences in mappability and insertions/deletions affect measurements of ASE
Differences between alleles in sequences that appear more than once in the genome can also cause reads to be excluded for one allele but not the other. Assuming the number of such differentiating sites is similar between alleles, differences in allele-specific mappability should not systematically favor one allele or the other, but will still cause errors in relative ASE. To examine the impact of mappability on measures of relative allelic abundance derived from our simulated data, we used software from the GEM library to calculate a mappability score for each differentiating site by averaging the mappability scores of all possible reads that included that site. In each case, mappability scores were calculated using the same number of mismatches allowed during read alignment. Differentiating sites with an average mappability score < 1 were considered to have imperfect mappability when using a single reference genome. When using parental genomes, we summed the average mappability scores for each allele, and mappability scores < 2 were considered to have imperfect mappability.
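The site-level mappability score described above can be sketched as follows. This is a minimal illustration, not the GEM implementation: it assumes the per-read score is the reciprocal of the number of genomic locations to which a read starting at a given position aligns, and the names site_mappability, has_imperfect_mappability, and hit_counts are hypothetical.

```python
def site_mappability(hit_counts, site_pos, read_length):
    """Average mappability of all reads overlapping one differentiating site.

    hit_counts[i] is assumed to hold the number of genomic locations (>= 1)
    to which a read starting at 0-based position i aligns.
    """
    # Reads overlapping the site start between site_pos - read_length + 1
    # and site_pos, clipped so that every read lies within the sequence.
    first = max(0, site_pos - read_length + 1)
    last = min(site_pos, len(hit_counts) - read_length)
    scores = [1.0 / hit_counts[start] for start in range(first, last + 1)]
    return sum(scores) / len(scores) if scores else None


def has_imperfect_mappability(score_ref, score_alt=None):
    """Apply the thresholds from the text: < 1 for one genome, < 2 for two."""
    if score_alt is None:
        return score_ref < 1
    return score_ref + score_alt < 2


# Toy example: the read starting at position 2 aligns to two locations.
toy_hits = [1, 1, 2, 1, 1, 1]
toy_score = site_mappability(toy_hits, site_pos=3, read_length=3)
```

In this toy case, the three reads overlapping position 3 have scores of 1, 0.5, and 1, so the site would be flagged as having imperfect mappability.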
Aligning real sequencing data to a single genome can produce reliable measures of relative ASE
Assessing the accuracy of relative ASE measurements derived from RNA-seq data is challenging because the true value of relative ASE is rarely known. Independent empirical methods for measuring relative ASE such as Pyrosequencing and qPCR can be used to validate RNA-seq data for individual genes, but they are not suitable for quantifying relative ASE on a genomic scale. Therefore, instead of using real RNA-seq data to evaluate factors affecting measures of relative ASE, we used sequence data that was collected in a comparable manner from genomic DNA extracted from F1 hybrids, in which all maternal and paternal alleles are expected to be present in equal amounts.
Specifically, we used 36-base reads from genomic DNA extracted from female F1 hybrids that were produced by crossing inbred strains of D. melanogaster and D. simulans. These strains had the same genotypes as the D. melanogaster and D. simulans sequences used for the interspecific simulation described above. Reads were aligned to the D. melanogaster exons allowing one, two, or three mismatches, as well as to both the D. melanogaster and D. simulans exons allowing zero mismatches. Because real sequencing data involves stochastic sampling, the proportion of the reference allele observed was not always expected to be 0.5. Therefore, after aligning reads, we excluded differentiating sites with fewer than 20 overlapping reads and used binomial exact tests with a false discovery rate threshold of 0.05 to test each differentiating site for a statistically significant difference in relative allelic abundance [15, 27].
Prior to excluding any sites, 70.4%, 88.9%, and 93.3%, respectively, of all differentiating sites showed equal allelic abundance when reads were aligned to a single genome with one, two, or three mismatches allowed. After aligning reads to both parental genomes, 96.9% showed evidence of equal allelic abundance. Excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed increased this percentage to 96.3%-96.6% when aligning to a single reference genome (Figure 5B). Further restricting the set of differentiating sites to those with perfect mappability increased these percentages ~0.1%, and subsequently excluding differentiating sites with indels nearby increased the percentage of genes with equal allelic abundance an additional ~0.1% (Figure 5B). After filtering out these problematic sites, measures of relative allelic abundance derived from aligning reads to a single reference genome were similar to those produced by aligning sequence reads separately to the maternal and paternal genomes (Figure 5C-E).
Excluding selected differentiating sites maintains ability to measure relative ASE for most exons
RNA-seq is a powerful tool for measuring ASE on a genomic scale; however, a systematic bias occurs when reads from a heterozygous individual are aligned to a single reference genome. We found that this systematic bias is predominantly caused by additional differentiating sites located near the focal differentiating site that interfere with read alignment. A similar bias toward the reference allele is caused by the presence of an indel near the focal differentiating site. Differences between alleles in mappability (i.e. the ability to align a read uniquely within the genome) also contribute to inaccuracy of ASE, but do not systematically favor one allele or the other across the genome.
Using both simulated and real sequencing data, we found that sites affected by the systematic bias toward the reference allele could be identified and excluded prior to estimating ASE based on the density of differentiating sites. The precise density at which neighboring differentiating sites became problematic depended on the number of mismatches allowed during the alignment of sequencing reads. After excluding these biased sites, as well as those affected by imperfect mappability and/or an indel(s) nearby, we found that RNA-seq data aligned to a single reference genome produced measures of relative ASE that were comparable to those resulting from separately aligning the same reads to allele-specific maternal and paternal genomes. Furthermore, we showed that excluding these problematic sites did not preclude measuring relative ASE for most exons, although the most rapidly evolving exons are expected to be preferentially eliminated. By identifying the specific factors causing erroneous measures of relative allele-specific expression reported in prior work and determining the relative impact of these factors on these measures, results from this study are expected to foster further improvements in methods for quantifying relative allele-specific expression.
Generating allele-specific short reads comparing D. melanogaster genotypes in silico
Simulating an allele-specific RNA-seq experiment requires variability to differentiate alleles and a set of clearly defined transcriptional units from which to generate allele-specific reads. Using data from the Drosophila Genetic Reference Panel (DGRP), we examined site-specific sequence information from a single highly-inbred line (“line_40”) isolated from an outbreeding population of Drosophila melanogaster. This specific line was chosen because it had the fewest sites with evidence of residual heterozygosity. Sequence information from this line was compared to the current build of the D. melanogaster genome (dm3), and sites that differed from this reference genome were retained as sites differentiating the dm3 and “line_40” alleles, referred to as the reference and alternative alleles, respectively.
Because RNA-seq experiments collect sequence information from the transcribed genome, we chose to generate reads from constitutive exons in D. melanogaster. These constitutive exons are defined as those present in all alternatively-spliced transcripts for a particular gene. We filtered out overlapping regions of exons located on opposite strands to avoid ambiguity. Starting from the 5’ end of each exon, we generated 36- and 50-base reads offset by a single base in the 3’ direction, for the reference and alternative alleles and in each strand orientation, creating a complete set of all possible allele-specific and strand-specific reads. This ensured that reads from each allele were present in equal abundance. Because the reference and alternative alleles differed only at these predefined differentiating sites, only reads overlapping these sites had the possibility to be informative for relative ASE.
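The exhaustive read generation described above might look like the following sketch. It is not the authors' script; the function names (reverse_complement, tile_reads, apply_variants) and the toy sequence are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_reads(exon_seq, read_length):
    """Generate every read of read_length bases, offset by one base,
    in both strand orientations."""
    reads = []
    for start in range(len(exon_seq) - read_length + 1):
        forward = exon_seq[start:start + read_length]
        reads.append(forward)
        reads.append(reverse_complement(forward))
    return reads


def apply_variants(ref_seq, variants):
    """Build the alternative allele by substituting bases at the given
    0-based positions (variants maps position -> alternative base)."""
    bases = list(ref_seq)
    for pos, alt_base in variants.items():
        bases[pos] = alt_base
    return "".join(bases)


# Toy exon with a single differentiating site at position 4.
reference_exon = "ACGTACGTAC"
alternative_exon = apply_variants(reference_exon, {4: "C"})
ref_reads = tile_reads(reference_exon, 4)
alt_reads = tile_reads(alternative_exon, 4)
```

Because both alleles are tiled identically, each allele contributes the same number of reads, which is the property the simulation relies on.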
Quantifying allelic abundance in simulated RNA-seq data
All alignments were performed using Bowtie v0.12.7, requiring that reads align uniquely to the genome (bowtie -f -m 1 -v [0,1,2,3] --best). Alignments were processed using SAMtools v0.1.18 (samtools view -S -b -T; samtools sort; samtools mpileup -f), which generates site-specific allele frequencies using overlapping reads (read pileup). ASE was quantified using custom Perl and R scripts (available upon request), and any deviation from equal allelic abundance was considered allelic imbalance.
Initially, we aligned the simulated reads to the D. melanogaster (dm3) reference genome. Since reads generated from the alternative allele overlapping a differentiating site will have at least a single base mismatch to the reference genome, we successively allowed one (-v 1), two (-v 2), or three (-v 3) mismatches, but still required unique alignment to the reference genome (-m 1). Although the -v parameter assesses mismatches for the length of the entire read, and has an upper limit of three, an alternative parameter -n allows additional mismatches outside of a specified region at the beginning of each read, called a seed. To allow a fourth mismatch for the 50-base reads, we specified a 36-base seed region with up to three mismatches and increased the maximum sum of mismatch quality scores across the entire read to 161, since base quality scores for FASTA reads are assumed to equal 40 (bowtie -f -n 3 -e 161 -l 36 -m 1 --best). After each alignment was performed, we considered only reads overlapping the previously defined differentiating sites. We then quantified relative allelic abundance by determining whether or not each overlapping read at these sites matched the reference or the alternative alleles. These summed counts represented our measures of relative allelic abundance at each differentiating site.
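The per-site counting step can be illustrated with a short sketch. It assumes reads have already been aligned and are represented as (start, sequence) pairs on the forward strand with 0-based coordinates; count_alleles and reference_proportion are hypothetical names, and the authors' own Perl and R scripts may differ.

```python
from collections import Counter


def count_alleles(aligned_reads, site_pos, ref_base, alt_base):
    """Count reads supporting each allele at one differentiating site.

    aligned_reads: iterable of (start, sequence) tuples, forward strand,
    0-based start coordinates (an assumed input format).
    """
    counts = Counter()
    for start, seq in aligned_reads:
        offset = site_pos - start
        if 0 <= offset < len(seq):  # the read overlaps the site
            base = seq[offset]
            if base == ref_base:
                counts["ref"] += 1
            elif base == alt_base:
                counts["alt"] += 1
            else:
                counts["other"] += 1
    return counts


def reference_proportion(counts):
    """Proportion of allele-informative reads matching the reference."""
    total = counts["ref"] + counts["alt"]
    return counts["ref"] / total if total else None


toy_reads = [(0, "ACGT"), (2, "GCAA"), (3, "TTTT"), (10, "AAAA")]
toy_counts = count_alleles(toy_reads, site_pos=3, ref_base="T", alt_base="C")
```

A reference proportion of 0.5 corresponds to equal allelic abundance; in the simulated data any deviation from 0.5 would indicate an alignment artifact.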
Next, we aligned the same allele-specific reads independently to the aforementioned reference genome and the edited copy of the reference genome representing the alternative allele (bowtie -f -m 1 -v 0 --best). As described above, this alternative genome was obtained by editing the bases at differentiating sites to match the fixed genotypes from the DGRP “line_40” sequencing data. No mismatches were allowed when aligning simulated reads to either allele-specific genome. This allowed us to determine, for any read, whether or not it aligned uniquely to one or the other allele-specific genome. We treated a read that aligned uniquely to only one of the allele-specific genomes as allele-specific, whereas a read that aligned equally well to both genomes was considered uninformative. To measure relative ASE at each differentiating site, we counted the number of reads overlapping differentiating sites that aligned uniquely to only one of the allele-specific genomes and summed these counts for each allele.
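The assignment rule for the two-genome strategy reduces to simple set operations, sketched below. The sketch assumes that the read identifiers aligning uniquely with zero mismatches to each genome have already been collected into sets; classify_reads and the example identifiers are hypothetical.

```python
def classify_reads(ref_hits, alt_hits):
    """Split read IDs by which allele-specific genome they aligned to.

    ref_hits / alt_hits: sets of read IDs that aligned uniquely, with zero
    mismatches, to the reference and alternative genomes, respectively.
    """
    return {
        "reference": ref_hits - alt_hits,      # allele-specific, reference
        "alternative": alt_hits - ref_hits,    # allele-specific, alternative
        "uninformative": ref_hits & alt_hits,  # aligns equally well to both
    }


toy_assignment = classify_reads(
    ref_hits={"read1", "read2", "read3"},
    alt_hits={"read3", "read4"},
)
```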
Measuring number of neighboring differentiating sites and mappability across genomes
After quantifying allelic abundance at each differentiating site, we calculated the maximum number of other sites showing differences between alleles contained within any of the possible k-base reads, where k = simulated read length (either 36 or 50 bases). For each genome, we used the GEM-mappability tool from the GEM library build 475 to measure genome mappability, or the ability of a read from a particular location to align uniquely to a genome. For the simulated and real data, we measured mappability for the appropriate read length (either 36 or 50 bases), allowing zero, one, two, or three mismatches, with default parameters (gem-mappability -l [36,50] -m [0,1,2,3]). Mappability for individual sites was calculated using the reciprocal frequency of the number of locations a read beginning at that site would align to in the genome. To calculate mappability scores for differentiating sites, we averaged mappability for all read positions that overlapped each differentiating site.
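The neighbor count defined above, and the filter based on it, can be sketched as follows. This is an illustrative version only: max_neighbors and keep_sites are hypothetical names, positions are 0-based coordinates within a single sequence, and exon boundaries are ignored for simplicity (a read window may extend past the ends of the sequence).

```python
from bisect import bisect_left, bisect_right


def max_neighbors(positions, read_length):
    """For each differentiating site, return the maximum number of OTHER
    differentiating sites contained in any read of read_length bases that
    overlaps the site."""
    positions = sorted(positions)
    result = {}
    for site in positions:
        best = 0
        # Every read overlapping the site starts in [site - k + 1, site].
        for start in range(site - read_length + 1, site + 1):
            end = start + read_length - 1
            in_window = bisect_right(positions, end) - bisect_left(positions, start)
            best = max(best, in_window - 1)  # exclude the focal site itself
        result[site] = best
    return result


def keep_sites(positions, read_length, mismatches_allowed):
    """Retain sites with fewer neighbors than the number of mismatches
    allowed, i.e. drop sites with at least that many neighbors."""
    neighbors = max_neighbors(positions, read_length)
    return [p for p in sorted(positions) if neighbors[p] < mismatches_allowed]


toy_sites = [10, 12, 40, 100]
toy_neighbors = max_neighbors(toy_sites, read_length=36)
```

In the toy example, sites 10, 12, and 40 can all fall within a single 36-base read, so each has two neighbors and would be excluded when two mismatches are allowed but retained when three are allowed.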
Quantifying relative ASE in an F1 hybrid between D. melanogaster and D. simulans
To assess the accuracy of allele-specific abundance inferred from real sequencing data, we used published 36-base Illumina reads from genomic DNA extracted from a pool of female F1 hybrids between laboratory strains of D. melanogaster and D. simulans (Berlin: BDSC 8522 and C167.4: BDSC 4736, respectively). We restricted our analysis to the first mate of this set of paired-end reads, combining reads from all three technical replicates. We used the custom set of 60,040 orthologous exon sequences (exomes) between D. melanogaster and D. simulans developed in Graze et al. for the reference and alternative genomes. We also used these sequences to simulate and analyze 36-base reads comparing D. melanogaster and D. simulans alleles in the same manner outlined above for the two D. melanogaster genotypes.
We first performed a pairwise alignment for each orthologous pair of exons using the Fast Statistical Alignment v1.15.7 software with default parameters (fsa --stockholm). We used custom Perl scripts to identify 1,130,435 sites that could differentiate these two alleles as well as to identify regions of the exome present in one allele but not the other (indels).
We then aligned the Illumina reads to the D. melanogaster exome, requiring unique alignment to a single location and allowing one, two, or three mismatches. We also aligned the same reads independently to the D. melanogaster- and D. simulans-specific exomes, masking indels identified by the pairwise alignments. After each of these alignments, we quantified ASE, measured the density of differentiating sites, and determined the mappability to each genome using the same strategies described above for the simulated data. We performed binomial exact tests for differentiating sites with 20 or more overlapping reads, controlling the false discovery rate at 0.05 to correct for multiple comparisons.
DGRP: Drosophila Genetic Reference Panel
qPCR: Quantitative polymerase chain reaction
We thank members of the Wittkopp laboratory, especially Richard Lusk, Brian Metzger, Fabien Duveau, and Bing Yang, for helpful discussions and comments on the manuscript. This work was supported by a grant from the National Science Foundation (MCB-1021398) to PJW, a postdoctoral fellowship from the National Institutes of Health (1F32-GM089009-01A1) to JDC, and a position on National Science Foundation Training Grant No. 0903629 for KRS.
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.
- Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11: 31-46. 10.1038/nrg2626.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
- Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, Albert FW, Zeller U, Khaitovich P, Grützner F, Bergmann S, Nielsen R, Pääbo S, Kaessmann H: The evolution of gene expression levels in mammalian organs. Nature. 2011, 478: 343-348. 10.1038/nature10532.
- Wittkopp PJ, Haerum BK, Clark AG: Evolutionary changes in cis and trans gene regulation. Nature. 2004, 430: 85-88. 10.1038/nature02698.
- Cowles CR, Hirschhorn JN, Altshuler D, Lander ES: Detection of regulatory variation in mouse genes. Nat Genet. 2002, 32: 432-437. 10.1038/ng992.
- Coolon JD, Stevenson KR, McManus CJ, Graveley BR, Wittkopp PJ: Genomic imprinting absent in Drosophila melanogaster adult females. Cell Rep. 2012, 2: 69-75. 10.1016/j.celrep.2012.06.013.
- DeVeale B, van der Kooy D, Babak T: Critical evaluation of imprinted gene expression by RNA-Seq: a new perspective. PLoS Genet. 2012, 8: e1002600. 10.1371/journal.pgen.1002600.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25. 10.1186/gb-2009-10-3-r25.
- Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010, 26: 589-595. 10.1093/bioinformatics/btp698.
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-1858. 10.1101/gr.078212.108.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y, Pritchard JK: Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009, 25: 3207-3212. 10.1093/bioinformatics/btp579.
- McManus CJ, Coolon JD, Duff MO, Eipper-Mains J, Graveley BR, Wittkopp PJ: Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 2010, 20: 816-825. 10.1101/gr.102491.109.
- Graze RM, Novelo LL, Amin V, Fear JM, Casella G, Nuzhdin SV, McIntyre LM: Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol. 2012, 29: 1521-1532. 10.1093/molbev/msr318.
- Shen Y, Garcia T, Pabuwal V, Boswell M, Pasquali A, Beldorth I, Warren W, Schartl M, Cresko WA, Walter RB: Alternative strategies for development of a reference transcriptome for quantification of allele specific expression in organisms having sparse genomic resources. Comp Biochem Physiol Part D Genomics Proteomics. 2013, 8: 11-16. 10.1016/j.cbd.2012.10.006.
- Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, Bhardwaj N, Rubin M, Snyder M, Gerstein M: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011, 7: 522.
- Rivas-Astroza M, Xie D, Cao X, Zhong S: Mapping personal functional data to personal genomes. Bioinformatics. 2011, 27: 3427-3429. 10.1093/bioinformatics/btr578.
- Satya RV, Zavaljevski N, Reifman J: A new strategy to reduce allelic bias in RNA-Seq readmapping. Nucleic Acids Res. 2012, 40: e127. 10.1093/nar/gks425.
- Wu TD, Nacu S: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010, 26: 873-881. 10.1093/bioinformatics/btq057.
- Skelly DA, Johansson M, Madeoy J, Wakefield J, Akey JM: A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Res. 2011, 21: 1728-1737. 10.1101/gr.119784.110.
- Ayroles JF, Carbone MA, Stone EA, Jordan KW, Lyman RF, Magwire MM, Rollmann SM, Duncan LH, Lawrence F, Anholt RRH, Mackay TFC: Systems genetics of complex traits in Drosophila melanogaster. Nat Genet. 2009, 41: 299-307. 10.1038/ng.332.
- Mackay TFC, Richards S, Stone EA, Barbadilla A, Ayroles JF, Zhu D, Casillas S, Han Y, Magwire MM, Cridland JM, Richardson MF, Anholt RRH, Barrón M, Bess C, Blankenburg KP, Carbone MA, Castellano D, Chaboub L, Duncan L, Harris Z, Javaid M, Jayaseelan JC, Jhangiani SN, Jordan KW, Lara F, Lawrence F, Lee SL, Librado P, Linheiro RS, Lyman RF: The Drosophila melanogaster Genetic Reference Panel. Nature. 2012, 482: 173-178. 10.1038/nature10811.
- Graze RM, McIntyre LM, Main BJ, Wayne ML, Nuzhdin SV: Regulatory divergence in Drosophila melanogaster and D. simulans, a genomewide analysis of allele-specific expression. Genetics. 2009, 183: 547-561. 10.1534/genetics.109.105957.
- Derrien T, Estellé J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P: Fast computation and applications of genome mappability. PLoS ONE. 2012, 7: e30377. 10.1371/journal.pone.0030377.
- Fontanillas P, Landry CR, Wittkopp PJ, Russ C, Gruber JD, Nusbaum C, Hartl DL: Key considerations for measuring allelic expression on a genomic scale using high-throughput sequencing. Mol Ecol. 2010, 19 (1): 212-227.
- Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L: Fast statistical alignment. PLoS Comput Biol. 2009, 5: e1000392. 10.1371/journal.pcbi.1000392.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.