Analysis of discordant Affymetrix probesets casts serious doubt on idea of microarray data reutilization
© Marakhonov et al.; licensee BioMed Central Ltd. 2014
Published: 19 December 2014
Affymetrix microarray technology allows one to investigate the expression of thousands of genes simultaneously under a variety of conditions. In the popular U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. Discordant expression observed for two different probesets that match the same gene is a widespread phenomenon that is usually underestimated, ignored or disregarded.
Here we evaluate the prevalence of discordant expression in data collected using the Affymetrix HG-U133A microarray platform. In U133A, about 30% of genes annotated by two different probesets demonstrate a substantial correlation between independently measured expression values. To our surprise, sorting the probesets according to the nature of the discrepancy in their expression levels allowed us to classify the respective genes according to their fundamental functional properties, including an observed enrichment in tissue-specific transcripts and alternatively spliced variants. On the other hand, an absence of discrepancies in probesets that simultaneously match several different genes allowed us to pinpoint non-expressed pseudogenes and gene groups with highly correlated expression patterns. Nevertheless, in many cases, the nature of the discordant expression of two probesets that match the same transcript remains unexplained. It is possible that these probesets report differently regulated sets of transcripts or, in the best-case scenario, two different sets of transcripts that represent the same gene.
The majority of absolute gene expression values collected using Affymetrix microarrays may not be suitable for typical interpretative downstream analysis.
Currently, transcriptomic landscapes in various organisms and their tissues are studied either by RNAseq or by microarrays. Even now, the latter remain the most popular and cost-efficient approach to transcript profiling, one that many laboratories can afford. In particular, the most widely used microarray platform, the Affymetrix HG-U133A expression microarray, alone has provided about 1.5 million GEO dataset depositions available for re-analysis.
For each gene, Affymetrix arrays employ a collection of 11 to 20 very short probes; the signals from each of these probes are aggregated into a probeset-level signal. HG-U133A v2 chips include 22 283 probesets, where each probe is a 25-nucleotide string that matches a particular mRNA or a group of alternatively spliced RNAs. In order to minimize non-specific noise, each probeset contains not only perfectly matched probes but also mismatched ones, whose hybridization intensities are taken into account when the final gene expression values are generated. In the HG-U133A microarray platform, the expression of 37% of genes is measured by more than one probeset. Discrepancies in the expression of individual probesets that belong to the same gene are a well-known and widespread phenomenon. Commonly, independent probeset values are averaged, and the discrepancies are underestimated, ignored or disregarded [2, 3]. While a discrepancy in expression values collected in independent sets of microarray experiments may be explained by differences in the technicalities of background subtraction and normalization, mismatching values collected using two simultaneously hybridized probesets assumed to target the same transcript are more difficult to dismiss. That is why the correct annotation of probesets remains an unsolved problem.
Several attempts have been made to re-annotate microarrays and/or to improve the data analysis workflow [5–7]. The main idea is to update the annotation of probesets by remapping them to unique target sequences, cleaning out the repeats, evaluating strand orientation, or analyzing the patterns of cross- and bulk-hybridization of probesets according to the position of each probe on the target sequence, its GC-content, and the presence of common sequence variants [9–13]. Typically, adding the clean-out or other processing steps results in filtering out unreliable probes and/or entire probesets, thus limiting the number of genes that may be properly analyzed in a given experiment. On the plus side, the clean-up procedures may significantly enhance the reliability of interpretation [14, 15]. Sadly, these innovative steps are commonly ignored by typical microarray data processing algorithms, which often contribute to either incorrect or suboptimal interpretation of expression data.
The present study aims to investigate the nature of the expression value discrepancies observed for two probesets annotated to the same gene.
Discrepancies in expression values obtained using different probesets mapped to the same gene
These findings prompted us to determine how frequent expression discrepancies of this kind are and to identify all reliable probeset pairs that allow extraction of similar or identical profiles. We surmised that this approach might allow us to extract a subset of expression values that correspond to true expression levels for at least a few human genes.
Reannotation of probesets and formation of gene groups
The observations described above indicate that expression data obtained using single probesets may not be reliable. At the same time, the comparison of expression profiles of genes annotated by two or more probesets may serve as an internal control for validating the overall reliability of results in a given microarray-based experiment.
Correlation analysis of expression data for genes covered by two probesets
In order to identify Affymetrix probesets that are consistent in measuring the expression of their target gene(s), a correlation analysis was performed. Since the distribution of probeset-specific expression levels across human tissues represented in the GSE1133 dataset was not normal (Shapiro-Wilk test, p < 0.05), non-parametric Spearman's rank correlation coefficients ρ were calculated. To take into account that GSE1133 includes profiles for a considerable number of tissues (N = 42), we also computed correlations using the parametric Pearson's product-moment coefficient r, which operates on absolute expression values and is more sensitive to the presence of outliers than the Spearman procedure. The value of calculating both Spearman's and Pearson's correlation coefficients can be illustrated using the RIPK2 gene as an example (see Figure 1). For expression values extracted using the two probesets that match this gene, 209544_at and 209545_s_at, Pearson's correlation is 0.148, while Spearman's rank correlation is 0.144. Both statistics show that the correlation between expression values extracted using the two different probesets is rather small. However, not every gene behaved that consistently when the results of the parametric and non-parametric correlation analyses were compared. In fact, this approach allowed us to divide human genes and gene groups into categories with specific biological properties.
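The difference between the two coefficients can be illustrated with a minimal sketch. It is not the pipeline used in this study (the correlations reported here were computed in Microsoft Excel); the function names and the expression values below are hypothetical and chosen only to show how a single high-expressing tissue inflates Pearson's r while leaving Spearman's ρ moderate.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values of two probesets across five tissues
# (illustrative only, not taken from GSE1133); the last tissue is an outlier.
probeset_a = [10.0, 12.0, 11.0, 13.0, 500.0]
probeset_b = [20.0, 21.0, 23.0, 22.0, 900.0]

r = pearson(probeset_a, probeset_b)
rho = spearman(probeset_a, probeset_b)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this toy case the outlier tissue drives r close to 1, whereas ρ reflects only the partially concordant ordering of the tissues, which is why computing both coefficients is informative.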
Detailed analysis of genes and gene groups that fell into extreme behavior categories
The typical way of explaining expression level discrepancies is to blame them on technical problems or uneven hybridization conditions across the chip. In this study, we searched for possible biological correlates that may explain the incongruency of expression profiles detected by two probesets that match the same gene.
Among the four extreme categories of genes depicted in Figure 4, the first set (N = 972, or about 35.2% of genes covered by two different probesets) is the easiest to understand and accept. In this category, the correlation between expression values obtained using two different probesets exceeds 0.7 for both Spearman's and Pearson's procedures; we call these genes "reliably profiled". One may extrapolate these data to the entire microarray chip, including the majority of its genes that are covered by only one probeset (N = 8028), and conclude that the overall reliability of the Affymetrix HG-U133A platform is about 35%. In other words, we could assert that the expression profiles of approximately 35% of the genes represented on this chip are measured in a reliable way.
In the three other extreme categories (N = 287, or 10.4% of genes covered by two different probesets), either one or both correlation coefficients were low (< 0.3). Below we attempt to find a biological, rather than technical, explanation for this observation.
There is no easy explanation for the discordant expression values observed for genes or gene groups with low correlation values according to both Spearman's and Pearson's coefficients (N = 195, or 7.1% of genes covered by two different probesets). A very good example of this kind is the RIPK2 gene (r = 0.148, ρ = 0.144) shown in Figure 1. The remarkable discrepancy in the expression patterns derived from two probesets that match the same gene may be due to technical or methodological problems, or to some unknown biological reasons.
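The thresholds and proportions quoted above can be restated as a short sketch. The cut-offs 0.7 and 0.3 and the counts come from the text; the function name, the category labels and the exact boundaries of the two mixed categories (one coefficient low, the other not low) are assumptions made for illustration, since the precise definitions are given only in Figure 4.

```python
def classify_pair(r, rho, high=0.7, low=0.3):
    """Assign a two-probeset gene to a correlation category.

    r   -- Pearson's coefficient between the two probeset profiles
    rho -- Spearman's coefficient between the same profiles
    The labels and the handling of mixed cases are illustrative assumptions.
    """
    if r > high and rho > high:
        return "reliably profiled"
    if r < low and rho < low:
        return "both low"
    if r < low:
        return "low Pearson only"
    if rho < low:
        return "low Spearman only"
    return "intermediate"


def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


TOTAL_TWO_PROBESET_GENES = 2761  # genes covered by two different probesets

print(classify_pair(0.148, 0.144))             # RIPK2 values from the text
print(percent(972, TOTAL_TWO_PROBESET_GENES))  # reliably profiled genes
print(percent(287, TOTAL_TWO_PROBESET_GENES))  # three other extreme categories
print(percent(195, TOTAL_TWO_PROBESET_GENES))  # both coefficients low
```

With the counts reported here, the three percentages evaluate to 35.2, 10.4 and 7.1, matching the figures given in the text.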
An influence of the quality of probes alignment on the concordance of expression patterns
To find out whether misalignment of the individual probes that comprise particular probesets could cause the discrepancy in the expression patterns, we downloaded and explored the genomic alignments of individual probes mapped to the human genome in the PLANdbAffy database. In their paper, Nurtdinov et al. classified all Affymetrix probes into four classes and assigned a color to each class. In this classification, the "green" probes are the most reliable; these probes satisfy the following three conditions: (i) the probe is aligned to the target gene with no mismatches, (ii) there are no matches of the probe to other genes and (iii) there are no perfect alignments of the probe to any non-coding region. The "yellow" probes meet criteria (i) and (ii) but not (iii). The "red" probes are a perfect match to the target gene and match at least one other gene with no more than one mismatch. Finally, the "black" probes are aligned to the target gene with at least one mismatch. We adopted the color-based classification described above to augment our own reannotated file describing the probesets of the Affymetrix HG-U133A microarray with the percentages of individual probes that belong to each PLANdbAffy color class (see Additional file 2 Supplementary Table S1). Notably, both probesets for the RIPK2 gene (Figure 1) consist almost exclusively of "green" probes and map nearly perfectly to their cognate gene. The exception is probe 209545_s_at_5, which matches the RIPK2 gene at 20 of its 25 nucleotides and is therefore a "black" probe.
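The color rules above can be written as a small decision function, followed by the per-probeset percentages used to augment the annotation. This is a sketch under stated assumptions, not the PLANdbAffy implementation: the `ProbeAlignment` record and its field names are hypothetical, and the order in which the rules are checked (black, then red, then yellow, then green) is one reading of the definitions quoted in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProbeAlignment:
    """Hypothetical summary of how one 25-mer probe aligns to the genome."""
    target_mismatches: int       # mismatches against the target gene
    hits_other_gene: bool        # matches another gene with <= 1 mismatch
    perfect_noncoding_hit: bool  # perfect alignment to a non-coding region


def color_class(probe):
    """Return the color class of a probe according to the rules in the text."""
    if probe.target_mismatches > 0:
        return "black"   # aligned to the target with at least one mismatch
    if probe.hits_other_gene:
        return "red"     # perfect on target, but also matches another gene
    if probe.perfect_noncoding_hit:
        return "yellow"  # criteria (i) and (ii) hold, (iii) fails
    return "green"       # criteria (i), (ii) and (iii) all hold


def color_percentages(probes):
    """Percentage of probes of each color within one probeset."""
    counts = Counter(color_class(p) for p in probes)
    return {color: 100 * n / len(probes) for color, n in counts.items()}


# Probe 209545_s_at_5 matches RIPK2 at 20 of 25 nucleotides, i.e. 5 mismatches.
# The remaining ten probes are a hypothetical all-green background.
probeset = [ProbeAlignment(0, False, False) for _ in range(10)]
probeset.append(ProbeAlignment(5, False, False))
print(color_percentages(probeset))
```

For the illustrative probeset above, ten of eleven probes are "green" and one is "black", which mirrors the description of a probeset that is almost exclusively composed of reliable probes.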
In their formidable effort, Nurtdinov et al. compiled a database that hosts the genomic alignments for all individual Affymetrix probes, while making no attempt to find out whether the quality of the probes affects the accuracy of expression profiling. It seems logical to assume that the quality of the probes defines the quality of the microarray output. However, in the present work, we demonstrate that this is not the case.
Another reason for the higher correlation between gene expression profiles derived from the hybridization patterns of "red" probesets could be the alignment of these probes to a whole family of conserved paralogs. A good example of such a situation is the highly conserved family of NOMO1///NOMO2///NOMO3 genes, with an average identity between transcripts of 95-100%. NOMO genes originate from a genomic duplication at least 78 Mb in size that took place in 16p12.3-p13.1. Both NOMO-specific probesets, 217225_x_at and 221853_s_at, are annotated to all NOMO genes and, therefore, fall into the "red" category. The high degree of correlation between gene expression profiles derived from the hybridization patterns of those probesets (r = 0.958, ρ = 0.962) may be explained by a similarly high correlation of the expression profiles of all NOMO genes, which share regulatory features that were duplicated together with the genes themselves.
Is the derivation of real-life tissue expression patterns from microarray data possible at all?
To tackle this Holy Grail problem, for each gene covered by two different probesets (N = 2761) we compared the means (which correspond to the average expression level of a particular gene across different tissues) and the coefficients of variation (which reflect the variance of expression of a particular gene across different tissues and could indicate whether a certain gene is ubiquitously expressed or not) between both probesets. This type of analysis was previously performed for genes covered by three or more probesets by Jaksik et al., who suggested a similarly designed search for outliers among probesets annotated to the same gene, with subsequent elimination of such a probeset from further analysis. However, this approach is applicable with confidence only to genes annotated with more than two probesets. Here we took the approach of Jaksik et al. a bit further by suggesting that we may trust two independent gene-specific probesets to report correct expression profiles if these probesets show comparable means and coefficients of variation. In other words, even if the expression profiles derived from the hybridization patterns of these probesets demonstrate a lower than expected correlation with each other, we may hope for salvation by averaging the values produced by each probeset and treating the cumulative value as the true expression value for the given gene. Thus, we would justify the procedure that is typically applied in garden-variety microarray analysis pipelines, especially in gene-based approaches such as coexpression network analysis and gene set enrichment analysis [32–35].
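The comparison of means and coefficients of variation can be sketched as follows. The text does not specify a numerical criterion for "comparable", so the relative tolerance of 20% used here, like the function names and the expression values, is a hypothetical choice made purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def relative_difference(a, b):
    """Absolute difference scaled by the larger of the two magnitudes."""
    return abs(a - b) / max(abs(a), abs(b))


def comparable(profile_a, profile_b, tolerance=0.2):
    """True if both the means and the CVs differ by at most `tolerance`.

    The tolerance is an illustrative assumption, not a value from the study.
    """
    mean_ok = relative_difference(mean(profile_a), mean(profile_b)) <= tolerance
    cv_ok = relative_difference(
        coefficient_of_variation(profile_a),
        coefficient_of_variation(profile_b),
    ) <= tolerance
    return mean_ok and cv_ok


# Hypothetical profiles of same-gene probesets across three tissues.
probeset_a = [100.0, 200.0, 300.0]
probeset_b = [110.0, 190.0, 310.0]    # similar level and spread
probeset_c = [1000.0, 2100.0, 2900.0]  # similar shape, tenfold higher level

print(comparable(probeset_a, probeset_b))
print(comparable(probeset_a, probeset_c))
```

Note that `probeset_c` follows nearly the same tissue pattern as `probeset_a` and would correlate well with it, yet its absolute level is about ten times higher; averaging such a pair would not yield a meaningful absolute expression value.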
Even the advent of RNAseq cannot change the current reality: expression microarrays remain an important tool that allows one to discern gene expression profiles at a genome scale. Moreover, the adoption of the Minimum Information About a Microarray Experiment (MIAME) standard and the establishment of public repositories for microarray data, especially Gene Expression Omnibus (GEO) and ArrayExpress, set the stage for gene expression data sharing and reuse (see  for review). Intuitively, due to the stoichiometric nature of sequence hybridization, the signal should be directly proportional to the concentrations of the cognate RNA molecules in the tested samples. Moreover, the design of expression microarrays, with several individual probes corresponding to each single gene and comprising the "probeset", should guarantee intrinsic robustness of detection and consistency of the expression profiles produced. Nevertheless, in the vast majority of experiments, different probesets corresponding to the same gene differ in the levels of the signal they generate, thus demonstrating discordant expression profiles. This raises the question of whether expression microarray data are reliable at all. This is not an idle question, as microarrays are used not only in research labs, but also in the discovery of diagnostic biomarkers and in the screening for novel medical drugs.
The initial goal of our study was to develop criteria for selecting the most reliable probesets. Instead, in the course of this analysis, the focus of the study shifted to investigating the overall reliability of the data that can be obtained from expression microarrays. In this work, we evaluated the outputs of the widely used Affymetrix HG-U133A platform. To compare the signals produced by the various probesets that comprise this array, we selected a dataset that profiled expression levels in 84 different human tissues. First, in order to identify possible biological reasons for the discrepancy of expression profiles obtained using two probesets that map to the same gene, we performed a correlation analysis of these expression profiles and found that these correlations aid in classifying the genes represented by two probesets into distinct functional subgroups, including one enriched in genes expressed only in specific tissue(s) and another that maps onto alternative transcripts produced by the same locus. Moreover, we found that the analysis of expression patterns revealed by pairs of same-gene probesets aligned both to the gene and its pseudogene(s) may help to differentiate between expressed and non-expressed pseudogenes.
In a further attempt to confirm the commonly discussed notion that the "quality" of the probes that comprise probesets directly affects the expression data output, we also analyzed the locations of individual probes and their quality classes. Contrary to our expectations, we could not confirm a relationship between probeset "quality" and the concordance of expression profiles obtained with same-gene probesets matched by their quality. Hence, the commonly observed differences in expression profiles obtained with different probesets that match the same gene are not due to the low quality of probeset mapping but to something else. The most commonly cited reason for such inadequate results is technical error [19, 37].
Finally, in order to investigate the reliability of expression signals obtained with different probesets that match the same gene, we compared the means and coefficients of variation of the expression values obtained with these probesets. To our surprise, we observed that only genes that display an almost perfect correlation between the two expression profiles that correspond to these probesets (ρ ≥ 0.9, Figure 11) display comparable distributions of the absolute expression values produced by both probesets. The subgroups of genes with lower degrees of correlation between expression profiles show significantly lower consistency in terms of absolute expression values. The most likely explanation for this phenomenon is that these probesets indeed report differently regulated sets of transcripts or, in the best-case scenario, two different sets of transcripts that represent the same gene. This observation indicates that at least 65% of the absolute gene expression values collected using Affymetrix microarrays cannot be utilized for typical interpretative downstream analysis.
The MIAME 2.0 project called for the reutilization of microarray data to produce more relevant and robust results. Unfortunately, our study led us to the conclusion that even the most reliable probeset-based expression microarrays fail to produce an accurate reflection of the expression profile for individual transcripts. Hence, the reutilization of microarray datasets may be possible only through analyzing individual probes or cleaned-up sets of probes [38, 39] that went through extensive validation both by genome alignments and by the dissection of reference sets of microarray profiles.
Materials and methods
The test Affymetrix HG-U133A dataset GSE1133 was downloaded from the BioGPS database. The Affymetrix HG-U133A platform was selected because of its popularity and its design centered on annotated probesets that match validated human genes.
Reannotation of Affymetrix HG-U133A was performed in two steps that started with Affymetrix annotation build 32 (#%netaffx-annotation-netaffx-build = 32). A general overview of the reannotation process is given in Additional file 1 Supplementary Figure S1. First, the probesets annotated to the same gene were grouped together. In the example presented in Additional file 1 Supplementary Figure S1, probesets 2 and 3 were classified as Group B because they were both annotated to gene B. Even in the original Affymetrix annotation, a total of 583 probesets were annotated to several genes simultaneously and denoted as C///D, where C and D are two different genes (see probeset 4 in Additional file 1 Supplementary Figure S1 as an example). For the convenience of the following analyses, when a probeset defined two or more genes at the same time, this group of genes was classified as a novel, combinatorial expression group. Sometimes these two genes could also be covered by additional, truly differentiating probesets, such as probesets 5 and 6 in our example. In these cases, each of the genes was assigned its own gene label that could be profiled only by a differentiating probeset, but not by the probeset that could be used to profile two or more genes at the same time. For each gene, or gene group, the number of differentiating probesets was calculated.
During the second step of reannotation, all probesets were combined into 'gene groups' in the following manner: C///D ∪ C ≡ C///D (the union of C///D and C is defined as the C///D gene group). According to this procedure, probesets 4, 5 and 6 were grouped into gene group 'C///D' (see Additional file 1 Supplementary Figure S1). Notably, the original attribution of each probeset to a gene was preserved throughout the reannotation procedure. Thus, the annotation information about each probeset was only updated and extended.
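The two reannotation steps can be sketched as a small grouping routine. This is a minimal illustration rather than the procedure actually used: the function names and the probeset identifiers are hypothetical, the assignment of probesets 5 and 6 to genes C and D respectively is assumed, group labels are built from alphabetically sorted gene names, and partially overlapping combinatorial groups (for example C///D and D///E) are not merged.

```python
def build_gene_groups(annotation):
    """Group probesets so that C///D united with C yields the C///D group.

    annotation -- dict mapping probeset ID to its gene label, where several
                  genes are separated by '///' as in the Affymetrix files.
    Returns a dict mapping gene-group label to the list of its probesets.
    """
    gene_sets = {ps: frozenset(label.split("///"))
                 for ps, label in annotation.items()}
    combinatorial = {genes for genes in gene_sets.values() if len(genes) > 1}

    groups = {}
    for probeset, genes in gene_sets.items():
        # A probeset joins the largest combinatorial group containing its genes.
        parents = [c for c in combinatorial if genes <= c]
        group = max(parents, key=lambda c: (len(c), sorted(c))) if parents else genes
        groups.setdefault("///".join(sorted(group)), []).append(probeset)
    return groups


def differentiating_counts(annotation):
    """Number of single-gene (differentiating) probesets per gene."""
    counts = {}
    for label in annotation.values():
        if "///" not in label:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Layout modeled on the example described for Supplementary Figure S1.
example = {
    "probeset_2": "B",
    "probeset_3": "B",
    "probeset_4": "C///D",
    "probeset_5": "C",  # assumed: the figure assignment is not given in the text
    "probeset_6": "D",  # assumed, as above
}
print(build_gene_groups(example))
print(differentiating_counts(example))
```

The input dictionary is left untouched, which mirrors the statement that the original attribution of each probeset to a gene is preserved and the annotation is only extended.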
In each probeset, all probes were individually aligned and mapped to the human genome using the PLANdbAffy database and a custom track in the UCSC Genome Browser.
Microsoft Excel 2010 (Microsoft Corporation) was used to perform the correlation analysis. Other statistics were computed using R.
The work was supported in part by a grant of the Ministry of Education and Science of Russia for young postdoctoral scientists to AM (contract No 8589 dated 09.24.2012, application # 2012-1.3.1-12-000-1001-028), by the RF President's grant for young postdoctoral scientists to AM (No MK-5249.2012.4) and by a Dynasty Foundation Fellowship to IA (No DP-B-26/14). A. Marakhonov is thankful to the research team at the Research Centre for Medical Genetics for discussions and support of the work.
Publication of this article has been funded by the BGRS\SB-2014 Organizing Committee.
This article has been published as part of BMC Genomics Volume 15 Supplement 12, 2014: Selected articles from the IX International Conference on the Bioinformatics of Genome Regulation and Structure\Systems Biology (BGRS\SB-2014): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S12.
1. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ: High density synthetic oligonucleotide arrays. Nat Genet. 1999, 21: 20-24.
2. Stalteri MA, Harrison AP: Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. BMC Bioinformatics. 2007, 8: 13. 10.1186/1471-2105-8-13.
3. Orlov YL, Zhou J, Lipovich L, Shahab A, Kuznetsov VA: Quality assessment of the Affymetrix U133A&B probesets by target sequence mapping and expression data analysis. In silico biology. 2007, 7: 241-260.
4. Harrison AP, Johnston CE, Orengo CA: Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips. BMC Bioinformatics. 2007, 8: 195. 10.1186/1471-2105-8-195.
5. Chen R, Li L, Butte AJ: AILUN: reannotating gene expression data automatically. Nat Methods. 2007, 4: 879. 10.1038/nmeth1107-879.
6. Ballester B, Johnson N, Proctor G, Flicek P: Consistent annotation of gene expression arrays. BMC Genomics. 2010, 11: 294. 10.1186/1471-2164-11-294.
7. Li Q, Birkbak NJ, Gyorffy B, Szallasi Z, Eklund AC: Jetset: selecting the optimal microarray probe set to represent a gene. BMC Bioinformatics. 2011, 12: 474. 10.1186/1471-2105-12-474.
8. Orlov Y, Zhou J, Chen J, Shahab A, Kuznetsov V: APMA Database for Affymetrix Target Sequences Mapping, Quality Assessment and Expression Data Mining. Pattern Recognition in Bioinformatics. Edited by: Rajapakse JC, Schmidt B, Volkert G. 2007, Springer Berlin Heidelberg, 4774: 166-177. 10.1007/978-3-540-75286-8_17. [Hutchison D, Kanade T, Kittler J, Kleinberg JM, Kobsa A, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Rangan CP, et al (Series Editors) Lecture Notes in Computer Science]
9. Lahti L, Elo LL, Aittokallio T, Kaski S: Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. IEEE/ACM transactions on computational biology and bioinformatics/IEEE, ACM. 2011, 8: 217-225. 10.1109/TCBB.2009.38.
10. Eklund AC, Friis P, Wernersson R, Szallasi Z: Optimization of the BLASTN substitution matrix for prediction of non-specific DNA microarray hybridization. Nucleic Acids Res. 2010, 38: e27. 10.1093/nar/gkp1116.
11. Memon FN, Owen AM, Sanchez-Graillet O, Upton GJ, Harrison AP: Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing. Journal of integrative bioinformatics. 2010, 7: 111.
12. Langdon WB: Correlation of microarray probes give evidence for mycoplasma contamination in human studies. Proceedings of the 15th annual conference companion on Genetic and evolutionary computation. 2013, Amsterdam, The Netherlands. ACM, 1447-1454. 10.1145/2464576.2482725.
13. Sanchez-Graillet O, Rowsell J, Langdon WB, Stalteri M, Arteaga-Salas JM, Upton GJ, Harrison AP: Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. Journal of integrative bioinformatics. 2008, 5: 98. 10.2390/biecoll-jib-2008-98.
14. Li H, Zhu D, Cook M: A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data. BMC Genomics. 2008, 9: 188. 10.1186/1471-2164-9-188.
15. Liao Q, Liu C, Yuan X, Kang S, Miao R, Xiao H, Zhao G, Luo H, Bu D, Zhao H, et al: Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network. Nucleic Acids Res. 2011, 39: 3864-3878. 10.1093/nar/gkq1348.
16. Magalhaes JG, Lee J, Geddes K, Rubino S, Philpott DJ, Girardin SE: Essential role of Rip2 in the modulation of innate and adaptive immunity triggered by Nod1 and Nod2 ligands. European journal of immunology. 2011, 41: 1445-1455. 10.1002/eji.201040827.
17. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, et al: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
18. Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, Su AI: BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol. 2009, 10: R130. 10.1186/gb-2009-10-11-r130.
19. Upton GJ, Sanchez-Graillet O, Rowsell J, Arteaga-Salas JM, Graham NS, Stalteri MA, Memon FN, May ST, Harrison AP: On the causes of outliers in Affymetrix GeneChip data. Briefings in functional genomics & proteomics. 2009, 8: 199-212. 10.1093/bfgp/elp027.
20. Koury S, Yarlagadda S, Moskalik-Liermo K, Popli N, Kim N, Apolito C, Peterson A, Zhang X, Zu P, Tamburlin J, Bofinger D: Differential gene expression during terminal erythroid differentiation. Genomics. 2007, 90: 574-582. 10.1016/j.ygeno.2007.06.010.
21. Peng YM, van de Garde MD, Cheng KF, Baars PA, Remmerswaal EB, van Lier RA, Mackay CR, Lin HH, Hamann J: Specific expression of GPR56 by human cytotoxic lymphocytes. Journal of leukocyte biology. 2011, 90: 735-740. 10.1189/jlb.0211092.
22. Liu M, Parker RM, Darby K, Eyre HJ, Copeland NG, Crawford J, Gilbert DJ, Sutherland GR, Jenkins NA, Herzog H: GPR56, a novel secretin-like human G-protein-coupled receptor gene. Genomics. 1999, 55: 296-305. 10.1006/geno.1998.5644.
23. Meding S, Balluff B, Elsner M, Schone C, Rauser S, Nitsche U, Maak M, Schafer A, Hauck SM, Ueffing M, et al: Tissue-based proteomics reveals FXYD3, S100A11 and GSTM3 as novel markers for regional lymph node metastasis in colon cancer. The Journal of pathology. 2012, 10.1002/path.4021.
24. Nurtdinov RN, Vasiliev MO, Ershova AS, Lossev IS, Karyagina AS: PLANdbAffy: probe-level annotation database for Affymetrix expression microarrays. Nucleic Acids Res. 2010, 38: D726-730. 10.1093/nar/gkp969.
25. Fu ZD, Csanaky IL, Klaassen CD: Effects of aging on mRNA profiles for drug-metabolizing enzymes and transporters in livers of male and female mice. Drug metabolism and disposition: the biological fate of chemicals. 2012, 40: 1216-1225. 10.1124/dmd.111.044461.
26. Yamazaki H, Shimizu M: Survey of variants of human flavin-containing monooxygenase 3 (FMO3) and their drug oxidation activities. Biochemical pharmacology. 2013, 85: 1588-1593. 10.1016/j.bcp.2013.03.020.
27. Mercer TR, Wilhelm D, Dinger ME, Solda G, Korbie DJ, Glazov EA, Truong V, Schwenke M, Simons C, Matthaei KI, et al: Expression of distinct RNAs from 3' untranslated regions. Nucleic Acids Res. 2011, 39: 2393-2403. 10.1093/nar/gkq1158.
28. Whitelaw CM, Robinson JE, Chambers GB, Hastie P, Padmanabhan V, Thompson RC, Evans NP: Expression of mRNA for galanin, galanin-like peptide and galanin receptors 1-3 in the ovine hypothalamus and pituitary gland: effects of age and gender. Reproduction. 2009, 137: 141-150. 10.1530/REP-08-0266.
29. Mitra P, Vaughan PS, Stein JL, Stein GS, van Wijnen AJ: Purification and functional analysis of a novel leucine-zipper/nucleotide-fold protein, BZAP45, stimulating cell cycle regulated histone H4 gene transcription. Biochemistry. 2001, 40: 10693-10699. 10.1021/bi010529o.
30. Ding W, Lin L, Chen B, Dai J: L1 elements, processed pseudogenes and retrogenes in mammalian genomes. IUBMB life. 2006, 58: 677-685. 10.1080/15216540601034856.
31. Jaksik R, Polanska J, Herok R, Rzeszowska-Wolny J: Calculation of reliable transcript levels of annotated genes on the basis of multiple probe-sets in Affymetrix microarrays. Acta biochimica Polonica. 2009, 56: 271-277.
32. Zhang J, Finney RP, Clifford RJ, Derr LK, Buetow KH: Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics. 2005, 85: 297-308. 10.1016/j.ygeno.2004.11.004.
33. Ivliev AE, t Hoen PA, Sergeeva MG: Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Res. 2010, 70: 10060-10070. 10.1158/0008-5472.CAN-10-2465.
34. Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH: Functional organization of the transcriptome in human brain. Nature neuroscience. 2008, 11: 1271-1282. 10.1038/nn.2207.
35. Ostlund G, Sonnhammer EL: Avoiding pitfalls in gene (co)expression meta-analysis. Genomics. 2014, 103: 21-30. 10.1016/j.ygeno.2013.10.006.
36. Rung J, Brazma A: Reuse of public genome-wide gene expression data. Nat Rev Genet. 2013, 14: 89-99.
37. Langdon WB, Upton GJ, da Silva Camargo R, Harrison AP: A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. IEEE/ACM transactions on computational biology and bioinformatics/IEEE, ACM. 2010, 7: 647-653. 10.1109/TCBB.2008.108.
38. Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, et al: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005, 33: e175. 10.1093/nar/gni179.
39. Sandberg R, Larsson O: Improved precision and accuracy for microarrays using updated probe set definitions. BMC Bioinformatics. 2007, 8: 48. 10.1186/1471-2105-8-48.
40. R Core Team: R: A Language and Environment for Statistical Computing. 2014, R Foundation for Statistical Computing, Vienna, Austria. Available online at http://www.R-project.org/.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.