From: Correspondence regarding "Effect of active smoking on the human bronchial epithelium transcriptome"
Flaw | Consequence |
---|---|
poor definition of "preferential" expression | introduces unchecked bias from different group sizes |
incorrect use of Venn diagram | confounds overall sense of group-specific differences |
use of raw tag counts to determine "preferential" expression | introduces unchecked bias from different library sizes |
data filtered using criteria that include the variable to be tested | pre-selects for data more likely to be found significant, confounding estimates of the false discovery rate (FDR) |
significance threshold set to p ≤ 0.05 without adjusting for multiple testing | false discovery rate (FDR) could be very high |
"significant" results undergo post hoc fold-change filter | low tag counts are more likely to pass the filter, yet these are more likely to represent random variation |
other possible null hypotheses not tested | not possible to check for consistency with known biology |
null hypotheses formed with 2 of the 3 sample types | loss of power |
data selected for differential expression are then clustered | formation of distinct clusters is expected by construction and is therefore meaningless |
genes tested for consistency with third sample group restricted to genes pre-selected as different between original two groups | flaws in implementation of first hypothesis test become propagated and amplified in second hypothesis test |
no RT-PCR of irreversible genes | no validation of irreversible gene expression hypothesis |
evidence for GSK3B as an irreversible gene is weak or supports reversible hypothesis | selection of GSK3B for further experimentation is not indicated |
tags per million (TPM) used in statistical testing rather than for reporting purposes only | artificially inflates non-zero counts |
some SAGE tags incorrectly mapped | a) follow-up RT-PCR is not validation, b) evidence for involvement of COX2 pathway is weaker than implied |
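
To illustrate the row on unadjusted significance thresholds, here is a minimal sketch of the Benjamini-Hochberg adjustment, one common way to control the FDR. It is not the method used in the paper under discussion, and the p-values are invented purely for illustration; the function name `benjamini_hochberg` is likewise an assumption of this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values (NOT taken from the study being critiqued).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
q_values = benjamini_hochberg(p_values)

raw_hits = sum(p <= 0.05 for p in p_values)  # calls at unadjusted p <= 0.05
bh_hits = sum(q <= 0.05 for q in q_values)   # calls at FDR <= 0.05

print(f"unadjusted: {raw_hits} significant, BH-adjusted: {bh_hits} significant")
```

With these invented values, five tests pass the unadjusted threshold but only two survive adjustment. The gap widens as the number of tests grows, which is why an unadjusted p ≤ 0.05 cutoff across thousands of SAGE tags can carry a very high FDR.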