Table 1 A hypothetical EST count table that demonstrates CMH analysis and serves as a contrived example of Simpson's paradox.

From: Meta-analytical biomarker search of EST expression data reveals three differentially expressed candidates

             Tissue I            Tissue II           Pooled
             Normal    Cancer    Normal    Cancer    Normal    Cancer
Gene A       280       580       20        20        300       600
Other genes  20,000    80,000    380,000   620,000   400,000   700,000
  1. This hypothetical case illustrates both how the Cochran-Mantel-Haenszel (CMH) test is applied and how Simpson's paradox can arise. Gene A is the gene under investigation. Expression counts from all other genes are pooled into the "other genes" row. Bold typeface indicates columns showing higher cancer vs. normal propensities. CMH is applied to the stratified tissue columns (but not to the pooled data). A casual observation involving only the pooled data would suggest that Gene A has higher expression in cancer (χ² test p-value close to 0 when analyzing only the pooled data). However, closer inspection of each tissue column reveals the opposite: within each tissue, Gene A accounts for a larger share of the counts in normal than in cancer. The observed difference between cancer and normal in the "other genes" row is theoretically mostly due to sampling bias.
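The reversal described in the footnote can be checked directly from the counts in Table 1. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the odds ratio is oriented so that values above 1 mean higher Gene A expression in cancer, and the CMH statistic is computed with the standard textbook formula without a continuity correction (an assumption; the source does not say which variant was used).

```python
# Counts from Table 1, per stratum, ordered as
# (gene_a_normal, gene_a_cancer, other_normal, other_cancer).
STRATA = {
    "Tissue I": (280, 580, 20_000, 80_000),
    "Tissue II": (20, 20, 380_000, 620_000),
}


def odds_ratio(a_norm, a_canc, o_norm, o_canc):
    """Odds ratio of Gene A in cancer relative to normal (>1: higher in cancer)."""
    return (a_canc * o_norm) / (a_norm * o_canc)


def pool(tables):
    """Collapse the strata into one 2x2 table by summing each cell."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of the common odds ratio across strata."""
    num = den = 0.0
    for a_norm, a_canc, o_norm, o_canc in tables:
        n = a_norm + a_canc + o_norm + o_canc
        num += a_canc * o_norm / n
        den += a_norm * o_canc / n
    return num / den


def cmh_statistic(tables):
    """CMH chi-square statistic (1 degree of freedom, no continuity correction)."""
    diff = var = 0.0
    for a_norm, a_canc, o_norm, o_canc in tables:
        n = a_norm + a_canc + o_norm + o_canc
        row_a, row_o = a_norm + a_canc, o_norm + o_canc
        col_n, col_c = a_norm + o_norm, a_canc + o_canc
        diff += a_canc - row_a * col_c / n  # observed minus expected
        var += row_a * row_o * col_n * col_c / (n * n * (n - 1))
    return diff * diff / var


tables = list(STRATA.values())
per_stratum_or = {name: odds_ratio(*t) for name, t in STRATA.items()}
pooled_table = pool(tables)
pooled_or = odds_ratio(*pooled_table)
common_or = mantel_haenszel_or(tables)
cmh = cmh_statistic(tables)

for name, value in per_stratum_or.items():
    print(f"{name}: odds ratio (cancer vs. normal) = {value:.3f}")
print(f"Pooled: odds ratio = {pooled_or:.3f}")
print(f"Mantel-Haenszel common odds ratio = {common_or:.3f}")
print(f"CMH statistic = {cmh:.1f}")
```

Both per-tissue odds ratios and the Mantel-Haenszel estimate fall below 1, while the pooled odds ratio is above 1. This is the Simpson's paradox pattern the table was constructed to show, and it is why the stratified analysis is preferred over the pooled one here.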