
Table 1 A hypothetical EST count table demonstrating CMH analysis and also a contrived example of Simpson's paradox.

From: Meta-analytical biomarker search of EST expression data reveals three differentially expressed candidates

 

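| | Tissue I, Normal | Tissue I, Cancer | Tissue II, Normal | Tissue II, Cancer | Pooled, Normal | Pooled, Cancer |
|---|---|---|---|---|---|---|
| Gene A | 280 | 580 | 20 | 20 | 300 | 600 |
| Other genes | 20,000 | 80,000 | 380,000 | 620,000 | 400,000 | 700,000 |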

  1. This hypothetical case illustrates both how the Cochran-Mantel-Haenszel (CMH) test is applied and how Simpson's paradox can arise. Gene A is the gene under investigation. EST counts from all other genes are pooled into the "Other genes" row. Bold typeface indicates columns showing higher cancer vs. normal propensities. CMH is applied to the stratified tissue columns, not to the pooled data. A casual look at the pooled data alone would suggest that Gene A has higher expression in cancer (χ² test p-value close to 0 when only the pooled counts are analyzed). However, inspecting each tissue column separately reveals the opposite: within both Tissue I and Tissue II, Gene A accounts for a larger share of ESTs in normal than in cancer (for example, 280 of 20,280 versus 580 of 80,580 in Tissue I). The observed difference between the cancer and normal totals in the "Other genes" row is, in theory, mostly due to sampling bias.
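The reversal described in the footnote can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the cell layout, and the choice to omit continuity corrections are all assumptions made for illustration. Because the footnote does not say which χ² variant was used, the pooled Pearson statistic computed here may differ from the p-value the authors describe.

```python
import math

# Counts from Table 1. Each 2x2 table is laid out (by assumption) as
# (gene_a_cancer, gene_a_normal, other_cancer, other_normal).
STRATA = {
    "Tissue I": (580, 280, 80_000, 20_000),
    "Tissue II": (20, 20, 620_000, 380_000),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of Gene A (vs. other genes) in cancer relative to normal."""
    return (a * d) / (b * c)


def pool(tables):
    """Collapse the strata by summing each cell (the 'Pooled' columns)."""
    return tuple(sum(cells) for cells in zip(*tables))


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic for one 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cmh(tables):
    """CMH statistic (no continuity correction) and Mantel-Haenszel odds ratio."""
    diff = var = num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d   # Gene A total, other genes total
        col1, col2 = a + c, b + d   # cancer total, normal total
        diff += a - row1 * col1 / n                      # observed minus expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))  # hypergeometric variance
        num += a * d / n
        den += b * c / n
    return diff ** 2 / var, num / den


tables = list(STRATA.values())
pooled = pool(tables)

for name, table in STRATA.items():
    print(f"{name}: odds ratio = {odds_ratio(*table):.3f}")

pooled_stat = pearson_chi2(*pooled)
print(f"Pooled: odds ratio = {odds_ratio(*pooled):.3f}, "
      f"chi2 = {pooled_stat:.2f}, p = {chi2_sf_1df(pooled_stat):.3g}")

stat, or_mh = cmh(tables)
print(f"CMH: statistic = {stat:.2f}, p = {chi2_sf_1df(stat):.3g}, "
      f"common odds ratio = {or_mh:.3f}")
```

With the table's counts, the per-tissue odds ratios fall below 1 (about 0.52 and 0.61) while the pooled odds ratio rises above 1 (about 1.14), which is the Simpson's paradox reversal. The Mantel-Haenszel common odds ratio combines the strata without pooling them and stays below 1, in line with the per-tissue picture.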