(A) Cluster analysis of current, former and never smokers: single-linkage hierarchical clustering of the 609 SAGE tags listed in Additional file 5, which are the tags differentially expressed between current and never smokers. Euclidean distance was used as the distance measure, and clustering was performed with the visualization package Genesis. Green rectangles mark samples in which a gene is expressed at a lower level relative to the other samples, and red rectangles mark samples in which the gene is highly expressed relative to the other samples. (B) Principal component analysis of current, former and never smokers. Expression values were scaled to tags per million (TPM). Each tag was then normalized by dividing its value by the maximum value observed for that tag across all libraries. This ratio was multiplied by 6 and 3 was then subtracted, placing the values in the range -3 to 3. A covariance-based approach was used, implemented with the Statistics Toolbox in MATLAB (MathWorks). Current smokers are shown in red, former smokers in blue and never smokers in green.
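The scaling described for panel (B) can be illustrated with a minimal sketch. This is not the authors' MATLAB code; the function names `to_tpm` and `scale_tag` and the example values are hypothetical, and the handling of a tag whose maximum is zero is an assumption, since the caption does not address that case.

```python
def to_tpm(counts):
    """Scale the raw tag counts of one library to tags per million (TPM)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def scale_tag(values):
    """Map one tag's TPM values across all libraries into [-3, 3].

    Follows the caption: divide by the tag's maximum, multiply by 6, subtract 3.
    """
    peak = max(values)
    if peak == 0:
        # Assumption: a tag absent from every library maps to the lower bound.
        return [-3.0 for _ in values]
    return [v / peak * 6 - 3 for v in values]


# Hypothetical TPM values for a single tag in three libraries.
example = [0.0, 50.0, 100.0]
scaled = scale_tag(example)
print(scaled)
```

With this scaling, the library with the highest expression of a tag always maps to 3 and a library with zero counts maps to -3, so every tag contributes on the same scale regardless of its absolute abundance.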