Figure 1 | BMC Genomics

Figure 1

From: MMP1 bimodal expression and differential response to inflammatory mediators is linked to promoter polymorphisms

Flow diagram of the method used to identify bimodally expressed transcripts from expression data. (A) Transcript abundance is quantified by microarray or RNAseq techniques. (B) On a transcript-by-transcript basis, agglomerative clustering across the dataset is carried out. The algorithm starts by assigning the same number of clusters as individuals (in this example, 10 clusters are assigned since there are 10 individuals). The clusters are then progressively merged by combining the two most similar clusters, using Ward's method to calculate the distance between clusters and Euclidean distance to calculate dissimilarities between individuals. The distances between the merging clusters are recorded by the algorithm as branch "heights". The height values at either side of the dendrogram are removed to exclude transcripts that falsely appear to be bimodally expressed due to a single outlying individual. The maximum remaining branch height (indicated by the red arrow) is identified for each transcript; it represents the greatest distance between any two clusters of individuals and is used as a surrogate marker for the degree of bimodal expression of that transcript. (C) To estimate the probability of a transcript appearing to be bimodally expressed due to chance alone, for each transcript we make a maximum likelihood estimate of the parameters of the distribution of this transcript's abundance across the population from which the individuals being studied have been drawn. We use these parameters to generate 10,000 simulated datasets, each of which is clustered as described in (B). (D) Across the 10,000 clusterings formed from the bootstrapped datasets for this transcript, we identify how often the largest distance between clusters is ≥ the largest distance between clusters in the actual dataset. This information is shown graphically and is used to generate an empirical p-value as an estimate of the type I error rate.
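
Steps (B) to (D) can be sketched for a single transcript roughly as follows. This is a minimal illustration in Python with NumPy and SciPy, not the authors' implementation. Two points are assumptions, because the caption does not specify them: the fitted distribution in (C) is taken to be normal, and "removing the height values at either side of the dendrogram" is interpreted as dropping the first and last heights between adjacent leaves in dendrogram order. The function names `bimodality_score` and `empirical_p_value` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, leaves_list, linkage
from scipy.spatial.distance import squareform


def bimodality_score(values):
    """Largest interior branch height for one transcript (step B).

    Assumption: the heights "at either side" are the merge heights between
    the first two and the last two leaves in dendrogram order.
    """
    x = np.asarray(values, dtype=float)
    if x.size < 4:
        raise ValueError("need at least 4 individuals to trim both sides")
    # Ward's method on Euclidean distances between individuals.
    Z = linkage(x.reshape(-1, 1), method="ward", metric="euclidean")
    order = leaves_list(Z)          # left-to-right leaf order
    coph = squareform(cophenet(Z))  # height at which each pair is merged
    # Height separating each pair of neighbouring leaves in the dendrogram.
    adjacent_heights = coph[order[:-1], order[1:]]
    # Drop the outermost heights so a single outlier cannot dominate.
    return float(adjacent_heights[1:-1].max())


def empirical_p_value(values, n_sim=10_000, seed=0):
    """Fraction of simulated datasets scoring >= the observed one (C, D).

    Assumption: abundance is modelled as normal; mean and standard
    deviation (ddof=0) are the maximum likelihood estimates.
    """
    x = np.asarray(values, dtype=float)
    observed = bimodality_score(x)
    mu, sigma = x.mean(), x.std()
    rng = np.random.default_rng(seed)
    sims = rng.normal(mu, sigma, size=(n_sim, x.size))
    null_scores = np.array([bimodality_score(s) for s in sims])
    return float(np.mean(null_scores >= observed))
```

With ten individuals split into two well-separated groups of five, the largest height falls in the interior of the dendrogram and is retained. With nine similar individuals and one outlier, the largest height sits at one edge and is discarded, so the score stays small.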
