Fig. 3 | BMC Genomics

Fig. 3

From: Determining the optimal number of independent components for reproducible transcriptomic data analysis


Analysis of component reproducibility in independent datasets. a Graph of reciprocal correlations showing the reproducibility of the metagenes of independent components across 6 independent breast cancer datasets. Each node is an independent component, represented by a metagene, from an ICA decomposition with M = 100 components. Edges are drawn only for reciprocal correlations between metagenes with Pearson correlation >0.3. Triangles (on the right) mark components driven by the expression of a small group of genes (frequently a single gene). Node size reflects the rank of the component by stability over multiple runs of fastICA (larger nodes are more stable). Edge width and color reflect the value of the correlation coefficient between two metagenes, with thicker edges indicating larger correlations. Several pseudo-cliques of highly reproducible components are annotated by the dominating small group of genes (pseudo-cliques of triangle nodes), by comparison with the results of a previously published large-scale ICA-based analysis of gene expression [3], or by a hypergeometric test on the set of top-contributing genes (those with projection larger than 5.0 onto the component). The analogous correlation graph computed for the MSTD number of components is provided in Additional file 7: Figure SF3. b Average reproducibility score (sum of reciprocal correlation coefficients for an independent component) for the correlation graph shown in a, as a function of component rank: relative rank (component rank minus the MSTD value for a given dataset) for stability-based ranking, or absolute rank for the other ranking types. Only the stability-based ranking matches the reproducibility score.
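
The reciprocal-correlation edges and the per-component reproducibility score described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reciprocal_edges` and `reproducibility_scores` are hypothetical, the metagene matrices are assumed to be restricted to the same genes in the same order, the absolute value of the Pearson correlation is used on the assumption that the sign of an ICA component is arbitrary, and the toy data are synthetic.

```python
import numpy as np

def reciprocal_edges(A, B, threshold=0.3):
    """Return (i, j, r) for reciprocal best matches between metagenes.

    A, B: arrays of shape (n_genes, n_components), same genes in the same order
    (an assumption of this sketch). An edge is kept when component i of A and
    component j of B are each other's best match and |r| exceeds the threshold.
    """
    k_a = A.shape[1]
    # Cross-block of the Pearson correlation matrix: columns of A vs columns of B
    corr = np.corrcoef(A.T, B.T)[:k_a, k_a:]
    abs_corr = np.abs(corr)  # assumption: component sign is arbitrary
    best_for_a = abs_corr.argmax(axis=1)  # best partner in B for each A component
    best_for_b = abs_corr.argmax(axis=0)  # best partner in A for each B component
    edges = []
    for i, j in enumerate(best_for_a):
        if best_for_b[j] == i and abs_corr[i, j] > threshold:
            edges.append((i, int(j), float(abs_corr[i, j])))
    return edges

def reproducibility_scores(datasets, threshold=0.3):
    """Sum of reciprocal correlation coefficients per component, over dataset pairs."""
    scores = [np.zeros(d.shape[1]) for d in datasets]
    for a in range(len(datasets)):
        for b in range(a + 1, len(datasets)):
            for i, j, r in reciprocal_edges(datasets[a], datasets[b], threshold):
                scores[a][i] += r
                scores[b][j] += r
    return scores

# Toy data: two "datasets" with 3 random metagenes each over 500 genes,
# sharing one signal (with flipped sign in the second dataset).
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
A = rng.normal(size=(500, 3))
B = rng.normal(size=(500, 3))
A[:, 0] = shared + 0.1 * rng.normal(size=500)
B[:, 1] = -shared + 0.1 * rng.normal(size=500)

edges = reciprocal_edges(A, B)
scores = reproducibility_scores([A, B])
print(edges)
print(scores)
```

In this toy case only the shared signal should produce an edge, so its two components are the only ones with a nonzero reproducibility score. With more datasets, a component reproduced everywhere accumulates one coefficient per other dataset.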
