Fig. 2 | BMC Genomics


From: Genomic data integration tutorial, a plant case study


Case study of genomic (expression and methylation) data integration for 10 poplar populations.

A Genomic data matrix with 42 950 poplar genes in rows and 70 associated variables in columns (expression and methylation for 10 populations; see the color code in the legend at the left). The omics variables are gene expression and DNA methylation data produced for 10 poplar populations, as shown in the bottom-left legend of the figure. Methylation data were produced for 3 methylation contexts (CG, CHG and CHH) on two gene features (gene body and promoter), giving 3 × 2 × 10 = 60 methylation variables alongside the 10 expression variables.

B Correlation matrix of the 60 methylomics and 10 transcriptomics log-transformed variables, showing Spearman's correlation between each pair of omics variables. A strong positive correlation is shown as a deep blue point and a strong negative correlation as a deep red point; the absence of a point means no correlation between the two variables. Correlations on the diagonal equal one by definition, since each variable is perfectly correlated with itself. Variables are ordered (see the color code in the legend at the right) by hierarchical clustering using the AOE (angular order of the eigenvectors) ordering.

C Loading plot of the log-transformed, centered and scaled omics variables on the first two principal components of the PCA. The percentage of the initial variance explained by each component is indicated (see the color code in the legend at the left).

D Result of cimDiablo_v2 on 'non-denoised' data. Left panel: heatmap of the omics integration, in which each row corresponds to one gene and each column to one omics variable. Data were centered and scaled, then a cutoff was applied to restrict values to [-2, 2]. In the heatmap's color code, blue corresponds to genes with very low and red to genes with very high methylation/expression levels.
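The preprocessing described for panels B to D (log transform, Spearman correlation matrix, PCA loadings with explained variance, and centering, scaling and truncation to [-2, 2] for the heatmap) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline or cimDiablo_v2: the log2(x + 1) transform, per-variable (column-wise) scaling, the random toy matrix and the helper names `log_transform`, `average_ranks`, `spearman_matrix`, `standardize`, `pca_loadings` and `heatmap_values` are all hypothetical; the caption only states that variables were log-transformed, centered, scaled and cut off at [-2, 2].

```python
import numpy as np

def log_transform(raw):
    # Assumption: log2(x + 1); the caption only says "log-transformed".
    return np.log2(raw + 1.0)

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def spearman_matrix(data):
    """Spearman correlation between the columns (omics variables) of a genes x variables matrix."""
    ranked = np.column_stack([average_ranks(col) for col in data.T])
    return np.corrcoef(ranked, rowvar=False)  # Pearson correlation of ranks = Spearman

def standardize(data):
    """Center and scale each column (assumed per-variable scaling)."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def pca_loadings(data, n_components=2):
    """Variable loadings (correlations with the PCs) and explained-variance ratios."""
    z = standardize(data)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    loadings = vt[:n_components].T * s[:n_components] / np.sqrt(z.shape[0] - 1)
    return loadings, explained[:n_components]

def heatmap_values(data, cutoff=2.0):
    """Center and scale, then truncate to [-cutoff, cutoff] for display."""
    return np.clip(standardize(data), -cutoff, cutoff)

# Random toy stand-in for a genes x variables matrix (not the poplar data).
rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=10.0, size=(500, 6))
logged = log_transform(raw)
corr = spearman_matrix(logged)
loadings, explained = pca_loadings(logged)
heat = heatmap_values(logged)
print("PC1/PC2 variance explained (%):", np.round(100 * explained, 1))
```

Because Spearman's correlation only uses ranks, any strictly increasing transform such as the log leaves the panel B matrix unchanged; the log transform matters for the PCA and heatmap steps, which work on the values themselves.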
The row and column dendrograms are computed by hierarchical clustering with the Euclidean distance and Ward's method, grouping together genes and omics variables with similar profiles. Right panel: boxplots of the k cluster groups. Using the row dendrogram, genes were divided into four groups (k = 4); for each group, the average value per population of each omics variable (methylation and gene expression) is shown.

E Result of cimDiablo_v2 on 'denoised' data. Data were first centered and scaled, then 'denoised', centered and scaled a second time, before a final cutoff at [-2, 2].

F Comparison of 'non-denoised' and 'denoised' gene expression data. Top panel: boxplots of gene expression before and after the 'denoising' step (red: 'non-denoised' data; blue: 'denoised' data). Bottom panel: MA-plot (Bland–Altman plot, where M is the log ratio and A the mean average) of gene expression between 'denoised' and 'non-denoised' data for one of the poplar populations (Adour). The x axis shows the average expression level and the y axis the log2 fold change. Red points mark significant differences (absolute log2 fold change above 1); black points show no obvious difference.

G Extraction of genes with extreme values (candidates) for all omics variables before and after the 'denoising' step. Left panel: Venn diagram of the genes extracted before and after 'denoising'. Right panel: heatmaps of the genes with extreme values for 'non-denoised' and 'denoised' data, ordered by hierarchical clustering with the Euclidean distance and Ward's method.

H Gene ontology enrichment analysis of the 143 genes with extreme values showing low expression and high methylation levels after the 'denoising' step. Enrichment was performed using PlantGenIe (https://plantgenie.org) with Populus trichocarpa v3.1 as the background.
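Two of the computations described above can be sketched with NumPy under stated assumptions: dividing genes into four groups from a Ward/Euclidean dendrogram (panels D and E) and computing MA-plot coordinates (panel F). The helpers `ward_clusters` and `ma_values` and the toy inputs are hypothetical. The agglomerative loop is a naive, slow version meant only for small examples; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would be the usual choice. The MA step assumes both inputs are already on a log2 scale and takes M as 'denoised' minus 'non-denoised'; the 'denoising' method itself is not specified in this caption and is not reproduced.

```python
import numpy as np

def ward_clusters(X, k):
    """Naive agglomerative clustering of the rows of X with Ward's criterion
    (Euclidean distance), stopping when k clusters remain. Ward merge heights
    are monotone, so this matches cutting the dendrogram into k groups."""
    clusters = [[i] for i in range(X.shape[0])]
    centroids = [X[i].astype(float) for i in range(X.shape[0])]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = centroids[a] - centroids[b]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = (na * centroids[a] + nb * centroids[b]) / (na + nb)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    labels = np.empty(X.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

def ma_values(log_non_denoised, log_denoised, threshold=1.0):
    """MA-plot coordinates for two log2-scale vectors; flags |M| > threshold."""
    m = log_denoised - log_non_denoised        # log2 ratio (M)
    a = (log_denoised + log_non_denoised) / 2  # mean log2 level (A)
    return m, a, np.abs(m) > threshold

# Toy genes x variables matrix with four well-separated profiles.
X = np.array([[0, 0], [0, 0.1], [10, 0], [10, 0.1],
              [0, 10], [0, 10.1], [10, 10], [10, 10.1]])
labels = ward_clusters(X, k=4)
# Per-group mean profile, analogous to the boxplot summaries of each group.
group_means = {g: X[labels == g].mean(axis=0) for g in np.unique(labels)}

m, a, flagged = ma_values(np.array([1.0, 2.0]), np.array([3.0, 2.5]))
print(labels, m, a, flagged)
```

Ward's criterion merges, at each step, the pair of clusters whose fusion least increases the within-cluster sum of squares, which tends to produce compact groups of genes with similar profiles.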
