Large-scale genome characterisations, through the analysis of gene sequence and expression data, are gaining increasing interest and have the potential to greatly improve our understanding of the physiological and molecular mechanisms underlying disease pathogenesis and progression. Various models of data stratification and identification of patient groups through various data mining protocols are used to support a decision making process in biomedicine. Data mining procedures based on algorithms such as support vector machines (SVM), neural networks, decision tree algorithms and mathematical programming approaches have been used to select sets of genes for diagnostic purposes and to identify molecular roles which are - as yet- unknown . Here we have illustrated the use of random forest to partition psoriatic tissues in appropriate disease groups and generate estimates of relevant gene predictors.
Psoriasis is a common, complex immuno-genetic inflammatory disease of primarily the skin. The underlying genetics of the disease are complex with numerous implicated susceptibility genes, where replication of single loci has been confirmed for only a handful of these genes. Patients suffering from psoriasis can exhibit a host of different clinical phenotypes and response to therapy is varied and unpredictable, even within a similar clinical phenotype, suggesting underlying transcriptional differences between and within the clinical groups. The ability to investigate the underlying immuno-genomic components of these clinical sub-phenotypes has not been a possibility, until now. Identification of different transcriptional signatures and their associated molecular pathways contribute toward defining a set of biomarkers, which could serve as diagnostic and therapeutic responder tools. We have outlined a computational strategy to identify molecular sub-types and corresponding putative biomarkers that may be crucial in the understanding and prediction of disease pathogenesis. Of the 206 common differentially expressed genes identified between normal, psoriatic lesional and psoriatic non-lesional groups, 130 genes (63.1%) were up-regulated and 76 genes (36.9%) were down-regulated. Dysregulated genes discovered in this study were involved in epidermal cell modulation, cell cycling and immune responses.
Microarray analysis of gene expression has been widely used to differentiate lesional and non-lesional skin of psoriatic patients [38, 39]. Recently, large-scale analysis using whole genome array platforms on numerous patients per sample group have been undertaken with the aim of identifying gene expression profiles associated with a specific psoriatic phenotype [5, 6, 10, 40]. In this work, we present a method for identifying sub-phenotypes of lesional skin from psoriasis patients based on patterns of gene expression that characterise each group and differ significantly from normal human skin. This approach is based on a decision tree analysis of gene expression data, the extraction of associations among gene expression patterns and the identification of functional annotations and molecular signatures.
The random forest decision tree model was applied to lesional skin group to derive patient sub-groups (PP01 and PP02), which are characterised by specific differentially expressed genes. The PP01 group was defined by the up-regulation of HLA-E, which is the inhibitory ligand for innate NK cells. HLA-E takes part in processing and presenting antigen to innate immune cells. The PP02 group had more up-regulated genes related to the cells of the adaptive immune system such as CTLA-4 (associated with modulation of T helper responses), IFI30 (involved in MHC Class II antigen processing), IL4IL (immunomodulatory enzyme produced by dendritic cells), PTPN2 (associated with autoimmune disorders such as type 1 diabetes mellitus and Crohn’s disease) and most interestingly SERPINB8, which has been identified through Genome-Wide Association Studies (GWAS) as a new psoriasis susceptibility locus in the Chinese population .
With regards to mechanistic details on the pathways that operate in psoriatic sub-groups, the ErbB signaling pathway has been identified for subgroup PP01 (Table 2). This pathway consists of a family of four related receptor tyrosine kinases (ErbB1-4) which, when activated trigger many different signal transduction pathways leading to increased proliferation, survival, motility, and invasiveness . All of these responses are important aspects of wound healing and psoriasis has many elements in common with wound healing. The main clinical feature of psoriasis relates to the thickened epidermis as a result of what may initially have been an epidermal barrier insult. An attempt to restore epidermal integrity is reflected in the activation of the ErbB signaling pathway. However in psoriasis it is possible that this pathway, along with other signaling pathways is dysregulated .
Other signaling pathways seem to be in effect in psoriasis sub-group PP02 (Table 3), for example signaling by BMP. Bone morphogenetic proteins (BMP) are members of the transforming growth factor-beta (TGF beta) superfamily and regulate a large variety of biological responses in different cells and tissues. It has been reported that BMPs are implicated in a variety of pathobiologic processes in skin, including wound healing, psoriasis, and carcinogenesis .
In our analysis, when several patient clinical variables were compared across the two classes (PP01 and PP02), we found age to be significantly altered in these subgroups, indicating that this is an important factor in disease manifestations. It is worth noting that although the differences in PP01 and PP02 groups are quite marked on a transcriptional level, yet they are clinically difficult to distinguish. This observation may help explain why some patients have a different disease course to others and why some respond better to therapy than others within a given clinical phenotype. The ability to generate molecular sub-types provides putative biomarkers, which with further refinement and replication, could prove to be useful in predicting disease severity, progression and response to therapy in an individualised manner.
Random forest has become a popular tool for analysing high-throughput genomic data. Due to the large number of variables associated with characterisation of clinical samples through gene expression measurements, reduction of dimensionality through feature selection or prioritisation is critical in disease property prediction. Here, we use random forest for (i) disease classification through gene expression patterns and analysis of variable importance to generate potential disease biomarkers, and (ii) clustering of gene expression measurements to derive disease subgroups. Despite some limitation in reproducibility across different psoriasis datasets, we believe that through our study there is an emerging picture of important gene predictors in psoriasis, as well as differentiation of disease in patient subgroups. Future work based on richer datasets that profile larger patient cohorts, with stringent clinical phenotyping, will have the potential to draw clearer conclusions about this complex autoimmune skin disease.
In this study, we generated biologically meaningful phenotypic classes using a ‘core’ of the highest differentially expressed genes and then further addressing the molecular variations among the groups responsible for lesional psoriasis. This might uncover subtle differences in disease pathogenesis allowing the emergence of new treatments for psoriatic individuals and further facilitate the development of personalized treatments for the disease. To the best of our knowledge, this is the first analysis identifying substantial phenotypic groups in psoriasis, based on patient gene expression profiles and using a classification pipeline. Further analysis and discovery of patterns and associations of transcripts of different cell-types (such as T-cells, dendritic cells, keratinocytes) must be done to shed light on the contribution that different cell types make towards the pathogenesis of psoriasis. We would then gain a better insight into this unique skin disease and hopefully, resolve some of the outstanding issues related to its pathogenesis and treatment.