We have attempted to address a fundamental question: "Can the normal physiological states of various human tissue types be quantified at the molecular level faithfully and succinctly?" In the biomedical literature, the phrase "normal physiological state" is often used in contrast to "pathological or disease state". In physics or engineering, the state of a system must be quantified by well-defined variables. Can we do the same in the biological world? We framed the question by arguing that one way to describe a biological state at the molecular level is to present a template consisting of (a) a list of molecular species and (b) their relative abundance levels. To be useful, such a template should possess three properties: compactness, repeatability and discriminating power. That is, the list should be reasonably short, and the template should predict the state accurately for as many datasets, generated by as many different labs, as possible. Taking full advantage of the rich data resource provided by GEO (Gene Expression Omnibus), we offer here a benchmark solution for characterizing the normal physiological state.
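To make the notion of a template concrete, the following Python sketch represents a GET as a mapping from gene symbol to relative level and builds it by averaging the z-scored profiles of replicate normal samples of one tissue. The z-score normalization, the averaging rule and the function names (`zscore`, `build_template`) are illustrative assumptions, not the procedure used in this study, and the expression values are toy numbers.

```python
import math
from typing import Dict, List

def zscore(profile: List[float]) -> List[float]:
    """Rescale one sample to mean 0 and (population) SD 1, so that only the
    relative abundance of the template genes is retained."""
    n = len(profile)
    mu = sum(profile) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # a flat profile carries no relative information
    return [(x - mu) / sd for x in profile]

def build_template(genes: List[str], samples: List[List[float]]) -> Dict[str, float]:
    """Hypothetical GET: average the z-scored replicate profiles of one tissue,
    giving a gene -> relative level mapping."""
    scaled = [zscore(s) for s in samples]
    return {g: sum(s[i] for s in scaled) / len(scaled) for i, g in enumerate(genes)}

# Toy example: two replicates that differ only by an overall scale factor
genes = ["APOC3", "TFPI2", "ELA2A"]
liver_like = build_template(genes, [[8.0, 1.0, 2.0], [16.0, 2.0, 4.0]])
print(liver_like)
```

Because each sample is z-scored first, replicates that differ only in overall intensity yield the same template, which is one simple way to encode "relative abundance" rather than absolute level.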
This report is the first to present a multi-purpose, molecule-based model that can characterize as many as 24 different human tissue types. The success of our tissue-specific GETs in accurately predicting tissue types from various sources, and in discriminating tissues/cells at different developmental stages, indicates that (A) a tissue under disease-free conditions constantly maintains a certain stoichiometry among many gene products; (B) the same tissue type from different disease-free individuals shares very similar gene-product stoichiometry; (C) the gene-product stoichiometry can be expressed as the relative transcription levels of a set of representative genes, i.e. a gene expression template (GET) (in this study, the combinatorial expression levels of the 56 genes); (D) when the physiological or developmental state of a cell shifts, the gene-product stoichiometry may change accordingly; and (E) severe alteration of the normal-state gene-product stoichiometry, possibly caused by multiple mutations in genes or by dramatic shifts in the overall biochemical environment of a cell, may lead to abnormal growth such as cancer, if not to cell death.

In support of this notion, we also demonstrated that the 56-gene expression patterns in cancerous cells/tissues deviate significantly from the normal GETs, and that our tissue-specific GETs can be used to distinguish melanoma from benign nevi and from normal skin. We therefore infer potential applications of our results to tissue engineering, cancer diagnosis and developmental studies.
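The prediction step implied by (A) through (C) can be sketched as nearest-template classification: score a sample's profile against every tissue template and report the best match. Here the score is assumed to be the Pearson correlation coefficient; whether this matches the c.f. used in the study is not stated in this section, and the templates below are toy values rather than the 56-gene GETs.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def predict_tissue(profile: Dict[str, float],
                   templates: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Score a sample against each tissue template and return the best match."""
    scores = {}
    for tissue, tmpl in templates.items():
        genes = sorted(tmpl)  # same gene order for sample and template
        scores[tissue] = pearson([profile[g] for g in genes],
                                 [tmpl[g] for g in genes])
    best = max(scores, key=scores.get)
    return best, scores

# Toy templates over three genes (illustrative values only)
templates = {
    "liver":    {"APOC3": 2.0, "TFPI2": -1.0, "ELA2A": -1.0},
    "placenta": {"APOC3": -1.0, "TFPI2": 2.0, "ELA2A": -1.0},
    "pancreas": {"APOC3": -1.0, "TFPI2": -1.0, "ELA2A": 2.0},
}
sample = {"APOC3": 5.0, "TFPI2": 1.0, "ELA2A": 1.2}
best, scores = predict_tissue(sample, templates)
print(best, scores)
```

Because a correlation depends only on the shape of the profile across genes and not on overall intensity, it fits the idea of a template of relative levels.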
Our approach to constructing a gene signature for predicting tissue types is simpler than existing classification methods. We first identified the genes showing a similar and reproducible trend in all three large datasets, then used the full gene group to perform tissue classification, and finally applied the group behavior (that is, the expression profiles of the compact 56-gene group) as predictors to characterize tissue types under various conditions. Without complicated modeling, our 56-gene signature achieves high predictive power on numerous public datasets. As far as we are aware, this is the most compact gene set capable of classifying the largest number of tissues. Using multiple datasets as biological replicates allowed us to reduce the number of false positives and to identify, with greater confidence, the genes whose expression varies most across tissues. Note, however, that because the 56 genes already achieved high accuracy, we did not explore whether other gene sets could serve as GETs and achieve the same or even better prediction rates, perhaps with the aid of additional statistical tools such as one-way ANOVA for gene selection.
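The first step of the approach, keeping only genes whose behavior is reproducible across independent datasets, can be sketched as follows. As a hypothetical criterion, each dataset ranks genes by the variance of their per-tissue means and keeps the top `top_k`; a gene is retained only if it is selected in every dataset. This stands in for the trend-matching rule used in the study, which is not detailed here, and it is not the one-way ANOVA alternative mentioned above. The gene names `G1` to `G3` and their values are made up.

```python
from statistics import mean, pvariance
from typing import Dict, List, Set

# One dataset: tissue -> list of samples, each sample a gene -> level mapping
Dataset = Dict[str, List[Dict[str, float]]]

def top_variable_genes(data: Dataset, top_k: int) -> Set[str]:
    """Rank genes by the variance of their per-tissue means; keep the top_k."""
    first_sample = next(iter(data.values()))[0]
    spread = {}
    for g in first_sample:
        tissue_means = [mean(s[g] for s in samples) for samples in data.values()]
        spread[g] = pvariance(tissue_means)
    ranked = sorted(spread, key=spread.get, reverse=True)
    return set(ranked[:top_k])

def reproducible_genes(datasets: List[Dataset], top_k: int) -> Set[str]:
    """Keep only genes selected independently in every dataset."""
    return set.intersection(*(top_variable_genes(d, top_k) for d in datasets))

# Two toy datasets that agree only on G1
ds1 = {"liver": [{"G1": 10, "G2": 1, "G3": 5}], "skin": [{"G1": 0, "G2": 1, "G3": 2}]}
ds2 = {"liver": [{"G1": 9, "G2": 6, "G3": 3}], "skin": [{"G1": 1, "G2": 2, "G3": 3}]}
print(reproducible_genes([ds1, ds2], top_k=2))
```

Treating each dataset as a biological replicate and intersecting the selections is one simple way to discard genes that look variable in only one lab's data.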
Given the abundance of interacting gene and pathway activities in a tissue, one may ask how the group behavior of these 56 genes can represent the states of various human tissue types. Our functional study of the 56 genes reveals a variety of functional categories, including cytoskeleton (desmin, nebulin), signal transduction (protein kinase C beta1, CDC28 protein kinase regulatory subunit 2), neurotransmitter regulation (4-aminobutyrate aminotransferase), energy homeostasis (insulin-like growth factor binding protein 1) and immunity (CD24 molecule). It should be emphasized, however, that the high precision of the large-scale validation of tissue prediction was not achieved through the combinatorial on/off states of a collection of tissue-specific markers, because only 4 of the 56 genes behaved as tissue-specific genes, that is, genes highly expressed in one particular tissue but minimally expressed in others: TFPI2 for placenta, ANKRD7 for testis, ELA2A for pancreas and APOC3 for liver. Nevertheless, the expression levels of all 56 genes taken together as a template did present distinctive patterns varying from tissue to tissue. This gene set may therefore be regarded as representative of key biochemical pathways that function distinctively across tissues, and the combinatorial transcription levels of the 56 genes, the GETs, may reflect the net sum of the relative activities of these pathways.
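The distinction between tissue-specific markers and template genes can be made operational with a fold-change rule. The sketch below calls a gene specific to a tissue when its mean expression there is at least `fold` times its mean in every other tissue; the 5-fold default, the linear expression scale and the function name `specific_tissue` are assumptions for illustration, not the criterion used to identify TFPI2, ANKRD7, ELA2A and APOC3.

```python
from typing import Dict, Optional

def specific_tissue(tissue_means: Dict[str, float], fold: float = 5.0) -> Optional[str]:
    """Return the tissue a gene is specific to, or None.

    tissue_means maps tissue -> mean linear-scale expression of one gene
    (at least two tissues). Hypothetical rule: the top tissue must exceed
    the runner-up by at least `fold`.
    """
    ranked = sorted(tissue_means.values(), reverse=True)
    top_tissue = max(tissue_means, key=tissue_means.get)
    top, runner_up = ranked[0], ranked[1]
    if runner_up <= 0:
        return top_tissue if top > 0 else None
    return top_tissue if top / runner_up >= fold else None

# Toy per-tissue means for two genes (illustrative values, not measurements)
print(specific_tissue({"placenta": 120.0, "liver": 6.0, "skin": 4.0}))  # -> placenta
print(specific_tissue({"liver": 10.0, "skin": 8.0, "placenta": 6.0}))   # -> None
```

Under such a rule, most template genes would return None: their value to the template lies in the joint pattern across tissues rather than in on/off behavior.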
Although the tissue-characterizing capability of the 56 genes may not derive from a collection of so-called "tissue-specific" genes, as discussed above, it would be interesting to find out how each of the 56 genes contributes to tissue characterization. One of our ongoing projects, which aims to reduce the gene set without compromising its power to define the normal physiological state of a given human tissue, may help to answer this question.
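One standard way to estimate each gene's contribution, offered here only as an illustration and not as the method of the ongoing project, is leave-one-gene-out ablation: classify labeled samples by nearest template (Pearson correlation assumed as the score) with the full gene set, then with each gene removed in turn, and record the drop in accuracy. The templates `A` to `C`, genes `g1` to `g4` and sample values are hypothetical toy data.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def accuracy(samples: List[Tuple[str, Profile]], templates: Dict[str, Profile],
             genes: List[str]) -> float:
    """Fraction of labeled samples whose best-correlated template is correct."""
    correct = 0
    for label, prof in samples:
        scores = {t: pearson([prof[g] for g in genes], [tm[g] for g in genes])
                  for t, tm in templates.items()}
        correct += max(scores, key=scores.get) == label
    return correct / len(samples)

def leave_one_gene_out(samples, templates, genes) -> Dict[str, float]:
    """Accuracy drop caused by removing each gene in turn."""
    base = accuracy(samples, templates, genes)
    return {g: base - accuracy(samples, templates, [h for h in genes if h != g])
            for g in genes}

genes = ["g1", "g2", "g3", "g4"]
templates = {
    "A": {"g1": 3.0, "g2": 0.0, "g3": 0.0, "g4": 0.0},
    "B": {"g1": 0.0, "g2": 3.0, "g3": 0.0, "g4": 0.0},
    "C": {"g1": 0.0, "g2": 0.0, "g3": 3.0, "g4": 0.0},
}
samples = [
    ("A", {"g1": 2.9, "g2": 0.1, "g3": 0.0, "g4": 0.1}),
    ("B", {"g1": 0.1, "g2": 3.1, "g3": 0.0, "g4": 0.0}),
    ("C", {"g1": 0.0, "g2": 0.2, "g3": 2.8, "g4": 0.1}),
]
print(leave_one_gene_out(samples, templates, genes))
```

Genes whose removal costs no accuracy (here `g4`) are natural first candidates when shrinking a template, although single-gene ablation ignores redundancy between genes.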
The network analysis provided additional clues to the biological implications of the signature in development and carcinogenesis. The positive correlation of the 56-gene profile with developmental stage, observed in both in vitro and in vivo studies, indicates that the systematic shifts in global gene expression during the complex developmental process can be captured by our signature genes. Hence the 56 genes may be good candidates for modeling the human developmental process. Furthermore, the ability of the 56-gene profiles to relate the quality of engineered skin to its similarity to the normal skin template suggests that the signature could serve as a quality index for engineered tissue.
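The reported positive correlation between the 56-gene profile and developmental stage can be checked with a rank correlation between stage order and each sample's similarity to a mature-tissue template. The sketch assumes Spearman's rank correlation with average ranks for ties; the same similarity score could, in principle, be used directly as a quality index for engineered skin. The stage indices and similarity values are toy numbers.

```python
import math
from typing import List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: stage index vs. similarity of each sample to an adult-tissue template
stages = [1, 2, 3, 4, 5]
similarity = [0.21, 0.35, 0.33, 0.62, 0.80]
print(spearman(stages, similarity))
```

A rank correlation only asks whether similarity tends to rise with stage, which suits an ordinal stage variable better than a linear fit would.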
The network analysis also helped to link our model to the current understanding of tumorigenesis. We showed that the c.f.s of the 56-gene profiles of malignant tumors to the corresponding templates were significantly lower than those of normal tissues, indicating expression changes in multiple genes in cancer tissue. This coincides with findings that at least 4-5 mutations are required to initiate a tumor [29, 30]. In our network analysis, more than half of the 56 genes were found to interact with well-known cancer-related transcription factors or signaling receptors, such as STAT3, TP53, ESR1 and EGFR, which have been shown to interact with a great number of gene products involved in a variety of pathways. It is therefore possible that mutations in such genes (e.g. EGFR, STAT3) simultaneously affect the expression of a number of target genes, which may ultimately change the profile of our signature. Further, a significant change in the profile of the 56 genes indicates alterations in the relative activities of the pathways represented by these signature genes, reflecting a dramatic shift in cellular homeostasis that may lead to cell necrosis or to anomalous growth such as tumorigenesis. Alternatively, accumulated mutations in genes that affect the activities of the pathways represented by our signature may, on the one hand, alter the expression profile and, on the other hand, lead to the outcomes described above. Taken together, although a severe shift of the 56-gene profile from normal may not be the initial cause of many cancers, it could serve as an indicator of the cancerous state of a cell/tissue. Whether our signature can be applied to cancer staging awaits further investigation. Nonetheless, this knowledge provides a new perspective on the complex process of carcinogenesis.
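The comparison of template similarity between tumor and normal samples can be illustrated with two simple summaries: a cutoff set k sample standard deviations below the mean normal score, and the probability that a random normal sample scores higher than a random tumor sample (an AUC, equal to the Mann-Whitney U statistic divided by n1 * n2). The choice of k = 3, the function names and the toy scores are assumptions; the significance test used in the study is not specified in this section.

```python
from statistics import mean, stdev
from typing import List

def flag_deviant(normal_scores: List[float], query_scores: List[float],
                 k: float = 3.0) -> List[bool]:
    """Flag query samples whose template similarity falls more than k sample
    standard deviations below the mean of the normal samples."""
    cutoff = mean(normal_scores) - k * stdev(normal_scores)
    return [s < cutoff for s in query_scores]

def auc_normal_above(normal_scores: List[float], tumor_scores: List[float]) -> float:
    """Probability that a random normal sample scores higher than a random
    tumor sample (ties count 1/2); equals Mann-Whitney U / (n1 * n2)."""
    wins = 0.0
    for a in normal_scores:
        for b in tumor_scores:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(normal_scores) * len(tumor_scores))

# Toy similarity scores to a tissue template (illustrative only)
normal = [0.90, 0.92, 0.88, 0.91]
tumor = [0.50, 0.60, 0.95]
print(flag_deviant(normal, tumor))      # -> [True, True, False]
print(auc_normal_above(normal, tumor))
```

The third toy tumor score sits above every normal score and is not flagged, a reminder that a low similarity can indicate, but does not by itself define, a cancerous state.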