Bioinformatic identification and characterization of human endothelial cell-restricted genes
© Bhasin et al. 2010
Received: 25 January 2010
Accepted: 28 May 2010
Published: 28 May 2010
In this study, we used a systematic bioinformatics analysis approach to elucidate genes that exhibit an endothelial cell (EC) restricted expression pattern, and began to define their regulation, tissue distribution, and potential biological role.
Using a high-throughput microarray platform, we identified a primary set of 1,191 transcripts that are enriched in different primary ECs compared to non-ECs (LCB >3, FDR <2%). Further refinement of this initial subset of transcripts, using published data, yielded 152 transcripts (representing 109 genes) with different degrees of EC-specificity. Several interesting patterns emerged among these genes: some were expressed in all ECs and several were restricted to microvascular ECs. Pathway analysis and gene ontology demonstrated that several of the identified genes are known to be involved in vasculature development, angiogenesis, and endothelial function (P < 0.01). These genes are enriched in cardiovascular disease, hemorrhage, and ischemia gene sets (P < 0.001). Most of the identified genes are ubiquitously expressed in many different tissues. Analysis of the proximal promoter revealed enrichment of conserved binding sites for 26 different transcription factors, and analysis of the untranslated regions suggests that a subset of the EC-restricted genes are targets of 15 microRNAs. While many of the identified genes are known for their regulatory role in ECs, we have also identified several novel EC-restricted genes, the functions of which have yet to be fully defined.
The study provides an initial catalogue of EC-restricted genes most of which are ubiquitously expressed in different endothelial cells.
The endothelium, which lines the inner surface of all blood vessels, participates in several normal physiological functions including control of vasomotor tone, maintenance of blood fluidity, regulation of permeability, formation of new blood vessels, and trafficking of cells. The endothelium also plays an important role in several human diseases. In the setting of inflammation, several genes become activated within the endothelium to facilitate the recruitment, attachment, and transmigration of inflammatory cells. Over time, however, in chronic inflammatory diseases EC responses become impaired, leading to EC dysfunction.
As a cell type, ECs exhibit tremendous heterogeneity. For example, there are significant differences in EC structure and function based on the size and type of blood vessel, from larger arteries or veins, to medium-sized arterioles or venules, down to capillary ECs. There is also significant heterogeneity at the level of a particular tissue or organ. For example, in the brain, the endothelium plays a particularly important protective role as part of the blood-brain barrier, with ECs that are closely attached to one another forming a tight barrier that is impermeable to the passage of even small solutes or ions. In contrast, in the liver, the sinusoidal ECs are fenestrated, so that small to moderate size transcellular pores promote the uptake of large lipid-containing particles from the blood [3, 4].
The endothelium is known to play an important role in several human diseases including atherosclerosis, diabetes mellitus, and sepsis. The overall goal of the current study was to use primary and publicly available microarray data from human ECs, non-ECs, and tissues to identify genes that exhibit an EC-restricted pattern, define their distribution in different tissues, and determine whether changes in the expression of any of the genes are linked to particular diseases. Our study has, for the first time, identified and ranked a significant number of genes that exhibit an EC-restricted expression pattern. Among these genes, several interesting patterns of expression emerge. Whereas many of the genes are expressed in all ECs, some are restricted to microvascular ECs. The vast majority of EC-restricted genes are expressed in multiple tissues. The EC-restricted genes were found to be associated with a number of different cellular functions including vasculature development, cell differentiation, and angiogenesis. Analysis of the regulatory regions of the EC-restricted genes demonstrated enrichment of binding sites for a selected number of transcription factors and microRNAs.
HUVEC (human umbilical vein ECs; Lonza), HAEC (human aortic ECs), HCAEC (human coronary artery ECs), HPAEC (human pulmonary artery ECs), and HMVEC (human microvascular (dermal) ECs; kindly provided by Dr. William Aird) were grown in EBM-2 (Endothelial Cell Basal Medium-2) supplemented with EGM SingleQuots (Lonza). HASMC (human aortic smooth muscle cells) were grown in SmBM Basal Medium supplemented with SmGM-2 SingleQuots (Lonza). For the isolation of T and B cells, discarded leukocytes from platelet donations by healthy human donors were used. Samples were obtained from subjects after informed consent was obtained using an institutionally approved protocol (IRB protocol 2005-P-001364/2). Red blood cells were removed using Ficoll-Paque PLUS (GE Healthcare, Uppsala, Sweden) according to the manufacturer's protocol. Donor peripheral blood mononuclear cells (PBMC) were stained with pan T-cell specific CD3-PE and pan B-cell specific CD20-FITC antibodies. Fluorescently labeled cells were sorted using a high speed cell sorter (FACS Aria, BD Biosciences, San Jose, California).
Total RNA was isolated using the RNeasy kit (QIAGEN) following the manufacturer's instructions.
Transcriptional profiling of ECs and non-ECs was performed using the Affymetrix oligonucleotide microarray HT U133 plate with 24 chips according to a standard Affymetrix protocol for cDNA synthesis, in vitro transcription, production of biotin-labeled cRNA, hybridization of cRNA with HT Plate A and B, and scanning of image output files. The quality of hybridized chips was assessed using Affymetrix guidelines on the basis of average background, scaling factor, number of genes called present, 3' to 5' ratios for beta-actin and GAPDH, and values for spike-in control transcripts. We also checked the reproducibility of the samples using chip-to-chip correlation and signal-to-noise ratio (SNR) methods for replicate arrays. All high quality arrays were included in the low and high level bioinformatics analysis. Primary gene expression data are publicly available at GEO http://www.ncbi.nlm.nih.gov/geo/ in GSE21212.
To obtain the signal values, high quality chips were further analyzed with dChip, as it is more robust than MAS5.0 and RMA in signal calculation. The raw probe level data were normalized using the smoothing-spline invariant set method. The signal value for each transcript was summarized using the PM-only signal modeling algorithm described in dChip. The PM-only modeling algorithm yields fewer false positives than the PM-MM model. In this way, the signal value corresponds to the absolute level of expression of a transcript. These normalized and modeled signal values for each transcript were used for further high level bioinformatics analysis. During the calculation of model based expression signal values, array and probe outliers are interrogated and image spikes are treated as signal outliers.
When comparing two groups of samples to identify genes enriched in a given phenotype, a gene was considered to be differentially expressed if the 90% lower confidence bound (LCB) of the fold change (FC) between the two groups was above 3 and the median false discovery rate was <2%. LCB is a stringent estimate of FC and has been shown to be the better ranking statistic. It has been suggested that a criterion of selecting genes that have an LCB above 2.0 most likely corresponds to genes with an "actual" fold change of at least 3 in gene expression [8, 10].
The list of differentially expressed genes obtained from the primary analysis (previous section) was further analyzed through a series of steps to obtain EC-restricted genes. This analysis was performed using the following three steps: i) determination of the enrichment score, ii) outlier analysis, and iii) ranking of the genes according to EC specificity.
where ECSj is the enrichment score for a transcript j, and Pi and Ai are the present and absent calls, respectively, for the transcript in different normal primary cells (n).
The outlier analysis was performed on the list of genes obtained after step i) to select genes with restricted EC expression. The analysis used the mean and standard deviation of the expression values in publicly available microarray data. If the expression of a given transcript in a sample falls more than 2 standard deviations away from the mean expression across all samples, that sample is considered an outlier. If the cluster of outliers consists only of ECs, the gene was considered a good candidate for being EC-restricted. Conversely, if the cluster of outliers consists of both ECs and non-ECs, the gene was considered to have less specificity for ECs and was filtered out from the final analysis.
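The outlier step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names and sample labels are hypothetical, the input is a mapping from sample name to expression value for a single transcript, and only samples lying above the mean by more than 2 standard deviations are treated as outliers (the text does not say whether low-side outliers were also considered).

```python
from statistics import mean, stdev


def high_outliers(expression, n_sd=2.0):
    """Return the samples whose value exceeds mean + n_sd * SD.

    `expression` maps a sample name to the expression value of one
    transcript. Only high-side outliers are returned (an assumption).
    """
    values = list(expression.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return {sample for sample, value in expression.items() if value > cutoff}


def is_ec_restricted_candidate(expression, ec_samples, n_sd=2.0):
    """True if there is at least one outlier and every outlier is an EC sample."""
    outliers = high_outliers(expression, n_sd)
    return bool(outliers) and outliers <= set(ec_samples)


# Hypothetical data: ten non-EC samples at baseline, two EC samples far above it.
restricted = {f"nonEC{i}": 100.0 for i in range(10)}
restricted.update({"HUVEC": 1000.0, "HMVEC": 1000.0})

# Same shape, but one of the two high samples is a non-EC sample.
mixed = {f"nonEC{i}": 100.0 for i in range(10)}
mixed.update({"HUVEC": 1000.0, "fibroblast": 1000.0})

ec_names = {"HUVEC", "HMVEC"}
print(is_ec_restricted_candidate(restricted, ec_names))
print(is_ec_restricted_candidate(mixed, ec_names))
```

With few samples, a single extreme value inflates the standard deviation, so a 2 SD cutoff is only informative when the public data set contains many more non-EC than EC samples.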
where REF_FOLD = (Expression in ECs in public set/Expression in Non-EC) and FC = (Expression in ECs in primary set/Expression in Non-EC).
To further reduce the false positive rate, we selected the top 60% of the transcripts with greater than 3-fold expression in ECs compared to non-ECs as good candidates for endothelial restriction.
The functional analysis of the EC-restricted genes was performed in terms of canonical pathways, disease sets, and gene ontology (GO) categories. The canonical pathway and disease set enrichment analysis was performed using the MetaCore tool of the GeneGo package http://www.genego.com/. It consists of manually curated information about gene regulation, protein interactions, and metabolic and signaling pathways. The overrepresented canonical pathways and disease biomarker sets were ranked by P value, obtained using the Simes procedure to account for multiple hypothesis testing. The P value represents the probability of the mapping arising by chance, based on the number of EC-restricted genes identified in a particular canonical pathway or disease set compared to the total number of genes in that pathway or disease set. The GO categories and disease sets with a false discovery rate (FDR) corrected P value <0.05 were considered significant.
The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to identify over-represented gene ontology categories from the endothelial restricted genes. DAVID is an online implementation of the EASE software that produces the list of overrepresented categories using jackknife iterative resampling of the Fisher exact probabilities. A score was assigned to each category using the "-log" of the EASE score to show the significantly enriched gene ontology categories. Related gene ontology categories were merged into a cluster using the functional clustering module of DAVID. Higher enrichment scores reflect increasing confidence in over-representation.
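For readers unfamiliar with FDR-corrected P values, the following Python sketch shows the Benjamini-Hochberg step-up adjustment, which builds on Simes-type thresholds. It is an assumption that this is close to what MetaCore applies internally; the text does not specify the exact procedure, and the function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four pathways or disease sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

A category would then be reported as significant when its adjusted value falls below the 0.05 cutoff stated in the text.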
Recent improvements in bioinformatics methods for the analysis of sequences regulating transcription have made it possible to elucidate potential factors involved in regulating key regulatory networks underlying a transcriptional response. For promoter analysis, we divided the EC-specific genes into two sets on the basis of K-means clustering: i) high expression in all ECs and ii) high expression in HMVEC. The promoter analysis was performed separately on these two sets using the online tool ExPlain http://explain.biobase-international.com/cgi-bin/biobase/ExPlain_2.4.2/ for detection of over-represented transcription factor binding sites. ExPlain uses Match™, a weight matrix-based tool for searching putative transcription factor binding sites [12, 13].
For the analysis, we selected regions from 2000 bp upstream to 100 bp downstream of the transcription start site of each gene (Yes set). The enrichment was obtained against a random set of promoters obtained from human housekeeping genes (No set). The entire vertebrate non-redundant set of transcription factor matrices from the TRANSFAC database was used for scanning potential binding sites. The matrices that did not differ much in density between the positive and negative sets were removed from the results. A significant over-representation of a transcription factor binding site in a target set as compared to the background set was determined using a 1-tailed Fisher exact probability test (P value < 0.01, FC (yes_set/no_set) > 1.2). After completion of the enrichment analysis, the transcription factor binding sites for each set were compared with each other in order to identify TF binding sites that were common and distinct among the different types of ECs (e.g. all, microvascular).
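The over-representation criterion above (1-tailed Fisher exact P value < 0.01 and Yes/No fold change > 1.2) can be illustrated with a short Python sketch. It assumes that each promoter is simply scored as containing or not containing a given binding site, which may differ from the site-density measure ExPlain actually uses; the function names and counts are hypothetical.

```python
from math import comb


def fisher_one_tailed(yes_hits, yes_total, no_hits, no_total):
    """Probability of seeing at least `yes_hits` promoters with a site in
    the Yes set under the hypergeometric null (1-tailed Fisher exact test)."""
    total = yes_total + no_total
    hits = yes_hits + no_hits
    upper = min(hits, yes_total)
    favourable = sum(
        comb(hits, k) * comb(total - hits, yes_total - k)
        for k in range(yes_hits, upper + 1)
    )
    return favourable / comb(total, yes_total)


def fold_change(yes_hits, yes_total, no_hits, no_total):
    """Ratio of site frequencies, Yes set over No set."""
    if no_hits == 0:
        return float("inf")
    return (yes_hits / yes_total) / (no_hits / no_total)


def is_enriched(yes_hits, yes_total, no_hits, no_total, p_cut=0.01, fc_cut=1.2):
    """Apply both thresholds quoted in the text."""
    p = fisher_one_tailed(yes_hits, yes_total, no_hits, no_total)
    fc = fold_change(yes_hits, yes_total, no_hits, no_total)
    return p < p_cut and fc > fc_cut


# Hypothetical counts: site found in 40 of 100 EC promoters and in
# 10 of 100 housekeeping promoters.
print(fisher_one_tailed(40, 100, 10, 100), is_enriched(40, 100, 10, 100))
```

Exact integer arithmetic with `math.comb` avoids a dependency on a statistics library, at the cost of speed for very large promoter sets.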
Another potential mechanism of regulating EC-specific genes could be through miRNAs, a class of small non-coding RNAs that regulate gene expression primarily through post-transcriptional repression by promoting mRNA degradation in a sequence-specific manner. We were interested in identifying whether miRNA binding sites are enriched in EC-restricted genes. Computational analysis of the miRNA target sites was performed using the Composite Regulatory Signature Database (CRSD) http://126.96.36.199:8080/crsd/main/home.jsp, a comprehensive server for composite regulatory signature discovery. CRSD has a package for prediction of miRNA binding sites that searches the 3'UTRs of the target gene set for segments of perfect Watson-Crick complementarity. The miRNA binding sites for each microRNA are counted in the EC-restricted set and the background set (54,576 genes from human UniGene). The enrichment of each miRNA binding site is calculated on the basis of its abundance in the EC-restricted set and the background set. The significance of enrichment is expressed as a P value (the smaller the P value, the more significant the enrichment).
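The idea of searching a 3'UTR for perfectly complementary segments can be sketched as follows. This is a simplified illustration, not the CRSD algorithm: it assumes the match is restricted to the miRNA seed (nucleotides 2 to 8), and the miRNA and UTR sequences are invented for the example.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=1, end=8):
    """Return the mRNA sequence (5' to 3') that pairs perfectly with the
    miRNA seed, taken here as nucleotides 2-8 (an assumption)."""
    seed = mirna.upper()[start:end]
    return seed.translate(COMPLEMENT)[::-1]


def count_sites(utr, mirna):
    """Count occurrences (overlaps allowed) of the seed site in a 3'UTR.
    DNA-style input is accepted; T is converted to U."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    count = 0
    pos = utr.find(site)
    while pos != -1:
        count += 1
        pos = utr.find(site, pos + 1)
    return count


# Hypothetical sequences, not real miRNA or UTR data.
toy_mirna = "AACCGGUUACGU"
toy_utr = "GGGAACCGGUCCCAACCGGUAA"
print(seed_site(toy_mirna), count_sites(toy_utr, toy_mirna))
```

Counts obtained this way for the EC-restricted set and the background set could then be compared with an over-representation test to obtain the enrichment P value.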
In order to determine the normal tissue distribution of the EC-specific genes, we obtained the normalized expression level from the Stanford Source database. The Source database presents the relative expression level of a gene in different tissues, normalized for the number of samples from each tissue included in UniGene. The gene expression information for the different transcripts was obtained from the dbEST expression profile.
In addition to relative gene expression information from the Source database, we also manually curated protein expression information for the endothelial-specific genes from the Human Protein Atlas database. The Human Protein Atlas is a comprehensive database that provides the protein expression profiles for a large number of human proteins, presented as immunohistological images from most human tissues [18, 19]. It contains antibody-based protein expression and localization profiles of >4,000 proteins in 48 normal human tissues and 20 different cancers. The expression level of each protein is presented on a four-color scale that takes into consideration the intensity of the protein expression and the quantity of positive images tested for each protein. It is a very useful tool for extracting the relative expression level of proteins in different tissues.
Total RNA was isolated using the RNeasy kit (QIAGEN, Valencia, CA). Single-stranded cDNA was synthesized from total RNA using the High Capacity RNA-to-cDNA Kit (Applied Biosystems). SYBR Green I-based real-time PCR was carried out on an Opticon Monitor. The sequences of the primers used in this study are listed in Additional File 1. For normalization of each sample, human-specific TATA-binding protein (TBP) primers were used to measure the amount of TBP cDNA.
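One common way to normalize real-time PCR data to a reference gene such as TBP is the delta-Ct calculation, sketched below. The text does not state which calculation the authors used, so this is an assumption; the function name, the default amplification efficiency of 2 (perfect doubling per cycle), and the Ct values are all hypothetical.

```python
def relative_expression(ct_target, ct_tbp, efficiency=2.0):
    """Target abundance relative to TBP from threshold cycle (Ct) values.

    Assumes equal amplification efficiency for both primer pairs
    (2.0 means the product doubles every cycle).
    """
    return efficiency ** -(ct_target - ct_tbp)


# Hypothetical Ct values: the target crosses threshold two cycles before
# TBP in an EC sample and at the same cycle in a non-EC sample.
ec_level = relative_expression(ct_target=22.0, ct_tbp=24.0)
non_ec_level = relative_expression(ct_target=25.0, ct_tbp=25.0)
print(ec_level, non_ec_level, ec_level / non_ec_level)
```

Dividing the normalized EC value by the normalized non-EC value gives the fold difference between the two cell types for that gene.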
Comparing groups, we found 1,713 transcripts that are differentially expressed in HMVEC compared to non-ECs (LCB > 3 and FDR < 2%). Similarly, for HUVEC and HPVEC, 1,534 and 1,539 transcripts, respectively, were differentially expressed compared to non-ECs. For the arterial ECs, 1,239 HCAEC and 1,316 HAEC transcripts were differentially expressed compared to non-ECs. Comparison of the differentially expressed transcripts in microvascular (HMVEC), venous (HUVEC, HPVEC), and arterial (HAEC and HCAEC) cells using Venn diagrams revealed that approximately half of the transcripts are differentially expressed in all three EC types. However, we also observed that each EC type possessed a unique expression signature, in which the differential expression of transcripts was limited to one type of EC [Figure 1B].
List of the endothelial restricted genes with detailed annotation and rank score
In order to evaluate whether the EC-restricted genes are potentially linked to the pathogenesis of certain human diseases, we performed a disease set enrichment analysis using disease sets based on published literature (DSPL). DSPL enrichment analysis was performed using the MetaCore tool in the GeneGo package. The disease associations are summarized in Figure 5C, which depicts the top diseases in which EC-restricted genes are enriched. The EC-restricted genes are enriched in many cardiovascular diseases including ventricular dysfunction, myocardial infarction, hypertension, diabetic angiopathies, arteriosclerosis, and several other vascular diseases. Interestingly, ischemia was listed as a disease in which the EC-restricted genes are over-represented (P value = 2E-06). The EC-restricted genes are also enriched (P value < 0.01) in neurological diseases including subarachnoid hemorrhage (P value = 3.00E-07).
List of significantly enriched miRNA binding sites.
PALMD,RAPGEF5,LOC90139,MYLK2, CETP,TIE1,GLCE,VWF,ROBO4,KIAA1274, PDE2A,
Normalized Expression Level of top endothelial restricted genes obtained from the Source database.
Umbilical cord (16.7%)
Umbilical cord (43.9%)
Umbilical cord (10.1%),
Umbilical cord (16.3%),
Umbilical cord (16.0%),
Umbilical cord (82.4%)
Umbilical cord (44.6%)
To further explore whether any of the EC-restricted genes have specific expression in particular tissues, we obtained the immunohistochemistry data for 61 out of the 109 EC-restricted genes. The majority of the EC-restricted genes demonstrate ubiquitous expression in different normal tissues (Additional File 3). A small subset of the genes shows a restricted expression pattern in normal tissues. For example, VWF and ICAM2 are enriched in soft tissues. BMX, one of the top ranked endothelial restricted genes, has preferential expression in the epididymis. CLDN5 is preferentially expressed in glandular cells of various body tissues. Interestingly, about 85% of the genes show moderate to high levels of expression in soft tissues.
The results of our study demonstrate that of over 43,000 transcripts evaluated, only 152 appear to be highly restricted to the endothelium. Several of the genes identified have previously been reported to exhibit an EC-restricted expression pattern and have known functions in ECs. Examples of these genes include angiopoietin-2, von Willebrand factor (vWF), EC nitric oxide synthase (eNOS), and Pecam-1 (CD31). The pathways and GO categories of the identified genes support a role for these genes in vascular development, angiogenesis, and EC function.
Although several of the EC-restricted genes have previously been shown to contribute to the regulation of normal EC function, many others have not been characterized as having a particular role in EC. The genes identified as being EC-restricted fall into several categories, including proteins involved in transcriptional regulation, cell adhesion, signal transduction, and intracellular trafficking. The determination that these genes are enriched in ECs may lead to future studies that define their specific role in regulating EC function.
The endothelium is known to play an important role in a number of human diseases, and so it was not a surprise that alterations in the expression of these genes are associated with a number of cardiovascular disorders. Mutations or alterations in the expression of several of the genes listed have been shown to be associated with the development of hypertension. For example, mutations in the eNOS gene have been linked to essential hypertension [21–23]. Similar associations have been observed with mutations in the endothelin-1 gene [24, 25]. More recent studies point toward a link between obesity and hypertension. There has been particular interest in understanding the role of adipocytokines and their receptors in the development of hypertension. Previous studies have suggested a causal link between leptin levels in obese patients and the development of hypertension. A more recently discovered adipocytokine, apelin, is predominantly expressed in the ECs of the heart and may play a role in the development of hypertension and cardiac hypertrophy.
The endothelium is known to play an important paracrine role with respect to cardiac function and development. The TGFbeta family member cytokine, bone morphogenetic protein-4 (BMP-4), is known to play an important role during cardiac development. Increased expression of BMP-4 may similarly be reflective of a state of EC dysfunction. Exposure of ECs to BMP-4 promotes ROS generation. BMP-4 expression is increased in ECs exposed to abnormal or unstable flow, compared to regions of laminar shear flow. Venous and microvessel ECs exposed to BMP-4 rapidly undergo apoptosis. These results suggest that BMP-4 could be a therapeutic target in the setting of heart failure to improve or reverse EC dysfunction.
The functional and structural integrity of the central nervous system depends on tightly controlled coupling between neural activity and cerebral blood flow (CBF). This requires the close interaction of neuronal cells and vascular cells in a complex that is known as the neurovascular unit. Recent experimental evidence suggests that dysfunction of the neurovascular unit may be an early event in Alzheimer's disease (AD). Transgenic mice overexpressing the amyloid precursor protein (APP) exhibit abnormalities in blood flow in response to functional hyperemia prior to the development of amyloid plaques or vascular amyloid. Administration of soluble amyloid beta protein results in vasoconstriction, EC dysfunction, and a reduction in CBF. One of the main mechanisms by which EC dysfunction occurs is through inactivation or reduced function of EC nitric oxide synthase (eNOS). Amyloid beta also induces the production of reactive oxygen species, alteration in the expression of tight junction proteins, and an increased rate of EC apoptosis. In the brain tissue samples of patients with AD, we observed a significant increase in the expression of selected adherens and tight junction proteins including VE-cadherin, claudin-5, and connexin 37 (GJA4). Systemic administration of the amyloid beta peptide 1-42 to rats is associated with alterations in the expression and cellular localization of several tight junction proteins. Another EC-restricted gene found to be significantly upregulated in the AD brain tissue samples is von Willebrand factor (vWF). Increased levels of vWF promote blood clotting. Increased vWF has been found in heme-rich deposits (HRDs) in patients with dementia. HRDs are also rich in fibrinogen, collagen IV, and red blood cells, and are thought to be the residua of capillary bleeds, or microhemorrhages. In patients with acute ischemic stroke and vascular dementia, vWF levels have also been shown to be increased.
Our analysis of potential transcription factors that might be involved in regulating the expression of the identified EC-restricted genes, based on conserved binding sites in the regulatory regions of these genes, led to the identification of several classes of transcription factors. With some exceptions, most of these transcription factors have not previously been described as playing a major role in the regulation of EC-restricted genes. Members of the ETS and GATA transcription factor families have been shown to regulate a number of endothelial genes including vWF, VE-cadherin, and Tie1 [36–38]. Interestingly, several conserved binding sites were identified only in the regulatory regions of the genes highly expressed in microvascular ECs, suggesting that members of these transcription factor families may play a unique role in determining endothelial gene expression in microvessels.
Over the past several years, microRNAs have been shown to play a role in regulating EC gene expression and function and in the process of angiogenesis. Although most of the miRNAs we identified have not been described for their roles in regulating EC-restricted genes, a few have. For example, hsa-miR-296 has recently been shown to play a regulatory role in angiogenesis [39]. Angiogenic factors can increase the expression of hsa-miR-296. Down-regulation of hsa-miR-296 inhibits angiogenic responses in cultured ECs. Furthermore, inhibition of hsa-miR-296 with antagomirs reduced angiogenesis in tumor xenografts in vivo. Similarly, hsa-miR-328 has been implicated in the regulation of CD44. CD44 regulates a wide variety of processes including angiogenesis and inflammation. The fact that only a small subset of the more than 700 microRNAs has thus far been shown to regulate EC-restricted genes or play a role in regulating EC function suggests that several additional members, including those we have identified, may well also play a role in regulating the expression of selected EC-restricted genes or EC function.
We recognize that there are potential limitations of our study. First, the study used expression-profiling data based on RNA obtained from human tissues or cells. Although several of the genes identified are known to be vascular-specific, the newly identified genes will ultimately need further validation as to the true extent of their EC specificity, at the level of protein and/or RNA both in cells and tissues, and to validate their EC-restricted pattern within the identified tissues.
Our study validates the existence of a finite number of endothelial-restricted genes most of which are ubiquitously expressed. Several of these are restricted to cells of microvascular origin. Although several of the genes are known to play important roles in endothelial function, the exact functional role of many others in endothelial cells remains to be defined. We hope that our study provides an initial catalogue of EC-restricted genes that can lead to further studies that either link alterations in the expression of these genes to a variety of human diseases via their role as biomarkers or are ultimately shown to play a causal role in the pathogenesis of the particular human diseases.
This work was supported by NIH grants HL-67219 (PO) and P01 HL76540 (PO), and AHA award EIA0740012 (PO)
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.