Bioinformatic identification and characterization of human endothelial cell-restricted genes

Background In this study, we used a systematic bioinformatics analysis approach to elucidate genes that exhibit an endothelial cell (EC)-restricted expression pattern, and began to define their regulation, tissue distribution, and potential biological role. Results Using a high throughput microarray platform, a primary set of 1,191 transcripts that are enriched in different primary ECs compared to non-ECs was identified (LCB >3, FDR <2%). Further refinement of this initial subset of transcripts, using published data, yielded 152 transcripts (representing 109 genes) with different degrees of EC-specificity. Several interesting patterns emerged among these genes: some were expressed in all ECs and several were restricted to microvascular ECs. Pathway analysis and gene ontology demonstrated that several of the identified genes are known to be involved in vasculature development, angiogenesis, and endothelial function (P < 0.01). These genes are enriched in cardiovascular diseases, hemorrhage and ischemia gene sets (P < 0.001). Most of the identified genes are ubiquitously expressed in many different tissues. Analysis of the proximal promoter revealed the enrichment of conserved binding sites for 26 different transcription factors, and analysis of the untranslated regions suggests that a subset of the EC-restricted genes are targets of 15 microRNAs. While many of the identified genes are known for their regulatory role in ECs, we have also identified several novel EC-restricted genes, the functions of which have yet to be fully defined. Conclusion The study provides an initial catalogue of EC-restricted genes, most of which are ubiquitously expressed in different endothelial cells.


Background
The endothelium, which lines the inner surface of all blood vessels, participates in several normal physiological functions including control of vasomotor tone, the maintenance of blood fluidity, regulation of permeability, formation of new blood vessels, and trafficking of cells [1]. The endothelium also plays an important role in several human diseases. In the setting of inflammation several genes become activated within the endothelium to facilitate the recruitment, attachment, and transmigration of inflammatory cells. Over time, however, in chronic inflammatory diseases EC responses become impaired, leading to EC dysfunction.
As a cell type, ECs exhibit tremendous heterogeneity [2]. For example, there are significant differences in EC structure and function based on the size and type of blood vessel, from larger arteries or veins, to medium sized arterioles or venules, down to capillary ECs. There is also significant heterogeneity at the level of a particular tissue or organ. For example, in the brain, the endothelium plays a particularly important protective role as part of the blood brain barrier with ECs that are closely attached to one another forming a tight barrier that is impermeable to the passage of even small solutes or ions. In contrast, in the liver, the sinusoidal ECs are fenestrated so that small to moderate size transcellular pores promote the uptake of large lipid containing particles from the blood [3,4].
The endothelium is known to play an important role in several human diseases including atherosclerosis, diabetes mellitus, and sepsis. The overall goal of the current study was to use primary and publicly available microarray data from human ECs, non-ECs, and tissues to identify genes that exhibit an EC-restricted pattern, define their distribution in different tissues, and determine whether changes in the expression of any of the genes are linked to particular diseases. Our study has, for the first time, identified and ranked a significant number of genes that exhibit an EC-restricted expression pattern. Among these genes, several interesting patterns of expression emerge. Whereas many of the genes are expressed in all ECs, some are restricted to microvascular ECs. The vast majority of EC-restricted genes are expressed in multiple tissues. The EC-restricted genes were found to be associated with a number of different cellular functions including vasculature development, cell differentiation, and angiogenesis. Analysis of the regulatory regions of the EC-restricted genes demonstrated enrichment of binding sites for a selected number of transcription factors and microRNAs.

Methods
Cell culture
HUVEC (human umbilical vein ECs; Lonza), HAEC (human aortic ECs), HCAEC (human coronary artery ECs), HPAEC (human pulmonary artery ECs), and HMVEC (human dermal microvascular ECs; kindly provided by Dr. William Aird) were grown in EBM-2 (Endothelial Cell Basal Medium-2) supplemented with EGM SingleQuots (Lonza). HASMC (human aortic smooth muscle cells) were grown in SmBM Basal Medium supplemented with SmGM-2 SingleQuots (Lonza). For the isolation of T and B cells, discarded leukocytes from platelet donations by healthy human donors were used. Samples were obtained from subjects after informed consent was obtained using an institutionally approved protocol (IRB protocol 2005-P-001364/2). Red blood cells were removed using Ficoll-Paque PLUS according to the manufacturer's protocol (GE Healthcare, Uppsala, Sweden). Donor peripheral blood mononuclear cells (PBMC) were stained with pan T-cell specific CD3-PE and pan B-cell specific CD20-FITC antibodies. Fluorescently labeled cells were sorted using a high speed cell sorter (FACSAria, BD Biosciences, San Jose, California).

RNA isolation
Total RNA was isolated using the RNeasy kit (QIAGEN) following the manufacturer's instructions.

Microarray Analysis
Transcriptional profiling of endothelial and non-EC cells was performed using the Affymetrix oligonucleotide microarray HT U133 plate with 24 chips according to a standard Affymetrix protocol for cDNA synthesis, in vitro transcription, production of biotin-labeled cRNA, hybridization of cRNA with HT Plate A and B, and scanning of image output files [5]. The quality of hybridized chips was assessed using Affymetrix guidelines on the basis of average background, scaling factor, number of genes called present, 3' to 5' ratios for beta-actin and GAPDH and values for spike-in control transcripts [6]. We also checked for reproducibility of the samples by using chip to chip correlation and signal-to-noise ratio (SNR) methods for replicate arrays. All the high quality arrays were included for low and high level bioinformatics analysis. Primary gene expression data are publicly available at GEO http://www.ncbi.nlm.nih.gov/geo/ in GSE21212.

Statistical Analysis
To obtain the signal values, high quality chips were further analyzed by dChip, as it is more robust than MAS5.0 and RMA in signal calculation. The raw probe level data were normalized using the smoothing-spline invariant set method. The signal value for each transcript was summarized using the PM-only model-based signal algorithm described in dChip. The PM-only model yields fewer false positives than the PM-MM model. In this way, the signal value corresponds to the absolute level of expression of a transcript [7]. These normalized and modeled signal values for each transcript were used for further high level bioinformatics analysis. During the calculation of model-based expression signal values, array and probe outliers are interrogated and image spikes are treated as signal outliers.
When comparing two groups of samples to identify genes enriched in a given phenotype, if the 90% lower confidence bound (LCB) of the fold change (FC) between the two groups was above 3 and median false discovery rate is <2%, the corresponding gene was considered to be differentially expressed [8]. LCB is a stringent estimate of FC and has been shown to be the better ranking statistic [9]. It has been suggested that a criterion of selecting genes that have an LCB above 2.0 most likely corresponds to genes with an "actual" fold change of at least 3 in gene expression [8,10].

Identification of EC-restricted genes
The list of differentially expressed genes obtained from the primary analysis (previous section) was further analyzed through a series of steps to obtain EC-restricted genes. This analysis was performed in three steps: i) determination of the enrichment score, ii) an outlier analysis, and iii) ranking of the genes according to EC specificity.

i) Enrichment Score [ECS]
The enrichment analysis was performed to determine the probability that genes are specifically overexpressed in ECs as compared to other primary non-ECs. For this analysis we used the public REFEXA database http://www.lsbm.org/site_e/database/index.html. The REFEXA database consists of gene expression data from a series of primary cells, cancer cell lines, and tissues. The MAS5 normalized data were manually obtained from the database for all the transcripts that were identified as highly expressed in ECs compared to non-ECs in the primary analysis. The enrichment score of each gene was determined by calculating the relative expression in the ECs compared to non-ECs. Each transcript was assigned a present/absent call in every primary cell on the basis of its expression value. The transcript was called present (P) in a primary non-endothelial cell if it was expressed at >50% of the expression level in the primary ECs; otherwise it was called absent (A). The EC score (ECS_j) for a transcript j is computed from the absent (A_i) and present (P_i) calls for that transcript across the n different normal primary cells.
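The present/absent calling described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the exact form of the ECS equation is not reproduced in the text, so the score here is assumed to be the fraction of non-EC primary cells with an absent call, which is at least consistent with an enrichment score of 1 meaning "absent in every non-EC cell". The function names and the expression values are hypothetical, and the EC expression level is taken as a single summary number.

```python
from typing import List


def present_call(non_ec_expr: float, ec_expr: float, threshold: float = 0.5) -> bool:
    """Return True (present, P) if the non-EC expression exceeds
    `threshold` times the EC expression, otherwise False (absent, A)."""
    return non_ec_expr > threshold * ec_expr


def enrichment_score(ec_expr: float, non_ec_exprs: List[float]) -> float:
    """ASSUMED form of the score: number of absent calls divided by the
    number of non-EC primary cells (n). The published equation may differ."""
    if not non_ec_exprs:
        raise ValueError("need at least one non-EC primary cell")
    calls = [present_call(x, ec_expr) for x in non_ec_exprs]
    n_absent = calls.count(False)
    return n_absent / len(calls)


# Hypothetical MAS5-style values for one transcript.
ec_level = 1000.0
non_ec_levels = [50.0, 120.0, 480.0, 900.0]  # only 900.0 exceeds 50% of 1000.0
print(enrichment_score(ec_level, non_ec_levels))
```

Under this assumed form, a transcript reaches a score of 1 only when no non-EC primary cell expresses it above half of the EC level.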

ii) Outlier Analysis
The outlier analysis was performed on the list of genes obtained after step i) for the selection of genes with restricted EC expression. The outlier analysis was performed by means and standard deviation of the expression values using publicly available microarray data. If the expression of a given transcript in a sample falls 2 standard deviations outside of the mean expression in the distribution obtained using all samples, the particular sample is considered as an outlier. If the cluster of the outliers consists only of ECs, the genes were considered as good candidates for being EC-restricted. On the contrary, if the cluster of the outliers consists of ECs and non-ECs, these genes were considered to have less specificity for ECs and were filtered out from the final analysis.
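A minimal sketch of the outlier rule is given below, assuming one expression value per sample for a single transcript. The sample names and values are hypothetical, the sample standard deviation is used, and deviations on either side of the mean are counted, which is one literal reading of "2 standard deviations outside of the mean".

```python
from statistics import mean, stdev
from typing import Dict, List, Set


def outlier_samples(expression: Dict[str, float], n_sd: float = 2.0) -> List[str]:
    """Return the samples whose expression lies more than `n_sd` sample
    standard deviations away from the mean over all samples."""
    values = list(expression.values())
    mu = mean(values)
    sd = stdev(values)  # requires at least two samples
    return [name for name, v in expression.items() if abs(v - mu) > n_sd * sd]


def is_ec_restricted(expression: Dict[str, float], ec_samples: Set[str],
                     n_sd: float = 2.0) -> bool:
    """A transcript is kept only if it has outliers and all of them are ECs."""
    outliers = outlier_samples(expression, n_sd)
    return bool(outliers) and all(name in ec_samples for name in outliers)


# Hypothetical data: 20 non-EC samples at a low level plus two EC samples.
ecs = {"HUVEC", "HAEC"}
background = {f"nonEC_{i}": 50.0 for i in range(20)}

restricted = dict(background, HUVEC=1000.0, HAEC=950.0)
mixed = dict(background, HUVEC=1000.0, HAEC=950.0, HASMC=900.0)

print(sorted(outlier_samples(restricted)), is_ec_restricted(restricted, ecs))
print(sorted(outlier_samples(mixed)), is_ec_restricted(mixed, ecs))
```

In the second case a non-EC sample joins the outlier cluster, so the transcript would be filtered out as less EC-specific.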

iii) Ranking of EC-restricted genes
After the outlier and enrichment analysis, all the identified EC-restricted genes were ranked on the basis of average fold change of a gene in ECs as compared to non ECs (REF_FOLD) in publicly available datasets (REFEXA) and Fold change between ECs and non-ECs from our primary experiment (FC) [EQ 2]. The genes with high REF_FOLD and high FC are considered to be more EC-restricted and assigned a higher rank.

REF FOLD FC
where REF_FOLD = (expression in ECs in the public set / expression in non-ECs) and FC = (expression in ECs in the primary set / expression in non-ECs).
To further reduce the false positive rate, we selected the top 60% of the transcripts with greater than 3-fold expression in ECs compared to non-ECs as good candidates for endothelial restriction.
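The ranking and selection step can be illustrated with the sketch below. Several points are assumptions rather than the published procedure: the way REF_FOLD and FC are combined into a rank score is passed in as a function (the product used as the default is only a placeholder), the 3-fold filter is applied to the mean of the two fold changes, and the filter is applied before the top 60% cut. All names and values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def rank_ec_restricted(
    folds: Dict[str, Tuple[float, float]],
    combine: Callable[[float, float], float] = lambda ref_fold, fc: ref_fold * fc,
    min_mean_fold: float = 3.0,
    top_fraction: float = 0.6,
) -> List[Tuple[str, float]]:
    """Rank transcripts by a score built from (REF_FOLD, FC).

    `folds` maps a transcript name to (REF_FOLD, FC). `combine` is a
    placeholder for the rank score formula, which is an assumption here.
    """
    candidates = [
        (name, combine(ref_fold, fc))
        for name, (ref_fold, fc) in folds.items()
        if (ref_fold + fc) / 2.0 >= min_mean_fold  # assumed 3-fold filter
    ]
    # Higher score means more EC-restricted, so sort in descending order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    n_keep = int(len(candidates) * top_fraction)  # rounds down
    return candidates[:n_keep]


# Hypothetical fold changes: transcript -> (REF_FOLD, FC).
example = {
    "T1": (10.0, 8.0),
    "T2": (4.0, 5.0),
    "T3": (2.0, 3.0),   # mean fold 2.5, removed by the filter
    "T4": (20.0, 2.0),
    "T5": (6.0, 6.0),
}
print(rank_ec_restricted(example))
```

Passing a different `combine` function (for example a sum or a mean) changes the ordering without touching the filtering logic, which keeps the uncertain part of the procedure isolated.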

Pathways, Gene ontology and Disease set enrichment analysis of EC-restricted genes
The functional analysis of the EC-restricted genes was performed in terms of canonical pathways, disease sets and gene ontology (GO) categories. The canonical pathways and disease set enrichment analysis was performed using the MetaCore tool of the GeneGo package http://www.genego.com/. It consists of manually curated information about gene regulation, protein interactions, and metabolic and signaling pathways. The overrepresented canonical pathways and disease biomarker sets were ranked on the basis of P values obtained using the Simes procedure, which accounts for multiple hypothesis testing. The P value represents the probability of the mapping arising by chance, based on the number of EC-restricted genes identified in a particular canonical pathway or disease compared to the total number of genes in the GO category/disease set. The GO categories/disease sets with a False Discovery Rate (FDR) corrected P value <0.05 were considered significant.
The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to identify overrepresented gene ontology categories from the endothelial-restricted genes [11]. DAVID is an online implementation of the EASE software that produces the list of overrepresented categories using jackknife iterative resampling of the Fisher exact probabilities. A score was assigned to each category by using "-log" of the EASE score to show the significantly enriched gene ontology categories. The related gene ontology categories were merged into a cluster using the functional clustering module of DAVID. Higher enrichment scores for particular genes reflect increasing confidence in their overrepresentation.

Analysis of transcription factor binding sites
Recent improvements in bioinformatics methods for the analysis of sequences regulating transcription have made it possible to elucidate potential factors involved in regulating key regulatory networks underlying a transcriptional response. For promoter analysis, we divided the EC-specific genes into two sets on the basis of K-means clustering: i) high expression in all ECs and ii) high expression in HMVEC. The promoter analysis was performed separately on these two sets using the online tool ExPlain http://explain.biobase-international.com/cgi-bin/biobase/ExPlain_2.4.2/ for detection of over-represented transcription factor binding sites. ExPlain uses Match™, a weight matrix-based tool for searching putative transcription factor binding sites [12,13].
For the analysis, we selected regions from 2000 bp upstream to 100 bp downstream of the transcription start site of each gene (Yes set). The enrichment was obtained against a random set of promoters obtained from human housekeeping genes (No set). The entire vertebrate non-redundant set of transcription factor matrices from the TRANSFAC database was used for scanning potential binding sites [14]. The matrices that did not differ much in density between the positive and negative set were removed from the results. A significant overrepresentation of a transcription factor binding site in a target set as compared to the background set was determined using a 1-tailed Fisher exact probability test (P value < 0.01, FC (yes_set/no_set) > 1.2). After completion of the enrichment analysis, the transcription factor binding sites for each set were compared with each other, in order to identify TF binding sites that were common and distinct among the different types of ECs (e.g. all, microvascular).
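The one-tailed Fisher exact test with a Yes/No ratio cutoff can be sketched as below. This is not the ExPlain implementation: as a simplifying assumption, each promoter is counted once as either containing or lacking at least one predicted site, and the ratio is the fraction of Yes-set promoters with a site divided by the same fraction in the No set. The counts are hypothetical.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher exact P value for over-representation.

    2x2 table: a = Yes-set promoters with a site, b = Yes-set without,
               c = No-set promoters with a site,  d = No-set without.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    n_yes = a + b
    n_site = a + c
    total = a + b + c + d
    upper = min(n_yes, n_site)
    numerator = sum(
        comb(n_site, x) * comb(total - n_site, n_yes - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, n_yes)


def yes_no_ratio(a: int, b: int, c: int, d: int) -> float:
    """Fraction of Yes-set promoters with a site over the No-set fraction."""
    yes_frac = a / (a + b)
    no_frac = c / (c + d)
    return float("inf") if no_frac == 0 else yes_frac / no_frac


def is_enriched(a: int, b: int, c: int, d: int,
                p_cutoff: float = 0.01, ratio_cutoff: float = 1.2) -> bool:
    """Apply both thresholds stated in the text."""
    return (fisher_right_tail(a, b, c, d) < p_cutoff
            and yes_no_ratio(a, b, c, d) > ratio_cutoff)


# Hypothetical counts: 30 of 50 EC-restricted promoters versus
# 40 of 200 housekeeping promoters contain the site.
print(fisher_right_tail(30, 20, 40, 160), yes_no_ratio(30, 20, 40, 160))
print(is_enriched(30, 20, 40, 160))
```

The P value is computed exactly with integer binomial coefficients, so no external statistics library is required for small gene sets.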

MicroRNA target analysis
Another potential mechanism of regulating EC-specific genes could be through miRNAs, a class of small noncoding RNAs that regulate gene expression primarily through post-transcriptional repression by promoting mRNA degradation in a sequence-specific manner [15]. We were interested in identifying whether miRNA binding sites are enriched in EC-restricted genes. Computational analysis of the miRNA target sites was performed using the Composite Regulatory Signature Database (CRSD) http://140.120.213.10:8080/crsd/main/home.jsp, a comprehensive server for composite regulatory signature discovery. CRSD has a package for prediction of miRNA binding sites that searches the 3' UTRs of the target gene set for segments of perfect Watson-Crick complementarity [16]. The miRNA binding sites for each miRNA are counted in the EC-restricted set and the background set (54,576 genes from human UniGene). The enrichment of each miRNA binding site is calculated on the basis of its abundance in the EC-restricted set and the background set. The significance of enrichment is expressed as a P value (the smaller the P value, the more significant the enrichment).
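The idea of searching a 3' UTR for a perfectly complementary segment can be illustrated with a simple seed-match scan. This sketch is not the CRSD algorithm: it assumes the commonly used seed convention (miRNA nucleotides 2 to 8), ignores binding free energy, and uses made-up sequences.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str, start: int = 1, end: int = 8) -> str:
    """Return the mRNA segment (5'->3') that pairs perfectly with the
    miRNA seed. The seed is taken as nucleotides 2-8 of the miRNA
    (0-based slice [1:8]); this convention is an assumption here."""
    seed = mirna.upper().replace("T", "U")[start:end]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def count_seed_sites(utr: str, mirna: str) -> int:
    """Count perfect seed matches in a 3' UTR given as RNA or DNA."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    width = len(site)
    return sum(1 for i in range(len(utr) - width + 1) if utr[i:i + width] == site)


# Hypothetical miRNA and 3' UTR sequences, for illustration only.
toy_mirna = "UACGGAUCCAGUUGACCAUGGA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUGG"
print(seed_site(toy_mirna), count_seed_sites(toy_utr, toy_mirna))
```

Per-gene site counts obtained this way for the EC-restricted set and for the background set could then be compared with an over-representation test to obtain the enrichment P value.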

Tissue specificity of EC-specific genes
In order to determine the normal tissue distribution of the EC specific genes, we obtained the normalized expression level from the Stanford Source database [17]. Source database presents the relative expression level of a gene in different tissues that is normalized for the number of samples from each tissue included in UniGene. The gene expression information for the different transcripts was obtained from dbEST expression profile.
In addition to relative gene expression information from the Source database, we have also manually curated the protein expression information about the endothelial specific genes from the Human Protein Atlas database. The Human Protein Atlas is a comprehensive database that provides the protein expression profiles for a large number of human proteins, presented as immunohistological images from most human tissues [18,19]. It contains antibody-based protein expression and localization profiles of >4,000 proteins in 48 normal human tissues and 20 different cancers [20]. The expression level of each protein is presented in a four color scale system that takes into consideration the intensity of the protein expression and quantity of positive images tested for each protein. It is a very useful tool to extract the relative expression level of proteins in different tissues.

Quantitative real-time PCR
Total RNA was isolated using the RNeasy kit (QIAGEN, Valencia, CA). Single stranded cDNA was synthesized from total RNA using the High Capacity RNA-to-cDNA Kit (Applied Biosystems). SYBR Green I-based real-time PCR was carried out on an Opticon Monitor. The sequences of the primers used in this study are listed in Additional File 1. For normalization of each sample, human specific TATA-binding protein (TBP) primers were used to measure the amount of TBP cDNA.
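Normalization to TBP and expression relative to a calibrator sample can be sketched with the widely used 2^(-ΔΔCt) calculation. The text does not state which quantification method was used, so this is an assumption, as is the implied 100% amplification efficiency. The Ct values are hypothetical; HUVEC is used as the calibrator because relative quantities are reported with HUVEC expression set to 1.0.

```python
def relative_quantity(ct_target: float, ct_reference: float,
                      ct_target_calibrator: float,
                      ct_reference_calibrator: float) -> float:
    """Relative quantity (RQ) by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct of the gene of interest and of the
    reference gene (TBP) in the sample of interest.
    *_calibrator: the same two Ct values in the calibrator sample.
    Assumes the PCR product doubles every cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for one EC-restricted gene.
huvec = {"target": 22.0, "tbp": 25.0}   # calibrator
t_cell = {"target": 30.0, "tbp": 25.0}  # non-EC sample

rq_huvec = relative_quantity(huvec["target"], huvec["tbp"],
                             huvec["target"], huvec["tbp"])
rq_t_cell = relative_quantity(t_cell["target"], t_cell["tbp"],
                              huvec["target"], huvec["tbp"])
print(rq_huvec, rq_t_cell)
```

With these numbers the target is detected eight cycles later in the T cell sample after TBP normalization, which corresponds to a 256-fold lower relative quantity than in HUVEC.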

Results
Identification of EC-restricted genes
In an effort to identify genes that exhibit an EC-restricted pattern, total RNA was isolated from primary cultured ECs (including HUVEC, HPAEC, HAEC, HMVEC, and HCAEC) and non-ECs (HASMC, B cells, T cells). Gene expression profiling was performed using a high throughput platform, the HT U133 plate, that measures more than 43,000 well-characterized genes and UniGene clusters. The expression profiling was performed in duplicate. All the array data were determined to be of high quality as assessed by the scaling factor, average background, percent present calls, and 3'/5' RNA ratio. After normalization and preprocessing of the data, we generated a list of genes that are significantly differentially expressed between different ECs and non-ECs. The heterogeneity in the transcription profile of the ECs was identified using unsupervised clustering, reflecting the global similarities between the samples [Figure 1A]. Unsupervised clustering demonstrated the highest similarity within the biological replicates and the least similarity between ECs and non-ECs. The cladogram produced by unsupervised clustering showed that venous and pulmonary arterial ECs are much closer to each other in expression profile than to microvascular cells.
Comparing groups, we found 1,713 transcripts that are differentially expressed in HMVEC compared to non-ECs (LCB > 3 and FDR < 2%). Similarly, for HUVEC and HPVEC, 1,534 and 1,539 transcripts, respectively, were differentially expressed compared to non-ECs. For the arterial ECs, 1,239 HCAEC and 1,316 HAEC transcripts were differentially expressed compared to non-ECs. Comparison of the differentially expressed transcripts in microvascular (HMVEC), venous (HUVEC, HPVEC) and arterial (HAEC and HCAEC) cells using Venn diagrams revealed that approximately half of the transcripts are differentially expressed in all three EC types. However, each EC type also possessed a unique expression signature, in which the differential expression of transcripts was limited to one type of EC [Figure 1B]. In total, 2,553 transcripts, representing 1,617 genes, are significantly different in at least one of the EC types compared to non-ECs. To further refine our initial list of EC-restricted genes, we evaluated the expression of these genes using data from the REFEXA database http://www.lsbm.org/site_e/database/index.html, from which expression values were manually obtained for each transcript to calculate an enrichment score for each gene. This database has MAS5 normalized gene expression data for several primary cells, including ECs, cancer cell lines, and normal tissues. For the analysis we used only the expression data for 30 primary cells and excluded all cancer cell lines. The enrichment and outlier analysis identified 289 outlier transcripts with an enrichment score of 1 (see Methods for details). To further reduce the number of false positive results, the top 60% (168 transcripts) of transcripts with an average of greater than or equal to 3-fold overexpression in ECs as compared to non-ECs were considered EC-restricted.
The expression values of these 168 transcripts were manually checked and transcripts with reduced specificity were removed. After manual inspection of the relative expression profile of each transcript, we selected 152 transcripts that correspond to 109 valid genes exhibiting an EC-restricted pattern (Table 1). The 152 transcripts with varying EC specificity are ranked on the basis of fold change in the primary set and fold change from the external datasets (e.g. REFEXA). The rank score is a significance level, with larger rank scores indicating increasing confidence in endothelial restriction. The overall schema of curating endothelial-specific genes is shown in Figure 1C. Many genes that are known to be EC-restricted, including angiopoietin-2, von Willebrand's factor (vWF), and VE-cadherin (CD144), are at the top of the list (Table 1). Comparison of the EC-restricted transcripts in microvascular (HMVEC), venous (HUVEC, HPVEC) and arterial (HAEC and HCAEC) cells using Venn diagrams revealed that most of the transcripts are differentially expressed in all three EC types. Only a small fraction of transcripts are uniquely differentially expressed in microvascular ECs [Figure 1D]. A colorogram demonstrating the expression pattern for each of the EC-restricted genes is shown in Figure 2. The colorogram consists of a range of patterns, from transcripts highly expressed in all EC types (Pattern IV) to transcripts that are highly expressed in particular EC types (Pattern I). ANGPT2, TBX1, and FLT4 are examples of genes that are highly expressed in the HMVEC cells. The expression patterns of EC-restricted genes were further confirmed using the REFEXA dataset [Figure 3]. To further validate the microarray results, we used PCR to quantitate the expression levels of 12 randomly selected EC-restricted genes in primary ECs and non-ECs. A very similar EC-restricted expression pattern was observed for all 12 genes [Figure 4]. Although the relative fold enrichment of some of the EC-restricted genes was somewhat lower than initially identified by microarray analysis, the expression in non-ECs remained quite low or absent in comparison to ECs.

Pathways and Gene Ontology (GO) Processes modulated by EC-restricted genes
We performed an enrichment analysis of the EC-restricted genes to identify the pathways and GO processes where the EC-restricted genes occur more often than would be expected by random distribution. The pathway enrichment analysis was performed using the MetaCore tool of the GeneGo package, where P values of <0.05 (FDR adjusted) are considered significant. The enrichment analysis identified a set of statistically significant enriched pathways (Figure 5A). The most highly enriched pathways included "EC contacts by junctional/nonjunctional mechanisms", "Regulation of eNOS activity in cardiomyocytes and endothelial cells", "thrombospondin signaling", and "Role of PKA in cytoskeleton reorganization", many of which would be expected based on the identified gene list. The enrichment analysis for GO categories was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) program. The top clusters of biological processes and metabolic functions that are enriched in the set of differentially expressed genes are shown in Figure 5B. The most highly enriched clusters of the gene ontology categories included vasculature development and angiogenesis, immune responses, cell adhesion, and cell motility and migration. Vascular development and angiogenesis is the most highly enriched GO cluster in which the EC-restricted genes are overrepresented (Enrichment score 4.72). This finding supports the overall concept that at least a subset of the genes we identified as being EC-restricted have previously been described in processes known to involve ECs.

Disease set enrichment of EC-restricted genes
In order to evaluate whether the EC-restricted genes are potentially linked to the pathogenesis of certain human diseases, we performed a disease set enrichment analysis using disease sets on the basis of published literature (DSPL). DSPL enrichment analysis was performed using the MetaCore tool in the GeneGo package. The disease associations are summarized in Figure 5C, depicting the top diseases in which EC-restricted genes are enriched. The EC-restricted genes are enriched in many cardiovascular diseases including ventricular dysfunction, myocardial infarction, hypertension, diabetic angiopathies, arteriosclerosis, and several other vascular diseases.
Interestingly, ischemia was listed as a disease in which the EC-restricted genes are over-represented (P value = 2E-06). The EC-restricted genes are also enriched (P value < 0.01) in neurological diseases including subarachnoid hemorrhage (P value = 3.00E-07).

Regulatory mechanism governing EC-restricted genes
To begin to understand the complex and intricate regulation of the EC-restricted genes, we were interested in determining whether certain transcription factors or miRNAs might be involved in regulating these genes. Transcription factors play a critical role in defining cell and tissue specificity of gene expression. In this study the transcription factor (TF) enrichment analysis was performed on two sets of EC-restricted genes categorized on the basis of expression profiles: genes that are highly expressed i) in all EC types (pan EC) and ii) only in HMVEC. The TF enrichment analysis was only performed on these two sets as they constitute the major fraction of EC-restricted genes. The analysis was performed using the ExPlain tool, a program for gene expression analysis from BIOBASE. We performed the analysis on a region 2 kb upstream to 100 bp downstream of each of the EC-restricted genes using vertebrate_non_redundant matrices (Yes set). Background frequencies were calculated based on the promoters of human housekeeping genes (No set) [12]. A TF binding site was considered to be enriched in a gene set on the basis of the P value (P value < 0.001 and Yes/No > 1.2). The analysis identified binding sites for >20 transcription factors among the EC-restricted genes expressed in all ECs and in the subset enriched only in microvascular ECs [Figure 6]. Binding sites that were identified for both of these sets of genes included CDXA, GATA, IPF1, NFAT, CDP, AIRE and OCT1. However, the binding sites for particular sets of transcription factors (e.g. FAC1, POU1F1, STAT1, AR, SRF, LRH) are only enriched in promoters of microvascular EC-restricted genes.

Figure 4 Validation of a selected subset of endothelial-restricted genes by quantitative RT-PCR. Validation of a subset of EC-restricted genes from Table 1 was conducted using primary ECs and non-ECs by quantitative RT-PCR (n = 3 per cell type). The gene symbol is listed for each gene. RQ refers to "relative quantity", where the expression in HUVECs has been set to 1.0 and the relative expression of the other cell types is compared to that in HUVECs.

Figure 5 The analysis for pathways and disease set enrichment was performed using the MetaCore tool of the GeneGo package. The GO categories enrichment analysis was performed using the DAVID tool. The bar graphs depict the enriched pathway or GO process categories and the -log of the P value. The P value depicts the significance of enrichment; the smaller the P value, the more significant the enrichment. The pathways and disease sets with FDR adjusted P value < 0.05 are considered significant. The panel for gene ontology enrichment depicts the enrichments for each GO category (-log P value) as well as the Escore for a cluster of related GO categories.
Another mechanism by which gene expression can be regulated is through small noncoding RNAs or microRNAs (miRNAs). MiRNAs regulate gene expression post-transcriptionally: by binding to specific sequences in the untranslated regions of an mRNA, they repress its translation and promote its degradation. We performed a bioinformatics analysis to determine whether the identified EC-restricted genes are targets of miRNAs. We used the Composite Regulatory Signature Database (CRSD) web tools, which take into consideration the sequence match and free energy of binding to predict binding sites [16]. Our analysis identified 31 miRNA binding sites that are significantly enriched (P value < 0.05) in the UTRs of the EC-restricted genes [Figure 7]. Mir-432, Mir-188, and Mir-331 each have putative binding sites in the 3' UTR of >8 EC-restricted genes. A summary of the miRNA binding sites for EC-restricted genes is provided in Table 2. Additional details of the miRNA binding sites, along with target and reference sequences, are provided in Additional File 2.

Expression pattern of EC-restricted genes in tissues
A better understanding of how the EC-restricted genes are expressed in different tissues can help to define their function and potential use as disease biomarkers. Relative expression of the EC-restricted genes in several normal tissues was obtained using the Source databases http://source.stanford.edu. In the source database the normalized gene expression represents the relative expression level of a gene in different tissues. The colorogram depicting the percentage of relative expression of each gene is shown in Figure 8. The analysis demonstrates that most of the endothelial restricted genes have preferential expression in vascular tissues. In particular MMRN1, BMX, ANGPT2 and CDH5 demonstrate high expression levels in vascular tissues. VWF, TIE1, ROBO4 and ECSCR have very high expression levels in umbilical cord tissue (Table 3). These results strengthen our finding that these genes have relatively high expression levels in vascular related tissues.
To further explore whether any of the EC-restricted genes have specific expression in particular tissues, we obtained the immunohistochemistry data for 61 out of the 109 EC-restricted genes. The majority of the EC-restricted genes demonstrate ubiquitous expression in different normal tissues (Additional File 3). A small subset of the genes shows a restricted expression pattern in normal tissues. For example, VWF and ICAM2 are enriched in soft tissues. BMX, one of the top ranked endothelial-restricted genes, has preferential expression in the epididymis. CLDN5 is preferentially expressed in glandular cells of various body tissues. Interestingly, about 85% of genes show moderate to high levels of expression in soft tissues.

Discussion
The results of our study demonstrate that of over 43,000 transcripts evaluated, only 152 appear to be highly restricted to the endothelium. Several of the genes identified have previously been reported to exhibit an EC-restricted expression pattern and have known functions in ECs. Examples of these genes include angiopoietin-2, von Willebrand's Factor (vWF), EC nitric oxide synthase (eNOS), and Pecam-1 (CD31). The pathways and GO categories of the identified genes support a role for these genes in vascular development, angiogenesis, and EC function.
Although several of the EC-restricted genes have previously been shown to contribute to the regulation of normal EC function, many others have not been characterized as having a particular role in ECs. The genes identified as being EC-restricted fall into several categories, including proteins involved in transcriptional regulation, cell adhesion, signal transduction, and intracellular trafficking. The determination that these genes are enriched in ECs may lead to future studies that define their specific roles in regulating EC function.
The endothelium is known to play an important role in a number of human diseases, so it was not a surprise that alterations in the expression of these genes are associated with a number of cardiovascular disorders. Mutations or alterations in the expression of several of the genes listed have been shown to be associated with the development of hypertension. For example, mutations in the eNOS gene have been linked to essential hypertension [21][22][23]. Similar associations have been observed with mutations in the endothelin-1 gene [24,25]. More recent studies point toward a link between obesity and hypertension. There has been particular interest in understanding the role of adipocytokines and their receptors in the development of hypertension. Previous studies have suggested a causal link between leptin levels in obese patients and the development of hypertension [26]. A more recently discovered adipocytokine, apelin, is predominantly expressed in the ECs of the heart, and evidence supports a role for apelin in the development of hypertension and cardiac hypertrophy [27].
The endothelium is known to play an important paracrine role with respect to cardiac function and development. The TGFbeta family member cytokine, bone morphogenetic protein-4 (BMP-4), is known to play an important role during cardiac development [28]. Increased expression of BMP-4 may similarly be reflective of a state of EC dysfunction. Exposure of ECs to BMP-4 promotes ROS generation [29]. BMP-4 expression is increased in ECs exposed to abnormal or unstable flow, compared to regions of laminar shear flow [30]. Venous and microvessel ECs exposed to BMP-4 rapidly undergo apoptosis [31]. These results suggest that BMP-4 could be a therapeutic target in the setting of heart failure to improve or reverse EC dysfunction.
Figure 6 Regulation analysis of EC-restricted genes. The list of the transcription factor binding sites that are enriched in the region from 2 kb upstream to 100 bp downstream. The enrichment in gene sets that are highly expressed in all endothelial cells and only in microvascular ECs is shown in black and grey, respectively. The X-axis represents the transcription factors and the Y-axis represents the -log P value.
Figure 7 Regulation analysis of EC-restricted genes in terms of miRNA targets. The list of the miRNAs that are enriched in the 3' UTR of EC-specific genes. The X-axis represents the miRNAs and the Y-axis represents the -log P value. The miRNAs derived from the strand opposite the guide RNA strand are marked with a star (*).
The functional and structural integrity of the central nervous system depends on tightly controlled coupling between neural activity and cerebral blood flow (CBF). This requires the close interaction of neuronal cells and vascular cells in a complex that is known as the neurovascular unit. Recent experimental evidence suggests that dysfunction of the neurovascular unit may be an early event in Alzheimer's disease (AD). Transgenic mice overexpressing the amyloid precursor protein (APP) exhibit abnormalities in blood flow in response to functional hyperemia prior to the development of amyloid plaques or vascular amyloid [32]. Administration of soluble amyloid beta protein results in vasoconstriction, EC dysfunction and a reduction in CBF. One of the main mechanisms by which EC dysfunction occurs is through inactivation or reduced function of EC nitric oxide synthase (eNOS). Amyloid beta also induces the production of reactive oxygen species, alteration in the expression of tight junction proteins, and an increased rate of EC apoptosis [33]. In the brain tissue samples of patients with AD, we observed a significant increase in the expression of selected adherens and tight junction proteins including VE-cadherin, claudin-5, and connexin 37 (GJA4). Systemic administration of the amyloid beta peptide 1-42 to rats is associated with alterations in the expression and cellular localization of several tight junction proteins [33]. Another EC-restricted gene found to be significantly upregulated in the AD brain tissue samples is von Willebrand's Factor (vWF). Increased levels of vWF promote blood clotting. Increased vWF has been found in heme-rich deposits (HRDs) in patients with dementia [34]. HRDs are also rich in fibrinogen, collagen IV, and red blood cells, and are thought to be the residua of capillary bleeds, or microhemorrhages. In patients with acute ischemic stroke and vascular dementia, vWF levels have also been shown to be increased [35].
Our analysis of potential transcription factors that might be involved in regulating the expression of the identified EC-restricted genes, based on conserved binding sites in the regulatory regions of these genes, led to the identification of several classes of transcription factors. With some exceptions, most of these transcription factors have not previously been described as playing a major role in the regulation of EC-restricted genes. Members of the ETS and GATA transcription factor families have been shown to regulate a number of endothelial genes including vWF, VE-cadherin, and Tie1 [36][37][38]. Interestingly, several conserved binding sites were identified only in the regulatory regions of the genes restricted to microvascular ECs, suggesting that members of these transcription factor families may play a unique role in determining endothelial gene expression in microvessels.
Over the past several years, microRNAs have been shown to play a role in regulating EC gene expression, EC function, and the process of angiogenesis. Although most of the miRNAs we identified have not been described for their roles in regulating EC-restricted genes, a few have. For example, hsa-miR-296 has recently been shown to play a regulatory role in angiogenesis [39]. Angiogenic factors can increase the expression of hsa-miR-296. Down-regulation of hsa-miR-296 inhibits angiogenic responses in cultured ECs. Furthermore, inhibition of hsa-miR-296 with antagomirs reduced angiogenesis in tumor xenografts in vivo. Similarly, hsa-miR-328 has been implicated in the regulation of CD44 [39]. CD44 regulates a wide variety of processes including angiogenesis and inflammation. The fact that only a small subset of the more than 700 microRNAs has thus far been shown to regulate EC-restricted genes or play a role in regulating EC function suggests that several additional members, including those we have identified, may well also play a role in regulating the expression of selected EC-restricted genes or EC function.
We recognize that there are potential limitations of our study. First, the study used expression-profiling data based on RNA obtained from human tissues or cells. Although several of the genes identified are known to be vascular-specific, the newly identified genes will ultimately need further validation of the true extent of their EC specificity, at the level of protein and/or RNA in both cells and tissues, and of their EC-restricted pattern within the identified tissues.
The miRNAs that are expressed at a relatively low level compared to the miRNA from the opposite (guide) strand are marked with a star (*).

Conclusion
Our study validates the existence of a finite number of endothelial-restricted genes, most of which are ubiquitously expressed. Several of these are restricted to cells of microvascular origin. Although several of the genes are known to play important roles in endothelial function, the exact functional role of many others in endothelial cells remains to be defined. We hope that our study provides an initial catalogue of EC-restricted genes that can lead to further studies that either link alterations in the expression of these genes to a variety of human diseases via their role as biomarkers or ultimately show that they play a causal role in the pathogenesis of particular human diseases.
Figure 8 Relative normalized expression levels of EC-restricted genes in normal tissues. The expression level is expressed as the relative percentage of expression in different tissues, with red, yellow and green denoting higher, median and lower expression levels, respectively. The rows represent the genes and the columns represent the normal tissue types. The gene expression data used to generate the normalized expression levels were obtained from the dbEST database of normal tissues at NCBI.
Additional file 1: Nucleotide sequence of primers used for RT-PCR to validate expression pattern of selected EC-restricted genes.
Additional file 2: Summary of miRNA Binding sites along with target and reference sequences.
Additional file 3: Immunohistochemistry-based expression level of genes in different tissues. Rows represent the different tissues and columns represent the different EC-restricted genes. The expression level is shown using colored circles: i) red represents strong expression, ii) orange represents moderate expression, iii) yellow represents weak expression, iv) white represents no detectable expression, and v) black represents no representative images. The data were obtained from the Human Protein Atlas database.