Transcriptome coexpression map of human embryonic stem cells
- Huai Li†1,
- Ying Liu†2,
- Soojung Shin2,
- Yu Sun1,
- Jeanne F Loring3,
- Mark P Mattson2,
- Mahendra S Rao4, 5 and
- Ming Zhan1
© Li et al; licensee BioMed Central Ltd. 2006
Received: 06 February 2006
Accepted: 02 May 2006
Published: 02 May 2006
Human embryonic stem (ES) cells hold great promise for medicine and science. The transcriptome of human ES cells has been studied in detail in recent years. However, no systematic analysis has yet addressed whether gene expression in human ES cells may be regulated in chromosomal domains, and no chromosomal domains of coexpression have been identified.
We report the first transcriptome coexpression map of the human ES cell and the earliest stage of ES differentiation, the embryoid body (EB), for the analysis of how transcriptional regulation interacts with genomic structure during ES self-renewal and differentiation. We determined the gene expression profiles from multiple ES and EB samples and identified chromosomal domains showing coexpression of adjacent genes on the genome. The genomic distribution of the coexpression domains was not random, with significant enrichment on chromosomes 8, 11, 16, 17, 19, and Y in the ES state, and 6, 11, 17, 19, and 20 in the EB state. The domains were significantly associated with Giemsa-negative bands in EB, yet showed little correlation with known cytogenetic structures in ES cells. Comparative transcriptome mapping between ES and EB revealed different patterns of coexpression.
The findings and methods reported in this investigation advance our understanding of how genome organization affects gene expression in human ES cells and help to identify new mechanisms and pathways controlling ES self-renewal or differentiation.
Large-scale transcriptional profiling and the availability of complete genome sequences have made transcriptome mapping analysis possible in various organisms. Transcriptome maps showing the density of expressed genes along the chromosome have revealed genomic regions that correspond to known amplicons of human tumors [2–4]. Regional similarity of expression on the chromosome has been observed in the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster [1, 6, 7], and human [2, 8]. Transcriptome maps showing regional similarities illustrate the existence of chromosomal domains of gene coexpression and of transcriptional regulation operating at the local chromosome level. Transcriptome mapping analyses have been based on data generated by a variety of experimental techniques, including Expressed Sequence Tags, Serial Analysis of Gene Expression, and microarrays. All of these studies have revealed interesting and novel patterns of the transcriptome in relation to genomic organization, molecular evolution, and biological functions.
Human embryonic stem (ES) cells have the ability to differentiate into a variety of cell lineages and hold promise for drug discovery, toxicology, and replacement therapies. The embryoid body (EB) is the earliest stage of ES differentiation in culture. The transcriptome of human ES and EB cells has been studied in detail in recent years [10–16]. These studies have suggested that ES cells have an open transcriptome with few cold spots or hot spots of gene expression in the undifferentiated state, and a more complex global regulation in the EB stage of differentiation. However, no systematic analysis has yet addressed whether gene expression in human ES cells may be regulated in chromosomal domains, and no chromosomal domains of coexpression have been identified. Here, we describe the first analysis of coexpression of neighboring genes on the chromosome in ES and EB cells. We determined gene expression profiles by BeadArray™ and constructed transcriptome maps for both ES and EB cells. The maps showed a significant pattern of gene coexpression in chromosomal domains, and the coexpression remained significant regardless of gene duplication. The genomic distribution of the coexpression domains was non-random, with different coexpression patterns observed in ES and EB cells. The coexpression domains were also biologically and physiologically significant: molecular functions and biological processes important to ES cells were enriched in the domains. The transcriptome map provides a basis for examining transcriptional regulation operating at the level of chromosomal domains in human ES cells and the differential coexpression of gene clusters during ES differentiation. The findings of this study advance our understanding of how genome organization affects gene expression and hence the self-renewal or differentiation of ES cells.
The overall goal of this study was to elucidate general coexpression patterns at the domain level in ES and EB. The coexpression profiling was based on a combination of six different cell lines representing ES or EB. Each cell line contributed a single sample, except I6 (2 samples). An additional sample was derived from a pooled culture of different cell lines. The six cell lines and their relatedness to each other are illustrated in Supplementary Table S1 [see Additional file 6]. The expression profiles of the cell line samples were similar to each other in both ES and EB, with slightly higher heterogeneity in EB than in ES. The gene expression profile of each human ES cell line and its EB counterpart was determined using the high-density BeadArray™, which contains 23,584 probes representing 20,692 unique genes. Based on the expression data, we calculated a coexpression index for each gene in a sliding window across each chromosome. The coexpression index of a given gene was defined as the average Pearson's correlation coefficient between the expression levels of this gene and those of every neighboring gene upstream and downstream within a given window. The correlation between genes was calculated from the expression values in the seven samples of ES or EB. The coexpression index, which measures the degree of coexpression among neighboring genes on the chromosome, was used in the subsequent construction and analysis of the transcriptome coexpression maps of ES and EB cells.
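The coexpression index defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a genes × samples NumPy array for a single chromosome whose rows are already sorted by chromosomal position, uses a hypothetical `half_window` parameter (10 genes on each side gives the 20-gene window used in this study), and simply truncates the window at the chromosome ends, which the text does not specify.

```python
import numpy as np


def coexpression_index(expr: np.ndarray, half_window: int = 10) -> np.ndarray:
    """Average Pearson correlation of each gene with its chromosomal neighbors.

    expr: genes x samples matrix for ONE chromosome, rows in positional order.
    half_window: neighbors taken upstream and downstream (10 + 10 = 20 genes).
    Windows are truncated at the chromosome ends (an assumption).
    """
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation matrix
    n_genes = expr.shape[0]
    index = np.empty(n_genes)
    for i in range(n_genes):
        lo = max(0, i - half_window)
        hi = min(n_genes, i + half_window + 1)
        neighbors = [j for j in range(lo, hi) if j != i]  # exclude the gene itself
        index[i] = corr[i, neighbors].mean()
    return index


# Toy usage: 50 genes measured in 7 samples (random data, for illustration only).
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 7))
ci = coexpression_index(toy)
print(ci.shape, round(float(ci.mean()), 3))
```

Computing the full correlation matrix once per chromosome keeps the loop cheap; genes with constant expression would yield undefined correlations and are assumed to have been filtered out beforehand.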
1. ES and EB cells show significant coexpression patterns along the chromosome
In order to statistically examine whether genes are significantly coexpressed on the chromosome, we calculated the mean value of the coexpression index for the entire set of expressed genes on the genome. The mean coexpression index was determined for different window sizes and from two different genomic data sources: a) the real genome, to which the expressed genes were mapped; and b) the randomized genome, which was created by shuffling the position indexes of the same number of genes on each chromosome. Supplementary Fig. S1 [see Additional file 1] presents plots of the mean coexpression index values for both ES and EB states. As shown, the mean coexpression index from the real genome data was consistently higher than that from the random genome data across different window sizes in both ES and EB. This suggested a significant pattern of coexpression of neighboring genes in both ES and EB cells. The plots (Supplementary Fig. S1 [see Additional file 1]) further showed that the mean coexpression index decreased sharply when the number of neighboring genes increased from 2 to 20. Beyond 20 neighboring genes, the decrease in the coexpression level became less pronounced, and this continued for domains of up to 50 neighboring genes. This finding suggested that clusters of up to 20 neighboring genes may be coexpressed on the chromosome. We therefore used a window size of 20 genes for subsequent coexpression index analyses in this study.
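The comparison between the real and randomized genomes across window sizes can be sketched as follows. Assumptions: each chromosome is a genes × samples NumPy array in positional order, "window size" counts all neighbors (a window of 20 means 10 on each side), the randomized genome permutes gene order within each chromosome as described above, and the block-structured toy chromosome is invented purely for illustration.

```python
import numpy as np


def neighbor_mask(n: int, half_window: int) -> np.ndarray:
    """Boolean n x n mask selecting neighbors within half_window, excluding self."""
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return (dist > 0) & (dist <= half_window)


def mean_coexpression_index(chromosomes, half_window: int) -> float:
    """Mean coexpression index over every gene on every chromosome."""
    values = []
    for expr in chromosomes:
        corr = np.corrcoef(expr)
        mask = neighbor_mask(expr.shape[0], half_window)
        values.append((corr * mask).sum(axis=1) / mask.sum(axis=1))
    return float(np.concatenate(values).mean())


def window_scan(chromosomes, window_sizes, seed: int = 0):
    """Return {window size: (real mean index, randomized mean index)}."""
    rng = np.random.default_rng(seed)
    # Randomized genome: shuffle gene positions within each chromosome.
    shuffled = [expr[rng.permutation(expr.shape[0])] for expr in chromosomes]
    return {
        w: (mean_coexpression_index(chromosomes, w // 2),
            mean_coexpression_index(shuffled, w // 2))
        for w in window_sizes
    }


# Toy genome: one chromosome of 8 blocks x 5 genes; genes in a block share a profile.
rng = np.random.default_rng(1)
latent = rng.normal(size=(8, 7))
chrom = np.repeat(latent, 5, axis=0) + 0.1 * rng.normal(size=(40, 7))
for w, (real, rand) in window_scan([chrom], [4, 10, 20]).items():
    print(f"window {w:2d}: real {real:.3f}  randomized {rand:.3f}")
```

On such toy data the real-genome mean stays above the shuffled one and falls as the window grows past the block size, which mirrors the kind of curve used above to choose the window.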
We next determined the P values of both the coexpression index and the mean coexpression index at a window size of 20 genes by Monte-Carlo simulation. Coexpression index and mean coexpression index values were calculated from 10,000 randomized genome data sets (see Methods). The derived Monte-Carlo distributions (Supplementary Fig. S2 [see Additional file 2] and S3 [see Additional file 3]) allowed determination of the P value for the coexpression index of a given gene or the mean coexpression index of a given set of genes. For the ES and EB expression data, the mean coexpression index was 0.027 and 0.021, respectively. The P values for both mean coexpression index values were below 0.00001 (Supplementary Fig. S2 [see Additional file 2] and S3 [see Additional file 3]). The Monte-Carlo simulation thus provided further evidence that coexpression of neighboring genes was significant in the real genome in ES and EB cells.
The coexpression of neighboring genes may be due to duplicated genes, which often remain adjacent and have similar expression patterns. To assess the effect of gene duplication on the coexpression index in ES and EB, we regenerated randomized genome data sets from which all tandemly duplicated genes were removed, and conducted the Monte-Carlo simulation again. The mean coexpression index values from the real genome data with tandemly duplicated genes removed were 0.026 and 0.020 in ES and EB, respectively. The P values of these mean coexpression index values remained very low (below 0.00001 in both ES and EB). Therefore, gene coexpression on the chromosome in ES and EB cells was statistically significant regardless of tandem gene duplication, which had little impact on the observed pattern.
2. Transcriptome coexpression map
3. Coexpression patterns of chromosomal domains
As shown in Fig. 3, the SOX15 domain represented a pattern in which the degree of coexpression was higher in ES than in EB. The domain extended for about 410 kb in the 17p13 region of chromosome 17. As illustrated in the transcriptome map (Fig. 3, middle), most genes in the domain had higher coexpression index values in ES (blue dots) than in EB (red dots), with the highest score observed for SOX15 (0.36 in ES vs 0.176 in EB), followed by FXR2 (0.2985 in ES vs 0.037 in EB). No gene expressed in EB had a coexpression index above the threshold; the highest was only 0.184. PCA captures the major variation among expression profiles in the leading principal components. The PCA map (Fig. 3, left) revealed a large difference in the clustering of domain genes, and thus differential coexpression of the gene cluster, between ES and EB: the domain genes clustered tightly within a small ellipsoid owing to their correlated expression profiles in ES (blue dots), but loosely within a larger ellipsoid owing to their less correlated profiles in EB (brown dots). The heatmap and cluster analysis showed less diversity in expression profiles among genes in ES than in EB samples, indicating a higher correlation of expression in ES. Interestingly, although the domain was differentially coexpressed, 11 of its genes were not differentially expressed between ES and EB samples (ANOVA, P value threshold 0.05). The other 10 genes were differentially expressed, but displayed varying degrees of up- and down-regulation relative to the mean expression level.
For the PTPRCAP domain (Fig. 4), on the other hand, the degree of coexpression was higher in EB than in ES. The domain stretched for 583 kb at 11q13.3. The transcriptome map (Fig. 4, middle) showed that most genes in the domain had higher coexpression index values in EB (blue dots) than in ES (red dots). Ten genes in the domain had coexpression indices above the threshold in EB, with the highest observed for PPP1CA (0.535 in EB vs 0.031 in ES). No gene expressed in ES had a coexpression index above the threshold; the highest was 0.202 (CORO1B). The PCA and cluster analyses also illustrated the higher coexpression of the domain in EB than in ES. In the PCA map, the genes expressed in EB (blue dots) clustered tightly within a small ellipsoid, whereas the genes expressed in ES (brown dots) clustered loosely within a much larger ellipsoid. The heatmap showed less diversity in expression levels among genes expressed in EB than in ES. Although the gene cluster was differentially coexpressed, 13 genes in the domain were not differentially expressed between ES and EB; the other 8 genes were differentially expressed but exhibited mixed patterns of up- or down-regulation.
While the SOX15 and PTPRCAP domains showed differential coexpression, the NGFRAP1 domain (Fig. 5) displayed similar coexpression in ES and EB. The domain was 1,498 kb long and located at Xq22.2. Five and six genes had coexpression indices above the threshold in ES and EB, respectively, with the highest observed for NGFRAP1 (0.418 in ES and 0.523 in EB). The PCA map showed similar clustering of the genes, and the heatmap showed similar expression profiles in ES and EB, suggesting a similar degree of coexpression in the two states. Although similarly coexpressed, six genes of the domain were differentially expressed between ES and EB.
4. Distribution of coexpression domains
The number of coexpression chromosomal domains and of genes located in the domains on each chromosome, with the associated P values from Fisher's exact test (**P < 0.01; *P < 0.05). The domains were identified at a coexpression index threshold of 0.3 and a window size of 20.
Columns (ES, then EB): number of coexpression domains; number of genes located in the domains
We next determined whether the identified coexpression chromosomal domains correlated with any known cytogenetic bands on the chromosome. Giemsa-positive and Giemsa-negative bands, centromeric regions, and variable-length heterochromatic regions were examined at the 850-band resolution. These cytogenetic patterns represent distinct and reproducible structures of extended and compacted regions on the chromosome. Table 2 shows the frequency of domain genes in each structural pattern and the P value from Fisher's exact test (detailed information is provided in Supplementary Tables S2 [see Additional file 7] and S3 [see Additional file 8]). Among all genes in the coexpression chromosomal domains in ES, 62.2% (783 genes) were located in Giemsa-negative bands and 36.6% (461) in Giemsa-positive bands. These proportions were similar to those expected from all known genes in the genome. Ten coexpression domain genes were located in the variable-length heterochromatic region (19q-11 to 19q-13), with a P value of 0.028; this region was thus significantly enriched for coexpressed genes in ES. In EB, on the other hand, the Giemsa-negative bands were significantly enriched for coexpressed genes (P value 0.008).
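The band enrichment tests above compare the share of domain genes falling in a cytogenetic pattern with the share expected from the rest of the genome. A minimal sketch using SciPy's `fisher_exact` follows; the one-sided ("greater") alternative, the 2 × 2 layout, and the helper name `band_enrichment` are assumptions, and the counts in the usage line are invented rather than taken from this study.

```python
from scipy.stats import fisher_exact


def band_enrichment(in_group_and_band: int, group_size: int,
                    band_size: int, genome_size: int) -> float:
    """One-sided Fisher's exact P value for over-representation of group genes in a band.

    group_size: genes in the coexpression domains (e.g. all ES domain genes).
    band_size: genes in the cytogenetic pattern of interest across the genome.
    genome_size: all mapped genes.
    """
    a = in_group_and_band                          # domain genes inside the band
    b = group_size - a                             # domain genes outside the band
    c = band_size - a                              # other genes inside the band
    d = genome_size - group_size - band_size + a   # other genes outside the band
    _, p_value = fisher_exact([[a, b], [c, d]], alternative="greater")
    return p_value


# Hypothetical counts, for illustration only.
print(band_enrichment(in_group_and_band=10, group_size=1260, band_size=120, genome_size=18000))
```

The same 2 × 2 construction applies to chromosomes: replace "genes in the band" with "genes on chromosome X".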
5. Gene ontology analysis
We next determined the GO terms that were significantly associated with the coexpression chromosomal domains, using Fisher's exact test. The results are shown in Supplementary Tables S4 [see Additional file 9] and S5 [see Additional file 10] (P value ≤ 0.05). Many domains were associated with biological functions, particularly the regulation of transcription, transcription factor activity, and chromosome organization. Some domains were associated with functions or biological processes important to ES cells, such as apoptosis, pattern specification, histogenesis and organogenesis, and embryogenesis and morphogenesis.
ES cell gene expression is tightly regulated, and cells either maintain the pluripotent state by self-renewal or undergo differentiation. This is the first study to investigate the coexpression of genes along the chromosome in human ES cells and in EBs, their earliest stage of differentiation in culture. Significant coexpression patterns were revealed and confirmed by randomization tests and Monte-Carlo simulation. The coexpression suggests that transcriptional regulation operates at the chromosomal domain level in ES and EB cells. The coexpression domains do not appear to represent amplicons or regions of chromosomal imbalance such as those previously described in cancer cells. The chromosomal region in which the genes NANOG, STELLAR, and GDF3 are adjacently located has been considered a hotspot for teratocarcinoma. Our study, however, indicated that the genes in this region were not coexpressed, suggesting that no domain-level transcriptional regulation operates there in ES or EB. Nevertheless, the identified coexpression chromosomal domains are biologically and physiologically significant, and some are associated with functions important to ES development. Additional coexpression domains might be observed if each cell line were analyzed separately. Recent studies have shown that some ES cell lines exhibit unique morphological and genetic features. The cell line BG01V, for example, shows abnormal chromosomes and karyotype, unlike other ES cells [26, 27]. It is thus important to examine cell line-specific patterns of local coexpression, which will be a future direction of our studies.
The genes LIFR, GP130, STAT3, OCT3/4, SOX2, UTF-1, FOXD3, ERAS, TLE1, FGF4, NANOG, NODAL, TDGF1, CER1, and ABCG2 have been shown to be critical for ES cell self-renewal and are regarded as the ES cell "signature" [15, 16, 28]. Some of these genes were not coexpressed on the chromosome (Table 3), suggesting that global regions are still involved in determining the overall state of the ES cell and provide context for cell-type-specific signaling. Nonetheless, the other ES-signature genes did show coexpression along the chromosome, and the genes adjacent to and coexpressed with them were often related to development and transcriptional regulation. STAT3, for example, is a transcription factor that plays a central role in ES self-renewal pathways and feedback loops [29, 30]. The STAT3 gene, located at 17q21, was coexpressed in ES (coexpression index 0.37) but not in EB (-0.17) (Table 3). The coexpression domain in which STAT3 resides also contained the duplicated genes STAT5B and STAT5A, as well as TCF1, a transcription factor important in proliferation and differentiation. Other ES-signature genes, UTF1, TLE1, and OCT3/4, showed higher coexpression indices in EB (0.299, 0.285, and 0.23, respectively, although slightly lower than the threshold value) than in ES (0.125, 0.08, and 0.10, respectively). UTF1 is a transcription factor, and the domain in which the UTF1 gene is located (10q26) contained two other transcription factors, VENTX2 (a homeodomain protein implicated in mesodermal patterning and hemopoietic stem cell maintenance) and NKX6-2. TLE1 is an ES cell-specific gene encoding an RNA-binding protein that functions downstream of the LIF and Oct3/4 pathways [32, 33]. The TLE1 domain is located at 9q21.32, and its coexpressed genes included the duplicated gene TLE4 and the signal transduction genes GNAQ, GKAP42, and GNA14. OCT3/4 is also a transcription factor critical for ES cell self-renewal.
The OCT3/4 domain, located at 6p21.31, contained NFKBIL1 and MHC class I genes. In addition to the ES-signature genes, other genes important for ES cell development were also found to be coexpressed on the chromosome in ES or EB. SOX15, for example, is a transcription factor involved in the regulation of embryonic development and transcriptional control in ES cells. The gene was significantly up-regulated in ES cells (P value 0.029, fold-change 3.25). As described above, the SOX15 domain (Fig. 3) showed coexpression in ES but not in EB cells. Among the genes in this domain, EFNB3 belongs to the ephrin gene family and is implicated in development, TNFSF1 is a cytokine of the tumor necrosis factor (TNF) ligand family, and POLR2A, ZBTB4, TP53, and FXR2 are all involved in transcription. Thus, the self-renewal or differentiation of ES cells was reflected not only in the differential expression of individual genes at the global level, but also in the differential coexpression of genes at the chromosomal domain level.
Chromosomal clustering of functionally related genes has been demonstrated in various eukaryotes, including yeast, fruit fly, nematode, and human [1, 5, 7, 8]. Natural selection might have organized genes into clusters on the chromosome according to molecular function or biological process so that their expression can be coordinately regulated. The coexpression of physically adjacent genes may be caused by the long-range effects of transcription factors, chromatin structure modifications, or an increased concentration of components of the transcriptional machinery (such as transcription factors) at the subnuclear location of particular chromosomal segments. The coexpression could also be due to duplicated genes, which often remain adjacent and have similar expression patterns. Our study revealed that gene duplication had a minimal impact and was not a major contributor to the observed coexpression pattern in ES and EB. Our study also revealed differential local coexpression between ES and EB: differentially coexpressed genes may not be differentially expressed, while similarly coexpressed genes may be differentially expressed. The transcriptome map thus provides a basis for examining how transcriptional regulation interacts with genomic structure and how genes clustered on the chromosome are coexpressed during ES self-renewal and differentiation.
Taken together, the transcriptome map provides information on transcriptional events operating at the local chromosome level in ES cells and localized coexpression of genes during differentiation. The identified coexpression chromosome domains are significantly associated with biological or physiological functions, some of which were important for ES development. Global and local regions are both involved in determining the overall state of the ES cell and provide context for cell-type specific signaling. The findings and methods reported in this investigation advance our understanding of how genome organization affects gene expression in human ES cells and help to identify new mechanisms and pathways controlling ES self-renewal or differentiation.
Human embryonic stem cell culture
Frequency of coexpressed genes in different cytogenetic structural patterns and P values by Fisher's exact tests.
Columns (ES, then EB): gene number and percentage of total
783; 62.2%
Variable length heterochromatic band
Differentiation of ESC as embryoid bodies
Human ES cells grown on feeders or under feeder-free conditions were harvested with collagenase (1 mg/ml, Invitrogen or Sigma) and resuspended in DMEM/F12 with 15% FCS, 5% KSR, 20 mM L-glutamine, 0.5 U/ml penicillin, 0.5 U/ml streptomycin, 0.1 mM β-mercaptoethanol, and 1x non-essential amino acids. Floating spheres were grown for up to 14 days in the same medium before RNA extraction. Supplementary Fig. S5 [see Additional file 5] shows undifferentiated human ES cell lines cultured on inactivated MEF and grown under feeder-free conditions, and embryoid bodies generated by growing ES cells in ultralow attachment plates to form floating spheres.
RNA extraction, BeadArray preparation, and data processing
RNA was extracted from 14 ES and EB samples using a standard TRIzol (Invitrogen) method. The BeadArray used in this study contained 23,584 probes, representing 20,692 genes recognized by RefSeq. Each gene or transcript was represented on the BeadArray by 3–10 oligonucleotides, each 50 bases long. Intensity data were calculated from the images generated by the BeadArray Reader (Illumina). Details of the RNA amplification, labeling, and hybridization steps are available from . Probe intensities were normalized across arrays by the quantile method, the mean intensity of each probe was calculated across all arrays, and the log2 ratio of each value to this mean was computed. When several probes corresponded to the same gene (i.e., when different probes had the same gene symbol or GenBank ID), a single probe was kept for the analysis. The chromosomal location and cytogenetic structural pattern of each gene were obtained from the RefSeq database.
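One reading of the preprocessing steps above, sketched with NumPy: quantile-normalize the probes × arrays intensity matrix, express each value as a log2 ratio to that probe's mean across arrays, and keep one probe per gene. The function names are hypothetical, ties in quantile normalization are broken by sort order, intensities are assumed positive, and because the text does not say which probe was retained for a duplicated gene, keeping the first one listed is an assumption.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every array (column) the same intensity distribution.

    Each value is replaced by the mean, across arrays, of the values holding
    the same rank. Ties are broken by sort order (a simplification).
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)         # mean of each rank across columns
    return reference[ranks]


def log2_ratio_to_probe_mean(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize, then take log2(value / mean of that probe across arrays)."""
    normalized = quantile_normalize(x)
    probe_mean = normalized.mean(axis=1, keepdims=True)
    return np.log2(normalized / probe_mean)


def one_probe_per_gene(values: np.ndarray, symbols: list[str]):
    """Keep a single probe per gene symbol (here: the first one listed)."""
    first_row: dict[str, int] = {}
    for row, symbol in enumerate(symbols):
        first_row.setdefault(symbol, row)
    return values[list(first_row.values())], list(first_row.keys())


# Toy usage: 4 probes x 3 arrays of positive intensities (invented numbers).
raw = np.array([[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.5, 6.0], [4.0, 2.0, 8.0]])
ratios = log2_ratio_to_probe_mean(raw)
kept, genes = one_probe_per_gene(ratios, ["GENE_A", "GENE_B", "GENE_A", "GENE_C"])
print(genes)  # ['GENE_A', 'GENE_B', 'GENE_C']
```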
Construction of the transcriptome map
The transcriptome map was constructed based on the correlation of expression profiles among neighboring genes along the chromosome, using a method similar to those described previously [1, 7, 23]. The correlation of expression profiles between genes was calculated as Pearson's correlation coefficient from the expression values of the seven samples representing ES or EB. For each gene, its correlations with every upstream and downstream neighboring gene within a given window were first determined, and the average of these correlation values was defined as the coexpression index of the gene. The number of neighboring genes (the 'window size') used to calculate the coexpression index was determined by repeating the analysis with different numbers of neighboring genes (ranging from 4 to 50) and assessing the changes in the coexpression pattern. The statistical significance of coexpression was assessed by Monte-Carlo simulation. In the simulation, random genome data sets were created by shuffling the position indices of the same number of genes on each chromosome, together with the expression profiles of the genes. The coexpression index of each gene and the mean coexpression index of each data set were then calculated from the random data. This process was repeated 10,000 times, and the resulting distributions of both the coexpression index and the mean coexpression index were fitted to a Gaussian density function. The P values of the coexpression index and the mean coexpression index from the real data were then determined from the probability distributions derived by the simulation. The transcriptome map of each chromosome was presented graphically by plotting the coexpression index of each gene against its position along the chromosome.
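The Monte-Carlo procedure described above can be sketched as follows. It is an illustration under assumptions, not the authors' implementation: chromosomes are genes × samples NumPy arrays in positional order, shuffling permutes gene positions within each chromosome (so each gene keeps its own expression profile), the Gaussian is fitted with SciPy's `norm.fit`, and each P value is the upper-tail probability of the observed value under the fitted Gaussian. Permuting the precomputed correlation matrix avoids recomputing correlations in every iteration, and the per-gene null is pooled across all simulated genes.

```python
import numpy as np
from scipy.stats import norm


def neighbor_mask(n: int, half_window: int) -> np.ndarray:
    """Boolean n x n mask of neighbors within half_window positions (self excluded)."""
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return (dist > 0) & (dist <= half_window)


def gene_indices(corr, mask, order):
    """Coexpression index at every position when genes are laid out in `order`."""
    reordered = corr[np.ix_(order, order)]  # position i now holds gene order[i]
    return (reordered * mask).sum(axis=1) / mask.sum(axis=1)


def monte_carlo(chromosomes, half_window=10, n_iter=10_000, seed=0):
    """Gaussian-fit P values for the genome-wide mean index and for each gene's index."""
    rng = np.random.default_rng(seed)
    corrs = [np.corrcoef(c) for c in chromosomes]  # computed once, reused every iteration
    masks = [neighbor_mask(c.shape[0], half_window) for c in chromosomes]

    observed = np.concatenate([gene_indices(r, m, np.arange(r.shape[0]))
                               for r, m in zip(corrs, masks)])

    null_means = np.empty(n_iter)
    total, total_sq, count = 0.0, 0.0, 0  # running sums for the pooled per-gene null
    for it in range(n_iter):
        sim = np.concatenate([gene_indices(r, m, rng.permutation(r.shape[0]))
                              for r, m in zip(corrs, masks)])
        null_means[it] = sim.mean()
        total += sim.sum()
        total_sq += (sim ** 2).sum()
        count += sim.size

    mu_mean, sd_mean = norm.fit(null_means)  # Gaussian fit to the null of the mean index
    p_mean = norm.sf(observed.mean(), loc=mu_mean, scale=sd_mean)

    mu_gene = total / count
    sd_gene = np.sqrt(total_sq / count - mu_gene ** 2)  # same estimates norm.fit would give
    p_gene = norm.sf(observed, loc=mu_gene, scale=sd_gene)
    return observed.mean(), p_mean, p_gene
```

For a quick check, calling `monte_carlo([chrom], half_window=2, n_iter=300)` on a small block-structured toy chromosome (genes within a block sharing one profile) should give a very small `p_mean`; the full 10,000 iterations used in the study are only a matter of run time.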
Biological significance of coexpression chromosome domain
ES-signature genes and their coexpression with neighboring genes. Genes in each domain are listed in their chromosomal order from 5' to 3'.
Coexpression Index (ES)
Coexpression Index (EB)
Genes in domain*** (20 neighboring genes: 10 upstream, 10 downstream)
INADL; FLJ10884; LOC163782; USP1; DOCK7; ANGPTL3; 400756; AUTL1; 199897; LOC199899; FOXD3; ALG6; ITGB3BP; PGM1; ROR1; 219612; MGC35130; KRTAP4–7; KIAA1573; KIAA1579; JAK1
401062; XCR1; CCR1; CCR3; CCR2; CCR5; CCRL2; LTF; TMEM7; LRRC2; TDGF1; FLJ36525; TMIE; TSP50; TESSP5; TESSP2; MYL3; PTHR1; MGC23918; HYPB; KIF9
GNB4; BAF53A; MRPL47; 133993; NDUFB5; USP13; PEX5R; TTC14; FXR1; LOC131118; SOX2; 401103; 402152; LOC142678; ATP11B; RP42; MCCC1; LAMP3; KIAA0861; B3GNT5; KLHL6
DHRS8; NUDT9; SPARCL1; DSPP; DMP1; LOC153218; IBSP; MEPE; SPP1; PKD2; ABCG2; DKFZp761G058; CEB1; MGC14156; DRLM; TIGD2; LOC285513; SNCA; MMRN; IRAK1BP1; TMSL3
FLJ30596; FLJ25422; SLC1A3; IDN3; FLJ13231; NUP155; FLJ10233; GDNF; 147975; FLJ39155; LIFR; 253254; 401182; OSMR; MGC39830; FYB; C9; DAB2; PTGER4; OSRF; PRKAA1
GZMA; FLJ37927; 345643; UNG2; DHX29; KIAA0052; PPAP2A; FLJ90709; DDX4; CRL3; IL6ST; FLJ11795; 345645; MGC33648; FLJ35954; DKFZp761C169; 345651; SNK; FLJ33641; RAB3C; PDE4D
IER3; DDR1; 389376; DPCR1; C6orf15; PSORS1C1; CDSN; PSORS1C2; C6orf18; TCF19; OCT3/4; LOC253018; HLA-C; HLA-B; MICA; HCP5; MICB; BAT1; ATP6V1G2; NFKBIL1; LTA
NIRF; GLDC; GASC1; PTPRD; TYRP1; 286343; MPDZ; 401492; NFIB; ZDHHC21; CER1; FLJ25461; C9orf52; SNAPC3; PSIP2; FLJ39267; C9orf39; SH3GL2; ADAMTSL1; FLJ35283; MGC35182
PCSK5; FLJ11149; GCNT1; C9orf65; CHAC; GNA14; GNAQ; FLJ12643; PSAT1; TLE4; TLE1; FLJ43950; 389763; FLJ31614; MGC20553; UBQLN1; GKAP42; KIF27; C9orf64; HNRPK; C9orf76
C10orf35; COL13A1; H2AFY2; AMID; MGC34695; SARA1; PP; OT7T022; FLJ10751; EIF4EBP2; NODAL; KIAA1274; PRF1; ADAMTS14; C10orf27; 338611; SGPL1; PCBD; UNC5B; SLC29A3; CDH23
C10orf39; DPYSL4; PKE; LOC170394; LOC170393; INPP5A; NKX6-2; FLJ25954; GPR123; KIAA1768; UTF1; VENTX2; ADAM8; TUBGCP2; ZNF511; CALCYON; UPA; FLJ26016; ECHS1; PAOX; LOC92170
CPT1A; MRPL21; IGHMBP2; MRGD; MGC21621; TPCN2; MYEOV; CCND1; ORAOV1; FGF19; FGF4; FGF3; 399920; ORAOV2; FADD; PPFIA1; EMS1; SHANK2; 399921; LOC220070; DHCR7
RBP5; CLSTN3; PXR1; 341392; M160; CD163; APOBEC1; GDF3; DPPA3; CLECSF11; NANOG; SLC2A14; SLC2A3; FHX; C3AR1; DKFZP566B183; CLECSF6; FLJ10408; CLECSF8; CLECSF9; AICDA
201181; LGP2; GCN5L2; HspB9; RAB5C; KCNH4; HCRT; LGP1; STAT5B; STAT5A; STAT3; PTRF; ATP6V0A1; NAGLU; HSD17B1; DPCK; TCFL4; HUMGT198A; LOC162427; TUBG1; TUBG2
SLC38A5; FTSJ1; PPN; EBP; RBM3; WDR13; WAS; SUV39H1; GATA1; HDAC6; ERAS; PCSK1N; TIMM17B; PQBP1; SLC35A2; PIM2; DKFZp761A052; KCND1; TFE3; JM11; JM4
Significantly enriched GO terms, chromosomes, and cytogenetic patterns
Fisher's exact test was used to calculate the hypergeometric probability of observing a GO term as enriched in each group of genes. Specifically, the probability p that a GO term is significantly enriched in a group of genes was calculated with the following formula:

$$p = 1 - \sum_{i=0}^{n-1} \frac{\binom{A}{i}\binom{G-A}{k-i}}{\binom{G}{k}}$$
where k is the number of genes in the group, G is the total number of genes, n is the number of genes in the group with a given GO term, and A is the total number of genes with a given GO term. Domains with at least four genes associated with GO terms at P ≤ 0.05 were selected. Likewise, the chromosomes and cytogenetic structural patterns significantly enriched in each group of genes were determined by Fisher's exact test.
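The enrichment probability described here is the chance of observing at least n genes carrying a GO term among k genes drawn from G, of which A carry the term. It can be computed exactly with Python integers and cross-checked against SciPy's `hypergeom.sf`. This is a sketch assuming the one-sided (over-representation) form of the test; the function name `go_enrichment_p` is hypothetical and the example counts are invented.

```python
from fractions import Fraction
from math import comb

from scipy.stats import hypergeom


def go_enrichment_p(k: int, G: int, n: int, A: int) -> float:
    """P(X >= n) for X ~ Hypergeometric(population G, A annotated genes, k draws).

    k: genes in the group; G: all genes; n: group genes with the GO term;
    A: genes in the genome with the GO term.
    """
    if not 0 <= n <= k:
        raise ValueError("n must satisfy 0 <= n <= k")
    # Exact probability of observing fewer than n annotated genes in the group.
    below = sum(comb(A, i) * comb(G - A, k - i) for i in range(n))
    return float(1 - Fraction(below, comb(G, k)))


# Worked example: G = 10 genes, A = 4 annotated, group of k = 3, n = 2 annotated.
# P(X >= 2) = [C(4,2)C(6,1) + C(4,3)C(6,0)] / C(10,3) = (36 + 4) / 120 = 1/3
print(go_enrichment_p(k=3, G=10, n=2, A=4))  # 0.3333333333333333

# Cross-check against SciPy's survival function, P(X > n - 1).
print(go_enrichment_p(k=150, G=20000, n=8, A=300), hypergeom.sf(8 - 1, 20000, 300, 150))
```

Exact rational arithmetic avoids the cancellation that "1 minus a sum" can suffer in floating point when p is very small.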
Principal component analysis, clustering analysis, and identification of differentially expressed genes
Unsupervised hierarchical clustering analysis and principal component analysis (PCA) were conducted using the software Cluster, TreeView, and Partek™, based on Pearson's correlation. Differentially expressed genes between ES and EB were identified by ANOVA using Partek™.
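The analyses above used Cluster, TreeView, and Partek™. As an open-source stand-in, not a reproduction of that workflow, the two steps can be sketched with SciPy: a per-gene one-way ANOVA between ES and EB samples (`scipy.stats.f_oneway`) and hierarchical clustering on a 1 − Pearson correlation distance (`scipy.cluster.hierarchy.linkage` with `metric="correlation"`). The 0.05 cut-off, the average-linkage choice, the helper names, and the toy data are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import f_oneway


def anova_es_vs_eb(es: np.ndarray, eb: np.ndarray, alpha: float = 0.05):
    """Per-gene one-way ANOVA; es and eb are genes x samples with matching rows."""
    p_values = np.array([f_oneway(es[g], eb[g]).pvalue for g in range(es.shape[0])])
    return p_values, p_values < alpha


def cluster_genes(expr: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Average-linkage clustering of genes on 1 - Pearson correlation distance."""
    tree = linkage(expr, method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (invented): gene 0 shifts between states, gene 1 has equal means.
es = np.array([[10.0, 10.1, 9.9, 10.0], [1.0, 2.0, 3.0, 4.0]])
eb = np.array([[0.0, 0.1, -0.1, 0.0], [4.0, 3.0, 2.0, 1.0]])
p, flagged = anova_es_vs_eb(es, eb)
print(flagged)  # [ True False]
```

With only two groups, the one-way ANOVA is equivalent to a two-sample t-test with pooled variance; the correlation distance groups genes by the shape of their profiles rather than their absolute levels.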
We wish to thank Drs. M. Gorospe and S. Zou for critical reading of this manuscript, and other laboratory colleagues for insightful suggestions and helpful discussions. This work was supported, at least in part, by the Intramural Research Program of the National Institute on Aging, NIH.
- Cohen BA, Mitra RD, Hughes JD, Church GM: A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 2000, 26 (2): 183-186. 10.1038/79896.
- Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus MC, van Asperen R, Boon K, Voute PA, Heisterkamp S, van Kampen A, Versteeg R: The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science. 2001, 291 (5507): 1289-1292. 10.1126/science.1056794.
- Fujii T, Dracheva T, Player A, Chacko S, Clifford R, Strausberg RL, Buetow K, Azumi N, Travis WD, Jen J: A preliminary transcriptome map of non-small cell lung cancer. Cancer Res. 2002, 62 (12): 3340-3346.
- Zhou Y, Luoh SM, Zhang Y, Watanabe C, Wu TD, Ostland M, Wood WI, Zhang Z: Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res. 2003, 63 (18): 5781-5784.
- Lercher MJ, Blumenthal T, Hurst LD: Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 2003, 13 (2): 238-243. 10.1101/gr.553803.
- Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI: Large clusters of co-expressed genes in the Drosophila genome. Nature. 2002, 420 (6916): 666-669. 10.1038/nature01216.
- Spellman PT, Rubin GM: Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 2002, 1 (1): 5. 10.1186/1475-4924-1-5.
- Lercher MJ, Urrutia AO, Hurst LD: Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 2002, 31 (2): 180-183. 10.1038/ng887.PubMedView ArticleGoogle Scholar
- Qiu P, Benbow L, Liu S, Greene JR, Wang L: Analysis of a human brain transcriptome map. BMC Genomics. 2002, 3 (1): 10-10.1186/1471-2164-3-10.PubMedPubMed CentralView ArticleGoogle Scholar
- Abeyta MJ, Clark AT, Rodriguez RT, Bodnar MS, Pera RA, Firpo MT: Unique gene expression signatures of independently-derived human embryonic stem cell lines. Hum Mol Genet. 2004, 13 (6): 601-608. 10.1093/hmg/ddh068.PubMedView ArticleGoogle Scholar
- Skottman H, Mikkola M, Lundin K, Olsson C, Stromberg AM, Tuuri T, Otonkoski T, Hovatta O, Lahesmaa R: Gene expression signatures of seven individual human embryonic stem cell lines. Stem Cells. 2005Google Scholar
- Rao RR, Calhoun JD, Qin X, Rekaya R, Clark JK, Stice SL: Comparative transcriptional profiling of two human embryonic stem cell lines. Biotechnol Bioeng. 2004, 88 (3): 273-286. 10.1002/bit.20245.PubMedView ArticleGoogle Scholar
- Sperger JM, Chen X, Draper JS, Antosiewicz JE, Chon CH, Jones SB, Brooks JD, Andrews PW, Brown PO, Thomson JA: Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci U S A. 2003, 100 (23): 13350-13355. 10.1073/pnas.2235735100.PubMedPubMed CentralView ArticleGoogle Scholar
- Richards M, Tan SP, Tan JH, Chan WK, Bongso A: The transcriptome profile of human embryonic stem cells as defined by SAGE. Stem Cells. 2004, 22 (1): 51-64. 10.1634/stemcells.22-1-51.PubMedView ArticleGoogle Scholar
- Brandenberger R, Wei H, Zhang S, Lei S, Murage J, Fisk GJ, Li Y, Xu C, Fang R, Guegler K, Rao MS, Mandalam R, Lebkowski J, Stanton LW: Transcriptome characterization elucidates signaling networks that control human ES cell growth and differentiation. Nat Biotechnol. 2004, 22 (6): 707-716. 10.1038/nbt971.PubMedView ArticleGoogle Scholar
- Miura T, Luo Y, Khrebtukova I, Brandenberger R, Zhou D, Thies RS, Vasicek T, Young H, Lebkowski J, Carpenter MK, Rao MS: Monitoring early differentiation events in human embryonic stem cells by massively parallel signature sequencing and expressed sequence tag scan. Stem Cells Dev. 2004, 13 (6): 694-715. 10.1089/scd.2004.13.694.PubMedView ArticleGoogle Scholar
- Oliphant A, Barker DL, Stuelpnagel JR, Chee MS: BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques. 2002, Suppl: 56-58, 60-61.
- Oliver B, Parisi M, Clark D: Gene expression neighborhoods. J Biol. 2002, 1 (1): 4. 10.1186/1475-4924-1-4.
- Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW: Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 2003, 4 (4): R25. 10.1186/gb-2003-4-4-r25.
- Furey TS, Haussler D: Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003, 12 (9): 1037-1044. 10.1093/hmg/ddg113.
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004, 14 (6): 1085-1094. 10.1101/gr.1910904.
- Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19 (10): 1275-1283. 10.1093/bioinformatics/btg153.
- Reyal F, Stransky N, Bernard-Pierrot I, Vincent-Salomon A, de Rycke Y, Elvin P, Cassidy A, Graham A, Spraggon C, Desille Y, Fourquet A, Nos C, Pouillart P, Magdelenat H, Stoppa-Lyonnet D, Couturier J, Sigal-Zafrani B, Asselain B, Sastre-Garau X, Delattre O, Thiery JP, Radvanyi F: Visualizing chromosomes as transcriptome correlation maps: evidence of chromosomal domains containing co-expressed genes--a study of 130 invasive ductal breast carcinomas. Cancer Res. 2005, 65 (4): 1376-1383. 10.1158/0008-5472.CAN-04-2706.
- Clark AT, Rodriguez RT, Bodnar MS, Abeyta MJ, Cedars MI, Turek PJ, Firpo MT, Reijo Pera RA: Human STELLAR, NANOG, and GDF3 genes are expressed in pluripotent cells and map to chromosome 12p13, a hotspot for teratocarcinoma. Stem Cells. 2004, 22 (2): 169-179. 10.1634/stemcells.22-2-169.
- Maitra A, Arking DE, Shivapurkar N, Ikeda M, Stastny V, Kassauei K, Sui G, Cutler DJ, Liu Y, Brimble SN, Noaksson K, Hyllner J, Schulz TC, Zeng X, Freed WJ, Crook J, Abraham S, Colman A, Sartipy P, Matsui S, Carpenter M, Gazdar AF, Rao M, Chakravarti A: Genomic alterations in cultured human embryonic stem cells. Nat Genet. 2005, 37 (10): 1099-1103. 10.1038/ng1631.
- Zeng X, Chen J, Liu Y, Luo Y, Schulz TC, Robins AJ, Rao MS, Freed WJ: BG01V: a variant human embryonic stem cell line which exhibits rapid growth after passaging and reliable dopaminergic differentiation. Restor Neurol Neurosci. 2004, 22 (6): 421-428.
- Plaia TW, Josephson R, Liu Y, Zeng X, Ording C, Toumadje A, Brimble SN, Sherrer ES, Uhl EW, Freed WJ, Schulz TC, Maitra A, Rao MS, Auerbach JM: Characterization of a New NIH Registered Variant Human Embryonic Stem Cell Line BG01V: A Tool for Human Embryonic Stem Cell Research. Stem Cells. 2005.
- Bhattacharya B, Miura T, Brandenberger R, Mejido J, Luo Y, Yang AX, Joshi BH, Ginis I, Thies RS, Amit M, Lyons I, Condie BG, Itskovitz-Eldor J, Rao MS, Puri RK: Gene expression in human embryonic stem cell lines: unique molecular signature. Blood. 2004, 103 (8): 2956-2964. 10.1182/blood-2003-09-3314.
- Zhan M, Miura T, Xu X, Rao MS: Conservation and variation of gene regulation in embryonic stem cells assessed by comparative genomics. Cell Biochem Biophys. 2005, 43 (3): 379-405. 10.1385/CBB:43:3:379.
- Rao M: Conserved and divergent paths that regulate self-renewal in mouse and human embryonic stem cells. Dev Biol. 2004, 275 (2): 269-286. 10.1016/j.ydbio.2004.08.013.
- Moretti PA, Davidson AJ, Baker E, Lilley B, Zon LI, D'Andrea RJ: Molecular cloning of a human Vent-like homeobox gene. Genomics. 2001, 76 (1–3): 21-29. 10.1006/geno.2001.6574.
- Tanaka TS, Kunath T, Kimber WL, Jaradat SA, Stagg CA, Usuda M, Yokota T, Niwa H, Rossant J, Ko MS: Gene expression profiling of embryo-derived stem cells reveals candidate genes associated with pluripotency and lineage specificity. Genome Res. 2002, 12 (12): 1921-1928. 10.1101/gr.670002.
- Zhan M, Miura T, Xu X, Rao M: Conservation and variation of gene regulation in embryonic stem cells assessed by comparative genomics. Cell Biochem Biophys. 2005.
- Niwa H, Miyazaki J, Smith AG: Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat Genet. 2000, 24 (4): 372-376. 10.1038/74199.
- Maruyama M, Ichisaka T, Nakagawa M, Yamanaka S: Differential roles for Sox15 and Sox2 in transcriptional control in mouse embryonic stem cells. J Biol Chem. 2005, 280 (26): 24371-24379. 10.1074/jbc.M501423200.
- Brimble SN, Zeng X, Weiler DA, Luo Y, Liu Y, Lyons IG, Freed WJ, Robins AJ, Rao MS, Schulz TC: Karyotypic stability, genotyping, differentiation, feeder-free maintenance, and gene expression sampling in three human embryonic stem cell lines derived prior to August 9, 2001. Stem Cells Dev. 2004, 13 (6): 585-597. 10.1089/scd.2004.13.585.
- NIH stem cell information home page. [http://stemcells.nih.gov/index.asp]
- NCBI reference sequence. [http://ncbi.nih.gov/RefSeq/]
- Illumina, Inc. home page. [http://www.illumina.com/]
- The gene ontology. [http://www.geneontology.org/]
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998, 95 (25): 14863-14868. 10.1073/pnas.95.25.14863.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.