Co-expression of adjacent genes in yeast cannot be simply attributed to shared regulatory system
© Tsai et al. 2007
Received: 29 May 2007
Accepted: 03 October 2007
Published: 03 October 2007
Adjacent gene pairs in the yeast genome tend to be expressed concurrently. Sharing of regulatory elements within the intergenic region of these adjacent gene pairs has often been considered the major mechanism responsible for such co-expression. However, the extent to which common transcription factors (TFs) contribute to the co-expression of adjacent genes is still debated. To address the evolutionary aspect of this issue, we investigated the conservation of adjacent pairs in five yeast species. Using the information on TF binding sites in promoter regions available from the MYBS database (http://cg1.iis.sinica.edu.tw/~mybs/), we analyzed the ratios of TF-sharing pairs among all the adjacent pairs in yeast genomes. The levels of co-expression in different adjacent patterns were also compared.
Our analyses showed that the proportion of adjacent pairs conserved in five yeast species is relatively low compared to that in the mammalian lineage. The proportion was also low for adjacent gene pairs with shared TFs. In particular, the statistical analysis suggested that co-expression of adjacent gene pairs was not noticeably associated with the sharing of TFs in these pairs. We further examined the case of the PAC (polymerase A and C) and RRPE (rRNA processing element) motifs, which co-regulate divergent/bidirectional pairs, and found that the shared TFs were not significantly relevant to co-expression of divergent promoters among adjacent genes.
Our findings suggest that a commonly shared cis-regulatory system does not solely account for the co-expression of adjacent gene pairs in the yeast genome. We therefore believe that during evolution yeasts have developed a sophisticated regulatory system that integrates both TF-based and non-TF-based mechanism(s) for concurrent regulation of neighboring genes in response to various environmental changes.
The arrangement and orientation of genes in genomes is often shaped through evolution by mechanisms such as unequal crossing over followed by random genetic drift or natural selection [1, 2]. Recent studies indicate that the distribution of genes in genomes does not always happen at random [3–5]. In the human genome, housekeeping genes show a strong tendency to cluster together, and genes that participate in the same pathway also tend to lie adjacent to each other in the genome [5, 7, 8]. Moreover, several studies indicate that adjacent genes in human seem to co-express regardless of their intergenic distance [9–11]. Similar phenomena have been observed in Drosophila, nematode, and yeast [12–16]. Among these observations, the co-expression of adjacent pairs is crucial because changes in such genome organization could alter the co-regulated transcription over the pairs [11, 12].
How co-expressed genes are regulated is still unclear. Two major mechanisms proposed are alterations of chromatin structure and sharing of the same regulatory elements [3, 5, 15]. The open conformation of the chromatin structure is required for genes to be transcribed into RNAs and thus become expressed. A general hypothesis is that clusters of genes in the same chromatin domain have a higher chance to be expressed simultaneously than genes located in different chromatin domains [5, 17]. Alternatively, cis-regulatory elements could behave like fine modules that alter gene expression locally. Therefore, adjacent pairs with common upstream activation sites (UAS) or shared regulatory systems are more likely to be co-expressed [9, 12].
Several attempts have been made to investigate the mechanism for co-expression of adjacent gene pairs. In human, the abundance of divergent pairs relative to convergent and tandem pairs has been reported, and the common CpG islands that were often found between divergent pairs were known to be associated with an "open" or "active" chromatin [11, 18, 19]. However, co-expressed groups of adjacent genes spanning 20–200 kilobases in the Drosophila genome did not show any correlation with known chromosomal structures [10, 16]. Later, the idea of co-expression among clustered genes was rejected by Thygesen and Zwinderman, whose study also failed to discover any correlation between the chromatin domain and co-expressed genes in Drosophila.
It is evident that in yeast adjacent gene pairs display stronger co-expression than random pairs do. Kruglyak and Tang proposed that some co-expressed pairs resulted from sharing a single regulatory system, despite the fact that many genes controlled by separate regulatory systems may also have highly co-expressed patterns. Hurst et al. also concluded that divergent orientation is dominant for co-regulation and for conservation of pairs, but the finding had weak statistical support. Although these studies suggested that the sharing of a common UAS plays an important role in regulating co-expressed pairs, and that divergent pairs are more likely to share the same regulatory system, the co-expression level (defined by the correlation coefficient) of divergent pairs is not significantly higher than that of tandem pairs with a similar intergenic distance. The relative contribution of the two major mechanisms to the co-expression of adjacent genes is still debated for different organisms.
Recently, Byrnes et al. proposed that the majority of gene loss in yeast happened after whole-genome duplication (WGD) by single-gene deletion. Their observation implied that adjacent gene pairs were not preserved after WGD. On the other hand, several studies indicated that adjacent pairs were conserved in some organisms due to the sharing of regulatory elements [4, 22]. To investigate the contribution of regulatory elements to the co-expression of adjacent pairs, we first examined the conservation of adjacency in five yeast species. It is of particular interest to study the conservation of adjacent pairs using yeast species which have undergone WGD, because the duplicated adjacent relationship would in theory be free of evolutionary selection. Importantly, advances in technology have led to the establishment of databases of transcription factors (TFs) and transcription factor binding sites (TFBSs). These tools allow researchers to investigate the mechanism for co-expression of adjacent pairs by studying the sharing of common regulatory systems. Herein, we present a comprehensive examination of the intergenic regions between adjacent genes to determine whether these pairs frequently share common TFs. Our study provides clear evidence that sharing of common TFs is not the only driving force in the co-regulation of adjacent gene pairs in yeast.
Summary of the orthologous adjacent genes in Saccharomyces sensu stricto species relative to the 5,702 gene pairs in S. cerevisiae.
Columns: orthologous adjacent pairs; stringently conserved pairs (ratio); loosely conserved pairs (ratio).
The proportions of commonly shared TFs of conserved adjacent pairs and non-conserved pairs.
Categories: no TF in common; only one TF in common; multiple TFs in common.
We acknowledge the potential bias and noise that microarray data may bear. To circumvent this problem, we also analyzed the condition-specific datasets separately (see Additional file 1 Fig. 1, 2, and 3) and obtained observations similar to those using the merged dataset.
Since the proportion of adjacent pairs sharing TFs is low, as noted above, it is important to ask whether the shared TFs in divergent pairs are more likely to co-regulate the divergent genes. We present a case here to illustrate the effects of sharing a regulatory system on co-expression. Beer and Tavazoie studied two computationally discovered sequence elements, PAC (polymerase A and C) and RRPE (rRNA processing element), which are considered to exert combinatorial regulation on their target genes. The authors found very similar expression patterns among genes with PAC located within 140 bp and RRPE within 240 bp of the ATG start codon, respectively.
Two implications can be drawn from this analysis. First, TFs tend to exert stronger regulatory effects on the gene proximal to their binding sites in a divergent pair. Second, sharing of TFs per se does not guarantee co-regulation of adjacent genes, yet an increase in the motif occurrences may ensure simultaneous modulation on both sides in situations where co-regulation is required. Altogether, our results suggest that genes in a divergent pair do not necessarily use the same regulatory machinery, which in turn may lead to differential expression between the pair partners.
We compared the adjacency relationships of gene pairs among five yeast species which had undergone WGD and found a random distribution for all three adjacent patterns. The evidence supported the hypothesis that the selection on types of adjacency along the S. cerevisiae lineage was neutral after WGD. This neutrality also explained our observation of a low proportion of conserved adjacent pairs in five yeast species (e.g. 5.7–6.45% for the stringently conserved group). Similar results were found in orthologous gene pairs between the S. cerevisiae and C. albicans genomes, which had diverged before WGD. Since adjacent pairs have a tendency to co-express in yeast, observations from these studies contradict the hypothesis that adjacent pairs with co-expression patterns are more likely to maintain the adjacent relationship during evolution [4, 11, 22]. This implies that co-expression of adjacent pairs may be due to other mechanisms such as chromatin opening.
In contrast to yeast, a higher proportion of conserved adjacent pairs was observed in the genomes of the mammalian lineage [11, 22]. It is possible that the strength and/or mechanism of selection on adjacency differs between yeast and human [11, 22]. It is also interesting to note that the ratios of conserved divergent, convergent, and tandem pairs are similar. This leads to the conclusion that in yeast the divergent relationship is not appreciably favored by selection, even though these pairs are more likely to share a regulatory system and thus are more likely to display co-expression. Importantly, this notion also differs from that drawn from the vertebrate genomes, in which the conservation ratio of divergent pairs is higher than that of tandem pairs, suggesting a negative selection on the separation of divergent pairs during evolution of vertebrates.
It is proposed that the conservation of divergent pairs in human has functional importance. This hypothesis is supported by the significant expression correlation and functional association among divergent pairs [4, 10, 11, 22]. Although several cases in yeast have shown functional associations for conserved divergent pairs [8, 12], a higher co-expression level in divergent pairs could not be detected when compared to tandem pairs. Consistent with this finding, we found no difference in the co-expression level among the three adjacency patterns for the stringently conserved group, supporting the observation of neutrality in adjacency types. In addition, we found the co-expression levels of conserved adjacent pairs and non-conserved adjacent pairs to be approximately the same in yeast, indicating that the adjacent relationship of co-expressed pairs is free from selection constraint in yeast. It seems that a bias toward divergent gene organization is only observed in the lineage leading to mammals. If this is true, a possible explanation is that the mechanisms concerning the co-expression of adjacent pairs in yeast are different from those in mammals. For example, mechanisms such as sharing of cis-regulatory elements and antisense transcription, both of which explained the co-expression of human adjacent genes [5, 25], are actually rare in the yeast genome [26, 27].
It has been suggested that when adjacent yeast genes are controlled by a single regulatory system, their expression patterns should be highly correlated [12, 20]. In order to investigate whether shared TFs in adjacent pairs are responsible for the co-expression, we collected the TF information from adjacent pairs of S. cerevisiae for further analysis. Surprisingly, the ratio of adjacent pairs with shared TFs is low (about 12%). A similar trend is observed when the dataset is separated into conserved and non-conserved adjacent pairs, indicating that such a feature is not particularly favored by selection. Therefore, it is reasonable to infer that co-expression of adjacent pairs in yeast does not merely result from sharing a TF-based regulatory system. This is also contrary to the findings that in human a high proportion of the adjacent pairs share a regulatory system which consequently drives co-expression of neighboring genes [11, 22].
It is commonly believed that genes with the same regulators have similar expression profiles. However, we observed that the co-expression level of TF-sharing adjacent pairs is not higher than that of pairs without common TFs. We performed a case study on PAC and RRPE, two combinatorial cis-acting sequences whose target genes are expected to display high levels of co-expression. Our analysis showed the contrary: only 6 out of 22 divergent pairs had similar expression profiles, whereas 73% (16 out of 22) of the divergent pairs were not co-expressed. Furthermore, the six co-expressed divergent pairs appear to have independent cis-regulatory elements. These results suggest that the shared regulatory system of adjacent genes in S. cerevisiae is not highly relevant to their co-expression.
Considering the low prevalence of TF sharing and the lack of selection constraint on the adjacency of gene pairs, one possible explanation for the co-expression phenomenon is chromatin modification [3, 15, 29]. Mechanisms such as histone acetylation, deacetylation, and DNA methylation may contribute significantly to the co-expression of neighboring genes in S. cerevisiae [25, 28]. Detailed analyses of the transcription patterns as well as the chromatin structure of co-expressed genes are required to shed light on the questions raised by this report.
This study was motivated by the question of how much the sharing of TFs contributes to the concurrent expression of adjacent gene pairs. We found that gene adjacency was not strongly favored during yeast evolution. Furthermore, the analysis of co-expression in adjacent gene pairs and shared TFs showed no distinct relationship. Despite the bias or noise potentially present in microarray data, the clear result for the divergent pairs co-regulated by PAC and RRPE led us to conclude that shared TFs cannot fully explain the co-expression of divergent pairs.
In summary, our study does not refute the contribution of commonly shared TFs to co-regulation of adjacent genes in yeast, but our finding does suggest that TF sharing is not the sole determinant of such regulation. We believe that during evolution yeasts have developed a sophisticated regulatory system which integrates both TF-based and non-TF-based mechanism(s), of which the latter may play the greater part in driving co-expression of neighboring genes. This integrative regulatory system allows yeasts to simultaneously modulate expression of neighboring genes in order to adapt to changing environments rapidly and efficiently.
The genome sequences and annotations of five yeast species (Saccharomyces cerevisiae, Saccharomyces castellii, Saccharomyces bayanus, Saccharomyces kudriavzevii and Saccharomyces mikatae) were downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org). There were 6310, 4681, 4970, 3778, and 3109 annotated ORFs from these genomes, respectively. Gene pairs in S. cerevisiae were identified by their relative position in the genome. Dubious and silent ORFs were excluded from analysis. Overlapping genes were also removed because they might have biased the expression analysis. In the end, a total of 5743 S. cerevisiae genes were used for analysis. Based on the positional annotation in SGD, they were categorized into divergent pairs, convergent pairs and tandem pairs, adding up to 5702 adjacent pairs detected in S. cerevisiae.
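As a rough illustration of the pair categories described above, the following Python sketch labels neighboring genes from their strand annotation. The `Gene` record, the gene names, and the coordinates are hypothetical and are not taken from SGD; the sketch assumes non-overlapping genes on a single chromosome.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str    # hypothetical identifier, not a real ORF name
    start: int   # leftmost chromosomal coordinate
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Label two neighbouring genes; `left` has the smaller coordinate."""
    if left.strand == right.strand:
        return "tandem"
    # Opposite strands: if the left gene is on "-" and the right gene on "+",
    # the two 5' ends face the same intergenic region (divergent).
    if left.strand == "-" and right.strand == "+":
        return "divergent"
    return "convergent"


def adjacent_pairs(genes: List[Gene]) -> List[Tuple[str, str, str]]:
    """Sort genes by position and classify every neighbouring pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, classify_pair(a, b))
            for a, b in zip(ordered, ordered[1:])]


# Toy chromosome with invented genes.
chromosome = [
    Gene("geneA", 100, "-"),
    Gene("geneB", 900, "+"),
    Gene("geneC", 2000, "+"),
    Gene("geneD", 3100, "-"),
]
pairs = adjacent_pairs(chromosome)
```

In this toy example the three neighboring pairs come out as divergent, tandem, and convergent, in that order.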
Using S. cerevisiae as the reference genome, orthologous ORFs were identified in S. castellii (3857), S. bayanus (4642), S. kudriavzevii (3212), and S. mikatae (2435) (Table 1). For the four species other than S. cerevisiae, we determined adjacent pairs by mapping all the ORFs to their contigs according to the annotated sequences. The ORFs within the same contig were sorted based on their hit positions. The adjacent relationship was then designated by their relative positions and orientations annotated in SGD. We identified 2053, 3975, 2376, and 1609 orthologous adjacent pairs in S. castellii, S. bayanus, S. kudriavzevii, and S. mikatae, respectively (Table 1).
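The contig-based step can be sketched as follows. This is only a minimal illustration under stated assumptions: each hit is a hypothetical `(orf, contig, position)` tuple, all values are invented, and ORFs at the ends of different contigs are never paired with each other.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (orf_name, contig_id, hit_position); the layout is an assumption.
Hit = Tuple[str, str, int]


def neighbours_by_contig(hits: List[Hit]) -> List[Tuple[str, str]]:
    """Group ORFs by contig, sort by hit position, pair up neighbours."""
    by_contig: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for orf, contig, pos in hits:
        by_contig[contig].append((pos, orf))

    pairs: List[Tuple[str, str]] = []
    for contig in sorted(by_contig):
        ordered = [orf for _, orf in sorted(by_contig[contig])]
        # Only ORFs on the same contig can be called adjacent.
        pairs.extend(zip(ordered, ordered[1:]))
    return pairs


# Invented hits on two contigs.
hits = [
    ("orf3", "contig_2", 50),
    ("orf1", "contig_1", 400),
    ("orf2", "contig_1", 10),
    ("orf4", "contig_2", 900),
    ("orf5", "contig_1", 1200),
]
neighbour_pairs = neighbours_by_contig(hits)
```

Orientation would then be assigned to each neighboring pair from the strand annotation, which this sketch leaves out.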
We considered an adjacent pair conserved if the neighboring ORFs were orthologues of adjacent genes in S. cerevisiae and retained the same orientation pattern. If one (or both) genes of an adjacent pair in S. cerevisiae were missing or the pairing orientation was different in other species, the pair was assigned to the non-conserved group. The conserved pairs were then classified into stringently conserved and loosely conserved groups according to their degree of conservation. An adjacent pair was considered stringently conserved if the adjacent relationship was preserved in all five yeast species. The loosely conserved group refers to the pairs whose adjacent relationship is preserved in any three of the five yeast species, or preserved in S. castellii and one other of the four remaining species. The latter criterion was used because S. castellii is the most distantly related species among the five, so adjacency shared with it is unlikely to have arisen by convergent evolution. As a result, there were 345 stringently conserved pairs and 3,582 loosely conserved pairs. Among those, there were 94, 95, and 156 stringently conserved pairs and 942, 973, and 1,667 loosely conserved pairs for the divergent, convergent, and tandem categories, respectively.
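The grouping rules above can be written down as a small decision function. The sketch below is one possible reading, not the authors' implementation: it assumes that S. cerevisiae itself counts toward the number of species, that it may serve as the "other" species next to S. castellii, and that the stringent and loose labels are reported as mutually exclusive.

```python
from typing import Set

SPECIES = {
    "S. cerevisiae",
    "S. castellii",
    "S. bayanus",
    "S. kudriavzevii",
    "S. mikatae",
}


def conservation_class(preserved_in: Set[str]) -> str:
    """Classify a pair from the set of species that keep it adjacent
    in the same orientation (assumed to be determined beforehand)."""
    preserved = preserved_in & SPECIES
    if preserved == SPECIES:
        return "stringent"
    # Loose rule 1: preserved in any three of the five species.
    if len(preserved) >= 3:
        return "loose"
    # Loose rule 2: preserved in S. castellii plus one other species.
    if "S. castellii" in preserved and len(preserved) >= 2:
        return "loose"
    return "non-conserved"


example = conservation_class({"S. cerevisiae", "S. castellii"})
```

Under these assumptions a pair kept only in S. cerevisiae and S. castellii is labeled loose, whereas a pair kept only in S. cerevisiae and S. bayanus is labeled non-conserved.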
To compare the preserved patterns among these five yeast species, the relative ratios of three adjacent patterns were analyzed by chi-square test using a random sampling as reference (0.25 : 0.25 : 0.5) (Table 1).
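A chi-square goodness-of-fit calculation against the 0.25 : 0.25 : 0.5 reference can be sketched in a few lines of Python. This is an illustrative calculation rather than the authors' exact procedure; the closed-form p-value used here holds only for two degrees of freedom (three categories), where the chi-square survival function equals exp(-x/2).

```python
import math
from typing import Sequence


def chi_square_gof(observed: Sequence[int],
                   expected_ratio: Sequence[float]) -> float:
    """Pearson chi-square statistic for counts against a reference ratio."""
    total = sum(observed)
    expected = [total * r for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with exactly 2 d.f."""
    return math.exp(-stat / 2.0)


# Divergent, convergent, tandem counts of the stringently conserved pairs.
stat = chi_square_gof([94, 95, 156], [0.25, 0.25, 0.5])
p = p_value_df2(stat)
```

For the stringently conserved counts reported above (94, 95, and 156), this gives a statistic of about 3.16 and a p-value of about 0.21.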
We selected four S. cerevisiae microarray datasets for the expression analysis: alpha, cdc, crz1p, and env. Both the alpha and cdc datasets are time course expression profiles encompassing two to three cell cycles after release from growth arrest. The alpha data were obtained from cells treated transiently with alpha-factor, and the cdc data were collected from a cdc15-2 temperature-sensitive mutant which resumed growth after release from heat shock. For the crz1p dataset, ionic signaling was triggered in yeast cells by either calcium (Ca2+) or sodium (Na+). The env dataset contains expression profiles of yeast cells exposed to diverse environmental perturbations. Each array was normalized so that the log ratios had a mean of zero. To avoid potential discrepancies between arrays due to factors specific to each condition, we merged these four datasets into one large dataset and used the Pearson correlation coefficient to calculate the co-expression level for each adjacent pair.
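The normalization, merging, and correlation steps can be sketched as below. The data layout (one dictionary of log ratios per array) and all values are invented for illustration, and the sketch assumes that every gene has a value on every array, which real microarray data rarely satisfy.

```python
import math
from typing import Dict, List, Sequence

Array = Dict[str, float]  # gene name -> log ratio on one array (assumed layout)


def center_array(array: Array) -> Array:
    """Shift one array so that its log ratios have a mean of zero."""
    mean = sum(array.values()) / len(array)
    return {gene: value - mean for gene, value in array.items()}


def merge(datasets: List[List[Array]]) -> Dict[str, List[float]]:
    """Center every array, then concatenate all arrays per gene."""
    profiles: Dict[str, List[float]] = {}
    for dataset in datasets:
        for array in dataset:
            for gene, value in center_array(array).items():
                profiles.setdefault(gene, []).append(value)
    return profiles


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Two invented datasets with three hypothetical genes.
toy_datasets = [
    [{"g1": 1.0, "g2": 2.0, "g3": 0.0}, {"g1": -1.0, "g2": -2.0, "g3": 0.0}],
    [{"g1": 0.5, "g2": 1.5, "g3": -2.0}],
]
profiles = merge(toy_datasets)
r = pearson(profiles["g2"], profiles["g3"])
```

The coefficient `r` plays the role of the co-expression level of the pair (g2, g3) over all merged experimental points.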
To investigate whether conserved adjacent pairs had a higher tendency to be co-expressed, we compared, for the conserved group, the expression correlations of divergent pairs, convergent pairs, and tandem pairs to those of a group of 5,000 non-adjacent random pairs. We used the Kolmogorov-Smirnov (KS) test to examine whether two groups of gene pairs were co-expressed to different extents. The KS test is a nonparametric test which determines whether two distributions differ significantly. It calculates the maximum vertical deviation (D) between the empirical distribution functions of the two groups to determine whether the two datasets are drawn from the same distribution. Let $x$ be the expression correlation of an ORF pair over all experimental points. Let $f_i(x)$ be the density function of $x$ for the gene pairs in group $i$, and $F_i(x)$ be the corresponding cumulative distribution function. For groups $i$ and $j$, if the statistic $D = \max_x |F_i(x) - F_j(x)|$ is significantly large, we infer that the two groups of gene pairs come from two distinct distributions and are co-expressed to different extents. Similarly, we used the KS test to examine the significance of the differences between conserved adjacent pairs and non-conserved adjacent pairs.
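The statistic D can be computed directly from the two samples of expression correlations. The sketch below is a plain-Python illustration, not the software used in the study; the `critical_value` helper relies on the usual large-sample approximation for the two-sample test and is an addition for illustration only.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Maximum vertical distance between two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Approximate rejection threshold for D with sample sizes n and m."""
    c = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c * math.sqrt((n + m) / (n * m))


# Invented correlation values for two groups of gene pairs.
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_disjoint = ks_statistic([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8])
```

Two groups would be called differently co-expressed at level `alpha` when D exceeds `critical_value(n, m, alpha)`; with groups as small as these toy samples the approximation is not reliable.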
To understand whether co-expression of adjacent pairs is mainly due to sharing of the same regulatory system, we studied the correlation of co-expression level with the presence and the number of commonly shared TFs. We collected the TF information from MYBS, a web-based service that identifies TFBSs with comprehensive annotation. MYBS integrates an array of predicted and known transcription factor binding sites (TFBSs) with calculated position weight matrices (PWMs) and incorporates DNA-binding affinity data from chromatin-immunoprecipitation microarray experiments (ChIP-chip) as well as the phylogenetic footprinting data of TFBSs from eight related yeast species.
In this study, we considered a TF to regulate a gene if: 1) its binding to the gene was supported by a p-value less than 0.01 in the ChIP-chip experiment; 2) there existed a short sequence pattern satisfying the PWM's threshold; and 3) the sequence pattern was conserved in at least one of the four Saccharomyces sensu stricto species.
For a more detailed comparison, we classified the adjacent pairs into three groups: pairs without shared TFs, pairs with one TF in common, and pairs with multiple TFs in common. Again, we used the KS test to examine whether shared TFs were relevant to the co-expression level of the pair.
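The three criteria for calling a TF a regulator of a gene, and the grouping of pairs by shared TFs, can be combined in a short sketch. The `TFEvidence` record and the TF names are hypothetical and do not reflect the actual MYBS data format.

```python
from typing import Dict, NamedTuple, Set


class TFEvidence(NamedTuple):
    chip_p_value: float     # ChIP-chip binding p-value for this TF and gene
    pwm_match: bool         # a site in the promoter passes the PWM threshold
    conserved_species: int  # sensu stricto species (0-4) conserving the site


def regulates(ev: TFEvidence) -> bool:
    """All three criteria must hold for the TF to count as a regulator."""
    return ev.chip_p_value < 0.01 and ev.pwm_match and ev.conserved_species >= 1


def regulators(evidence: Dict[str, TFEvidence]) -> Set[str]:
    return {tf for tf, ev in evidence.items() if regulates(ev)}


def sharing_group(gene_a: Dict[str, TFEvidence],
                  gene_b: Dict[str, TFEvidence]) -> str:
    """Assign an adjacent pair to one of the three TF-sharing groups."""
    shared = regulators(gene_a) & regulators(gene_b)
    if not shared:
        return "no TF in common"
    if len(shared) == 1:
        return "one TF in common"
    return "multiple TFs in common"


# Invented evidence for a hypothetical adjacent pair.
left = {"TF1": TFEvidence(0.001, True, 2), "TF2": TFEvidence(0.005, True, 1)}
right = {"TF1": TFEvidence(0.002, True, 3), "TF2": TFEvidence(0.2, True, 4)}
group = sharing_group(left, right)
```

Here TF2 fails the ChIP-chip criterion for the right-hand gene, so the pair falls into the group with one TF in common.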
We thank Grace Tzu-Wei Huang and reviewers for valuable comments. This work was jointly supported by research grants from Research Center for Biodiversity at Academia Sinica and NSC to DW, and from Institute of Information Science at Academia Sinica to HKT.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.