The adjacent positioning of co-regulated gene pairs is widely conserved across eukaryotes
© Arnone et al.; licensee BioMed Central Ltd. 2012
Received: 27 February 2012
Accepted: 3 October 2012
Published: 10 October 2012
Coordinated cell growth and development requires that cells regulate the expression of large sets of genes in an appropriate manner, and one of the most complex and metabolically demanding pathways that cells must manage is that of ribosome biogenesis. Ribosome biosynthesis depends upon the activity of hundreds of gene products, and it is subject to extensive regulation in response to changing cellular conditions. We previously described an unusual property of the genes that are involved in ribosome biogenesis in yeast; a significant fraction of the genes exist on the chromosomes as immediately adjacent gene pairs. The incidence of gene pairing can be as high as 24% in some species, and the gene pairs are found in all of the possible tandem, divergent, and convergent orientations.
We investigated co-regulated gene sets in S. cerevisiae beyond those related to ribosome biogenesis, and found that a number of these regulons, including those involved in DNA metabolism, heat shock, and the response to cellular stressors were also significantly enriched for adjacent gene pairs. We found that as a whole, adjacent gene pairs were more tightly co-regulated than unpaired genes, and that the specific gene pairing relationships that were most widely conserved across divergent fungal lineages were correlated with those genes that exhibited the highest levels of transcription. Finally, we investigated the gene positions of ribosome related genes across a widely divergent set of eukaryotes, and found a significant level of adjacent gene pairing well beyond yeast species.
While it has long been understood that there are connections between genomic organization and transcriptional regulation, this study reveals that the strategy of organizing genes from related, co-regulated pathways into pairs of immediately adjacent genes is widespread, evolutionarily conserved, and functionally significant.
The ability of cells to appropriately regulate the expression levels of large sets of genes is one of the critical hallmarks of living systems, and it can be orchestrated across a wide range of circumstances, including during progression through the cell cycle, during cell differentiation and development, and in response to changing environmental conditions. For example, within a given cell cycle, cells regulate the biosynthesis of relevant sets of gene products that are appropriate for particular metabolic needs (e.g. the coordinated synthesis of histones during S phase). Regulated expression can also extend over much longer time frames, as is the case for different members of the globin gene cluster, which are alternatively activated or repressed during mammalian development. These regulatory changes can extend to hundreds of genes at a time, and can include subtle controls for maintaining a precise stoichiometry of gene products. Cells can also rapidly respond to changing environmental conditions through large-scale transcriptional changes, as in the stress response in S. cerevisiae, which is associated with coordinated expression changes of roughly half of the genome.
One way that cells manage to coordinate the expression of large sets of genes is through the maintenance of particular sub-nuclear architectures. Indeed, perhaps the oldest and best characterized example of a sub-nuclear compartment is the nucleolus, the sub-nuclear location where the rDNA is sequestered and production of the ribosome begins [4, 5]. The rDNA repeats are transcribed in the nucleolus, and the nascent rRNAs are immediately subjected to extensive processing and assembly into pre-ribosomal particles. Among the other sub-nuclear distinctions associated with eukaryotic genomes are the so-called euchromatin and heterochromatin regions, which establish a local context that is either conducive or inhibitory to transcription, respectively. More recently, it has been observed that there are dozens of sub-nuclear foci called ‘transcription factories’ that are enriched for actively expressed genes. The localization of genes to particular sub-nuclear compartments can change quickly in response to environmental cues, where the activation of a gene can result in its re-localization to the nuclear periphery, allowing for coordination of transcription with processing and nuclear export.
There are localized subsets of the genome that are transcriptionally correlated in eukaryotic species as diverse as A. thaliana, D. rerio, M. musculus, D. melanogaster, and S. cerevisiae [10–14]. That is, physically adjacent DNA regions (typically a 2–3 gene window) tend to have a positive correlation of expression with each other. Additionally, the nematode worm Caenorhabditis elegans has operon-like structures, reminiscent of a prokaryotic genomic arrangement. The distribution of genes throughout the genome is non-random, and the particular position of a gene on a chromosome can also play a critical role in its transcriptional regulation. The globin and Hox genes are striking examples of this phenomenon, as their positional order in the genome corresponds to their spatial and temporal expression during development [2, 17].
Multiple studies have found that the integration of a reporter construct in varied genomic locations can result in significant differences in its expression levels in many organisms, from yeast to humans [18–20]. More recently there has been an increased appreciation that this phenomenon is not limited to the insertion of an artificial reporter construct, but that local genomic context plays an important role in gene regulation [12, 21]. The effects of genomic position on transcription have been particularly well documented in S. cerevisiae, where the relocation of a gene from a euchromatic region to a heterochromatic region can result in repression of that gene. The coordinated expression of adjacent genes is also important, particularly for those that share bi-directional promoters, which allow for the coordinated production of two protein coding genes through a shared cis-regulatory region.
One of the most metabolically demanding pathways that growing cells must regulate is that of ribosome biogenesis, a complex biosynthetic pathway that depends on the coordinated action of the several hundred gene products required to produce functional ribosomes. Typically, the genes that function in ribosome biosynthesis are highly expressed, and they are also tightly regulated under changing environmental conditions. Previously, through our investigations in S. cerevisiae, we described a large set of co-regulated genes, the ribosome and rRNA biosynthesis (RRB) regulon, whose products function in various levels of rRNA and ribosome biosynthesis and processing. Like the genes whose products form the ribosomal proteins (RPs) themselves, the RRB genes are tightly co-regulated under changing cellular conditions [24, 25]. Interestingly, we discovered that the genes from the RP and RRB gene sets exhibited an unusual pattern in their positions on the chromosomes: an unusually high fraction of the genes were found as immediately adjacent gene pairs. We extended this observation across a wide variety of yeast species, including the finding that some 24% of the RRB genes from C. albicans are present as adjacent gene pairs, in all of the convergent, divergent and tandem gene arrangements.
In this study we report that high levels of paired adjacency for genes in regulated pathways are not limited to ribosome biogenesis in yeast. We observed that immediate gene adjacency is associated with tighter transcriptional co-regulation as compared to unpaired genes, and that as a whole, the set of paired genes is more tightly co-regulated. Elevated levels of gene adjacency can be observed across a diverse set of co-regulated gene sets in yeast, and many of the gene pairing relationships are conserved across divergent fungal lineages. Furthermore, we report that significant levels of immediate gene adjacency can also be found for ribosome biogenesis genes across a wide variety of eukaryotes. Together, these findings reveal a widespread and fundamental link in eukaryotes between adjacent gene placement and gene co-regulation.
In our previous analysis of the genes that are involved in ribosome biogenesis in S. cerevisiae, we noted that some 13% of the RP genes, and 15% of the RRB genes are located on the chromosomes as immediately adjacent gene pairs. Given recently updated gene annotations, we expanded the list of genes that comprise the RRB regulon, and included new members from the gene ontologies of ribosome biogenesis, rRNA processing, 90S pre-ribosome and the small subunit (SSU) processome. This expanded set brings the RRB family to 282 genes, of which 44 (16%) exist as immediately adjacent gene pairs.
Significant gene adjacency is conserved among several gene families in S. cerevisiae (gene sets listed: Purine Base Metabolism, DNA Damage Response, Response to Arsenic, Heat Shock Response, Response to Toxin)
To determine if the adjacent gene pairs from these additional regulons were also associated with tighter gene co-regulation, we compared their relative expression along with the unpaired genes under changing conditions. For example, we investigated the expression profiles of the 18 heat shock responsive genes following a heat shock time-course (Additional file 7: Figure S5). The paired heat shock genes show much higher average correlations to each other than did the unpaired genes during the heat-shock (PCC equal to 0.89 for paired genes compared to 0.14 for the unpaired genes). This pattern held true for a number of the other gene ontology groups, including a higher degree of paired gene co-regulation for those genes involved in carbohydrate metabolism (average PCC of 0.49 versus 0.17), purine base metabolism (average PCC of 0.13 versus −0.27) and nitrogen metabolism (a PCC equal to 0.14 versus 0). Therefore, it appears that the tighter co-regulation of paired genes can be observed across diverse gene sets.
One mechanism whereby adjacent genes that function in the same biochemical pathway could arise is through a gene duplication event followed by subsequent divergence of one of the duplicates. Indeed, ancestral to many yeast lineages, including S. cerevisiae, there was a whole genome duplication (WGD) event some 150 million years ago that was subsequently followed by the elimination of most of the duplicated genes. This large scale doubling would first create duplicates on separate chromosomes, but potentially, genetic recombination, and subsequent elimination and modification of genes could give rise to high levels of adjacent genes that function in related pathways. In particular, because the majority of the RP genes from S. cerevisiae are present in the genome as two nearly identical homologs, we investigated the extent to which gene duplications could account for the high number of immediately adjacent genes that function in a given cellular pathway. To investigate this possibility, each member of the immediately adjacent gene pairs was compared by BLAST analysis to its adjacent partner and to the other genes in the S. cerevisiae genome. Overwhelmingly, we found that the two members of an adjacent gene pair were not related to each other by sequence similarity, with the exception of one gene pair from the carbohydrate metabolism gene set. The tandem, adjacent gene pair CDA1-CDA2 does appear to be derived from a gene duplication event, as the two genes share extensive sequence similarity (E-value = 3.4 × 10^-94 by BLAST). For every other comparison between the adjacent gene pairs, the E-value was greater than 1, and for those genes that did have a closely related homolog, it was located at another chromosomal location (see Additional file 8: Table S2). Thus, the high degree of adjacent gene pairing was not due to gene duplication events.
Interestingly, we did observe an adjacent pairing of RP genes that was related to the genome duplication event, but the adjacent pairing appears to have been present before the WGD, and has been conserved since. The adjacent RPL18A-RPS19A gene pair is found on chromosome 15 and the adjacent RPL18B-RPS19B gene pair is found on chromosome 14. While the RPL18A and RPL18B genes are highly related (E-value = 2.3 × 10^-81) and the RPS19A and RPS19B genes are highly related (E-value = 2.6 × 10^-80), in each case the immediately adjacent gene partners are not related (E-value > 1).
For the RRB gene pairs, 20 out of the 22 gene pairs are also found as adjacent pairs in at least one of the Saccharomyces sensu stricto species, and 6 of the genes are paired within C. albicans and C. dubliniensis. For two of the gene pairs, DBP8-NMD3 and RRP15-NOC4, both genes exist as partners with RRB genes in K. lactis, K. waltii and S. kluyveri, although in each case the pairing is to a different partner. Although none of the same RRB gene pairs from S. cerevisiae are found in S. pombe, 8 of the paired RRB genes can be found paired with another RRB gene in S. pombe. For the RP regulon, 12 out of 14 RP gene pairings are the same in at least one Saccharomyces sensu stricto species, and 7 of the pairings are conserved in C. albicans. One gene pair, RPP2A-RPS15, is the same in S. pombe, and additionally there are four RP genes in S. pombe that are paired with a new RP gene.
All 8 of the gene pairs whose proteins function in the DNA damage response pathway are conserved in at least one Saccharomyces sensu stricto species, while only one pair is conserved through the C. albicans and C. dubliniensis lineages. None of these gene pairs are found in any sort of pairing arrangement in S. pombe. The gene pairings that are involved in purine base metabolism are completely conserved only in the Saccharomyces sensu stricto species. Of the gene pairings that are observed among the carbohydrate metabolism genes, only the GAL1, GAL7 and GAL10 genes are found as immediately adjacent neighbors in species other than S. cerevisiae. In S. pombe the GAL7-GAL10-GAL1 gene triplet contains an insertion of SPBPB2B2.11 (a nucleotide sugar dehydrogenase involved in galactose metabolism) between GAL7 and GAL10. The heat-shock response gene pairings, HSP12-MDJ1 and SGT2-SLG1, are both conserved in S. paradoxus, S. mikatae, K. lactis, and S. kluyveri. The least conserved pairings are those from the ontology classes involved in the response to toxins and the response to arsenic, where none of the S. cerevisiae pairings are retained in any of the species investigated in this study, not even the closely related S. paradoxus (Figure 5B).
In order to assess the significance of the conservation of specific gene pairings across related yeast species, we investigated the background levels of small-scale gene pair synteny across four species. We used 10,000 iterations of a bootstrapping approach to query what fraction of a random set of either 180 or 282 S. cerevisiae genes (the sizes of the RP and RRB regulons) were maintained as adjacent gene pairs in at least one of the S. paradoxus, S. bayanus and S. mikatae species. Overall, we found that there is a roughly 67% chance that a given adjacent gene pair from S. cerevisiae would be maintained as an adjacent gene pair in one of these three species (Additional file 9: Figure S6). While this result demonstrates the overall high degree of synteny between the four yeasts, we observed that the adjacent gene pairs from the RP and RRB regulons were even more likely to be maintained as adjacent gene pairs within at least one of these three other species (85% and 91% maintenance for the RP and RRB adjacent gene pairs, respectively). Therefore, there appears to be a selective pressure to maintain the adjacency of co-regulated genes from ribosome-related metabolic pathways across divergent species.
We have previously reported that there are greater numbers of paired RRB and RP genes in both C. albicans and S. pombe than in S. cerevisiae. We repeated this comparative analysis by using the C. albicans gene pairings as our starting reference set of genes and curated the conservation in both C. dubliniensis and S. cerevisiae. Again, we found broad conservation of gene pairing across many yeast species. There is absolute conservation of the RRB and the DNA damage gene pairings between C. albicans and C. dubliniensis, and of the conserved RP pairings between these species, there is only one that is not retained in C. dubliniensis.
To understand why certain gene pairs may be conserved to a higher degree across different yeast species than other gene pairings, we looked for a relationship between overall expression levels and high degrees of paired conservation. For our minimally conserved gene set, we grouped together the gene pairs that were only paired in S. cerevisiae (9 gene pairs). For our widely conserved gene set, we grouped together the gene pairs in which at least one of the genes was also paired all through the Ascomycotina (that is, at least one of the genes is paired through to S. pombe; 17 gene pairs). The remaining gene pairs, which were conserved among Saccharomycotina, represent an intermediate level of conservation (38 gene pairs). We compared the overall expression levels of the three gene sets and found that greater conservation correlates with higher levels of transcription, with the transcription levels of the most highly conserved genes being twice that of the genes that are not conserved (Figure 5C).
Immediate gene adjacency is conserved across widely divergent eukaryotes (gene classes listed: Protein Coding Genes, Ribosomal Protein Genes)
The number of ribosomal proteins varies significantly across eukaryotic species, ranging from the 66 genes that have been identified in G. lamblia, to 387 that can be found in A. thaliana. We found that the incidence of immediate gene adjacency for the ribosomal proteins varied widely across eukaryotes, with immediately adjacent gene pairs representing fewer than 2% of the total in N. gruberi to over 13% in P. falciparum. The incidence of gene adjacency in the fungi N. crassa (19%) and A. nidulans (12%) is similar to that seen in S. cerevisiae (13%). There are also significant levels of RP gene adjacency seen in the widely studied model systems A. thaliana (12%), D. melanogaster (4%), and C. elegans (7%).
Unlike the ribosomal proteins, the rRNA processing and ribosome biogenesis genes are less well characterized in each of the species that were studied. We set out first to identify RRB genes in other species, and then to characterize their genomic organization, including the conservation of adjacent gene pairing. The BLAST algorithm was used to identify homologues of 100 S. cerevisiae RRB genes in each species, and then we mapped their genomic distributions. We were able to identify between 92 (T. thermophila) and 118 (H. sapiens) RRB genes in each of the species that we analyzed. Due to the as yet incomplete genome assemblies for N. crassa, A. nidulans, and N. gruberi, these species were omitted from this RRB analysis. This approach would be expected to vastly underestimate the degree of RRB gene pairing in other organisms, since it is limited by a small sampling set (i.e. based on only 100 RRB genes from S. cerevisiae) and by the poor annotation records of RRB genes in general as compared to RP genes. Nevertheless, we could see evidence for adjacent gene pairing of RRB genes in other eukaryotes, including 4% in P. falciparum and 6% in C. elegans. Interestingly, we did not observe significant levels of pairing between the members of the RRB and RP gene sets. Thus, it appears that significant levels of immediate, adjacent pairing of genes related to ribosome biogenesis are widely conserved across diverse eukaryotic lineages.
In our initial characterization of the membership of the RRB regulon in S. cerevisiae we noted that a highly significant fraction of the genes occurred in the genome as adjacent gene pairs. This report extends that finding significantly, and reveals that this phenomenon is not constrained to gene sets associated with ribosome biogenesis, but rather that a wide variety of other responsive gene sets in S. cerevisiae also contain significant numbers of adjacent gene pairs. While considerable attention has been paid to the identification and characterization of groups of genes that function in particular areas of metabolism, including the genes involved in the response to stress and nutrients, carbohydrate metabolism, nitrogen metabolism [34, 35], toxic metals such as arsenic, the response to DNA damage, and the genes of the RP regulon, until now, the extent to which they include a significant fraction of adjacent gene pairs has been underappreciated. This non-random distribution of gene locations can be observed in even the smallest of gene sets, including the 8-member purine metabolism (62% adjacent) or response to arsenic (38% adjacent) gene sets, as well as in the 175-member DNA damage response (9% adjacent) set. When we did observe gene adjacency, it occurred as pairs of genes that were distributed across all possible orientations: divergent, tandem and convergent. There were cases in which up to three genes from within a given gene set were located in a row, but these were rare (roughly 3% of the genes), and there was only a single incidence of a four-gene string (IMA1-MAL13-MAL11-MAL12). A recent report on the ‘neighboring gene effect’ provides additional evidence supporting transcriptional coupling of adjacent gene pairs on a genomic scale. A systematic screen of the yeast knock-out collection revealed that individual gene deletions altered the regulated expression of the neighboring gene in about 10% of the cases.
The observation that considerable adjacent gene pairing can be recognized across a wide range of gene sets in divergent yeast species speaks to its evolutionary significance. Indeed there is a very high level of adjacent gene pairing in the RRB and RP regulons in distantly related yeast species (including as many as 24% of the RRB genes in C. albicans). Interestingly, while we can recognize distinct gene pairs in S. cerevisiae that have maintained their adjacency across many yeast species, and while the most closely related species tend to have a higher level of conservation of specific pairs, the more distantly related species have similar or greater overall levels of adjacent gene pairing, even if the exact pairs differ. Recently it has been reported that increased co-expression of neighboring gene pairs is retained even after their separation during evolution, and that newly formed gene pairs which arise from genomic rearrangements also tend to be co-expressed. These findings were true for divergent, tandem and convergent gene orientations, and one possibility is that local chromatin remodeling processes act on gene pairs in a way that is distinct from unpaired genes. Our analysis indicates that the phenomenon of adjacent gene co-regulation preceded the whole genome duplication event, and is widely conserved across yeast species from S. cerevisiae to S. pombe, even though the exact pairing relationships (that is, which gene is paired with which) are not. Indeed, by using the highly conserved and therefore easily recognized RP gene set as a test case, we found evidence for significant adjacent gene pairing across a wide range of eukaryotes, including in most of the well studied and well annotated systems. We propose that, as in yeast, other eukaryotes will also exhibit significant adjacent gene pairing in gene sets beyond those related to ribosome biogenesis.
Functionally, we observed that within a given set of related genes, those members that were present as immediately adjacent pairs exhibited a tighter degree of transcriptional co-regulation than the genes that were located on their own across the genome. Interestingly, this observation was true when expression profiles were compared between immediately adjacent genes, as well as when one gene of an adjacent pair was compared to another member of a distinct, adjacent gene pair. Thus, within a set of related genes, for example the 282 members of the RRB regulon in S. cerevisiae, the subset of the 44 paired genes are the most tightly co-regulated members of the regulon, even though the gene pairs themselves were scattered across the various chromosomes.
Interestingly, we also observed that the cases in which the specific pairing of adjacent genes was most widely conserved across divergent yeast species corresponded to those genes that were the most highly expressed. Thus, there may be evolutionary pressure to favor adjacent gene pairing and concomitant transcriptional co-regulation for highly expressed and highly regulated genes. There may also be a connection between the extent of conservation of particular genomic arrangements, and the relative advantages of specific gene co-regulation in different ecological niches. For example, the observation that the specific gene pairings associated with the heat shock response are absent in C. albicans could be related to its relatively stable temperature environment as a human pathogen.
While further analysis of the cis and trans factors that mediate adjacent gene co-regulation will be required to elucidate how it is achieved, at least three non-exclusive mechanisms can be proposed: 1) localized chromatin modification, 2) local DNA sequence looping, and 3) co-localization of the genes to a common nuclear compartment. In terms of localized chromatin modifications, there is a correlation between genome-wide histone H3K14 acetylation and histone H4 acetylation domains that overlap with transcriptionally co-expressed genes in S. cerevisiae. In higher eukaryotes, the transcriptional activation of one gene can result in a localized chromatin ‘opening’ that ultimately creates a more transcriptionally permissive state. This state can be propagated across significant distances, and it can affect the transcription of genes within a shared neighborhood. In terms of DNA looping, it has been observed that elements of the HMR-E locus can impart silencing onto an adjacent gene via a local looping of DNA sequences that brings the promoter of the adjacent gene into physical contact with the HMR-E silencing factors. By using the same chromosome conformation capture (3C) technique, genome-wide DNA looping interactions have been detected between genes on the same and different chromosomes in yeast and, interestingly, co-regulated genes within similar ontologies were found to be preferentially associated with each other. Finally, it is possible that adjacent gene co-regulation may be mediated, in part, at the level of sub-nuclear compartmentalization. High resolution mapping of gene localizations in yeast revealed that transcriptionally active sets of genes, including those involved in ribosome biogenesis, occupied specific nuclear territories at the nucleolar periphery upon activation.
In higher eukaryotes, active genes have been found to associate with discrete ‘transcription factories’, which are the site of nascent RNA production and are enriched for RNA pol II and associated transcription factors. Therefore, if one member of a gene pair became localized to an active sub-nuclear compartment, the adjacent gene could potentially fall under the same regulatory umbrella.
It appears that one of the ways that eukaryotic cells regulate the expression of genes within distinct regulons, or related pathways, is by distributing them, in part, as pairs of adjacent genes across the genome. The phenomenon of adjacent gene co-regulation is widespread across eukaryotes, evolutionarily conserved, and functionally significant for maintaining coordinated levels of gene expression.
G offset was set to the expression levels prior to perturbation or to the average expression state (the reference state) in each dataset. Microarray datasets were downloaded from the Gene Expression Omnibus and transcription was monitored across two independent heat shock time-courses [3, 28] (GEO accession numbers: GDS112 and GDS281), an osmotic shock time-course (GDS20), a time-course following exposure to menadione (GDS108), and a time-course following release from alpha factor synchronization (GDS38). The PCC scores for the unpaired genes represent the average PCC across every possible pairing partner for every unpaired gene within the set. The PCC for the paired gene subset represents the average PCC score between each gene and every other paired gene, excluding that gene’s immediate adjacent neighbor. P-values were determined by bootstrapping with replacement, by taking at least 10,000 random groupings of genes (each the same size as the paired subset) and determining the average PCC score for each grouping. The p-value was calculated from this distribution.
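The averaging and bootstrapping steps described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The function names (`pearson`, `mean_pcc`, `bootstrap_p_value`) and the toy data layout (a dictionary mapping gene names to expression vectors) are hypothetical. The optional offsets stand in for the reference state, and the one-sided p-value (the fraction of random groupings whose mean PCC is at least as high as the observed value) is an assumed reading of "calculated from this distribution". Each random grouping here contains distinct genes, and genes are returned to the pool between iterations.

```python
import math
import random
from itertools import combinations


def pearson(x, y, x_offset=None, y_offset=None):
    """Correlation of two expression profiles.

    With no offsets this is the standard mean-centred Pearson coefficient.
    Passing offsets (an assumption, e.g. a reference state) centres each
    profile on that value instead of on its mean.
    """
    x0 = sum(x) / len(x) if x_offset is None else x_offset
    y0 = sum(y) / len(y) if y_offset is None else y_offset
    sxy = sum((a - x0) * (b - y0) for a, b in zip(x, y))
    sxx = sum((a - x0) ** 2 for a in x)
    syy = sum((b - y0) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0  # flat profile: treat as uncorrelated


def mean_pcc(profiles, genes, exclude=frozenset()):
    """Average PCC over all gene-gene combinations, skipping excluded pairs."""
    scores = [
        pearson(profiles[a], profiles[b])
        for a, b in combinations(genes, 2)
        if frozenset((a, b)) not in exclude
    ]
    return sum(scores) / len(scores)  # needs at least one non-excluded pair


def bootstrap_p_value(profiles, paired_genes, adjacent_pairs, n_iter=10000, seed=0):
    """Compare the paired subset with random groupings of the same size."""
    rng = random.Random(seed)
    exclude = {frozenset(p) for p in adjacent_pairs}  # drop immediate neighbours
    observed = mean_pcc(profiles, paired_genes, exclude)
    pool = list(profiles)
    hits = 0
    for _ in range(n_iter):
        grouping = rng.sample(pool, len(paired_genes))
        if mean_pcc(profiles, grouping) >= observed:
            hits += 1
    return observed, hits / n_iter
```

For example, `bootstrap_p_value(profiles, ["A", "B", "C", "D"], [("A", "B"), ("C", "D")])` would score A against C and D, and B against C and D, but never A against B or C against D.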
In order to determine the frequency of adjacent gene pairing within S. cerevisiae we selected a total of 28 sets of functionally related genes for analysis. These sets were defined previously by their gene ontology and downloaded from the Saccharomyces Genome Database (see Additional file 8: Table S2 and Additional file 5 for a complete list of accession numbers and the genes within each group). The rationale behind the sets of genes that were chosen was to pick a representative cross-section of those pathways that are involved in metabolism and responding to the environment (and, thus, in maintaining cellular homeostasis). The groupings were selected to represent a wide range of ontology sizes, from up to 282 genes in the RRB regulon to the 8 member purine biosynthesis pathway.
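As an illustration of how adjacent gene pairs within a gene set might be tallied, the following Python sketch walks an ordered gene list and records immediate neighbors that both belong to the set. It is a sketch under stated assumptions rather than the procedure used here. The `Gene` record, the function names, and the choice to treat the supplied list as the complete set of annotated genes on each chromosome are all hypothetical. Orientation is called from strand alone: same strand is tandem, a minus-strand gene followed by a plus-strand gene is divergent, and the reverse is convergent.

```python
from collections import namedtuple

# Hypothetical minimal gene record: name, chromosome, start coordinate, strand.
Gene = namedtuple("Gene", "name chrom start strand")


def orientation(left, right):
    """Classify a neighbouring pair, with `left` upstream on the + strand."""
    if left.strand == right.strand:
        return "tandem"
    if left.strand == "-" and right.strand == "+":
        return "divergent"   # transcribed away from each other
    return "convergent"      # transcribed toward each other


def adjacent_pairs(genome, gene_set):
    """Return (left, right, orientation) for immediate neighbours in gene_set."""
    by_chrom = {}
    for gene in genome:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    pairs = []
    for genes in by_chrom.values():
        genes.sort(key=lambda g: g.start)
        for left, right in zip(genes, genes[1:]):
            if left.name in gene_set and right.name in gene_set:
                pairs.append((left.name, right.name, orientation(left, right)))
    return pairs


def paired_fraction(pairs, gene_set):
    """Fraction of the gene set found in at least one adjacent pair."""
    paired = {name for left, right, _ in pairs for name in (left, right)}
    return len(paired) / len(gene_set)
```

Counting distinct genes rather than pairs means that a run of three adjacent set members contributes three genes, not four, to the paired fraction.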
The pairing relationships for the Saccharomyces sensu stricto species (Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus) were determined based on synteny. The pairing relationships for Candida glabrata, Kluyveromyces lactis, Kluyveromyces waltii and Saccharomyces kluyveri were determined using the Yeast Gene Order Browser and are based on synteny. The pairing relationships for C. albicans, C. dubliniensis and S. pombe were determined based on homology [48, 49].
A bootstrap analysis was performed to determine the conservation of adjacent gene pairs throughout the Saccharomyces sensu stricto species. Starting with all the pairs of adjacent genes within the S. cerevisiae genome (N-1, where N is equal to the number of genes within the genome), a set of S genes was chosen (where S was either 282 or 180) and conservation of their genomic arrangement was determined by looking within the S. paradoxus, S. mikatae, or S. bayanus genomes. This analysis was run 10,000 times (with replacement after selection) for each set of genes, and the percentage of paired genes is plotted against the frequency of occurrence.
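A minimal Python sketch of this kind of background estimate is given below. It is illustrative only and rests on several assumptions: the function name `bootstrap_pair_conservation` and its arguments are hypothetical, each iteration draws `n_draw` adjacent pairs (how the set size S maps onto the number of pairs drawn is not specified in the text), and a pair counts as conserved when the same two genes are adjacent in at least one of the comparison species.

```python
import random
from collections import Counter


def bootstrap_pair_conservation(reference_pairs, other_species_pairs,
                                n_draw, n_iter=10000, seed=0):
    """Histogram of percent-conserved values over random draws of pairs.

    reference_pairs     : list of (gene_a, gene_b) adjacent pairs in the
                          reference genome
    other_species_pairs : list of collections of adjacent pairs, one per
                          comparison species, using the same gene names
    n_draw              : number of reference pairs sampled per iteration
    """
    rng = random.Random(seed)
    # Order within a pair should not matter, so compare as frozensets.
    species_sets = [{frozenset(p) for p in pairs} for pairs in other_species_pairs]

    def conserved(pair):
        key = frozenset(pair)
        return any(key in pairs for pairs in species_sets)

    histogram = Counter()
    for _ in range(n_iter):
        # Pairs are distinct within a draw and returned to the pool afterwards.
        sample = rng.sample(reference_pairs, n_draw)
        kept = sum(1 for pair in sample if conserved(pair))
        histogram[round(100 * kept / n_draw)] += 1
    return histogram
```

The returned counter maps each rounded percentage to the number of iterations that produced it, which is the quantity that would be plotted as a frequency distribution.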
Ribosomal proteins were defined as all genes whose products are considered structural components of the ribosome (including those that are cytosolic, chloroplastic, apicoplastic and mitochondrial). The rRNA and ribosome biogenesis regulon in S. cerevisiae was defined as described previously, consisting of 188 genes, and was expanded based on the gene ontology terms: ribosome biogenesis, rRNA processing, 90S pre-ribosome and small subunit (SSU) processome. Once the redundant genes were removed we had expanded the RRB family to a set of 282 genes (see Additional file 11).
The homologues of the genes of the RRB regulon were identified using the WU-BLAST algorithm to search for conservation of the protein coding sequences from S. cerevisiae. The total number of genes used in the calculations included all verified protein coding genes from H. sapiens, D. melanogaster, C. elegans, A. thaliana, T. thermophila, P. falciparum, G. lamblia, N. crassa, A. nidulans, and N. gruberi. The genomic distributions of these gene sets were manually curated. There were several instances where an RRB gene that was identified by BLAST homology was adjacent to a gene with a characterized function in ribosome biogenesis that was not one of the RRB set homologs we initially identified. We did not include these genes in our statistical analysis (a complete list of these genes is provided in Additional file 10).
and N is the total number of genes present within each species. The functional p-values were then calculated in Mathematica.
The authors would like to acknowledge the past and current members of the McAlear lab for their helpful discussions and suggestions throughout the course of this study and during the preparation of this manuscript.
Funding for this work was provided by the Department of Molecular Biology and Biochemistry at Wesleyan University in Middletown, CT.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.