Evolution of cis-regulatory elements in yeast de novo and duplicated new genes
© Tsai et al.; licensee BioMed Central Ltd. 2012
Received: 7 August 2012
Accepted: 18 December 2012
Published: 21 December 2012
New genes that originate from non-coding DNA rather than being duplicated from parent genes are called de novo genes. Their short evolution time and lack of parent genes provide a chance to study the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. Here, we conducted a comprehensive investigation to depict the emergence and establishment of cis-regulatory elements in de novo yeast genes.
In a genome-wide investigation, we found that the number of transcription factor binding sites (TFBSs) in de novo genes of S. cerevisiae increased rapidly and quickly became comparable to the number of TFBSs in established genes. This phenomenon might have resulted from certain characteristics of de novo genes; namely, a relatively frequent gain of TFBSs, an unexpectedly high number of preexisting TFBSs, or lower selection pressure in the promoter regions of the de novo genes. Furthermore, we identified differences in the promoter architecture between de novo genes and duplicated new genes, suggesting that distinct regulatory strategies might be employed by genes of different origin. Finally, our functional analyses of the yeast de novo genes revealed that they might be related to reproduction.
Our observations showed that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics, implying that these two types of genes might have different roles in evolution.
Keywords: De novo gene; Regulatory evolution; TFBS turnover; Promoter architecture
New genes arise through various mechanisms, including gene duplication, exon shuffling, gene fusion, retroposition, mobile elements, lateral gene transfer, and de novo origination [1–3]. Although new genes are considered to be fairly dispensable, their role in adaptive evolutionary innovation has been investigated. Most studies have focused on the cellular, physiological, morphological, behavioral, and reproductive phenotypic traits associated with new genes [1, 5–7]. A recent study found that 30% of the new genes in Drosophila quickly evolved essential functions that allowed them to participate in development. Using pre-existing genes as the raw material, duplicate genes rapidly developed essential functions that were not present in the pre-duplication gene through the processes of neofunctionalization or subfunctionalization. In addition, neofunctionalization and subfunctionalization of transcription factor binding sites (TFBSs) can explain the novelty that occurs in the regulatory region of duplicated new genes [10–12]. The de novo origin of genes, that is, the emergence of genes from previously nonfunctional genomic sequences, is a rare and intriguing process [13, 14]. It is believed that a new coding region can emerge through mutations that remove disruptions of a proto-open reading frame. Positive selection in the coding sequences has been reported, suggesting that adaptive protein evolution has occurred.
De novo gene evolution was first investigated in Drosophila melanogaster in 2006. Five novel genes were identified experimentally as derived from ancestral non-coding sequences and evolved as the result of a selection process associated with male reproduction. In Saccharomyces cerevisiae, the first identified de novo gene was BSC4. Population genetic analysis suggested that BSC4 was under strong negative selection at the nonsynonymous sites. A de novo transcript in Mus musculus was found to have emerged in an intergenic region because of indel mutations in the 5’ regulatory region; the transcript was fixed by a selective sweep in M. musculus populations. Other de novo genes have been identified in various species; for example, CLLU1 and FLJ33706 in Homo sapiens [18, 19], MDF1 in S. cerevisiae, DR10 in Oryza sativa, and Noble in D. melanogaster. In addition, several genome-wide analysis studies have identified numerous de novo genes in various species, and the importance of such genes in adaptive evolution has been discussed [23–28]. For example, in D. melanogaster, a study based on expressed sequence tags identified eleven putative de novo genes, and de novo origination was estimated to be responsible for 11.9% of the new genes. In H. sapiens, 60 protein-coding genes were identified as de novo genes that were highly expressed in the cerebral cortex. These findings indicate the importance of de novo genes in phenotypic diversity and evolutionary adaptation. Nevertheless, the regulatory evolution of de novo genes is not yet fully understood. A prevalent view is that de novo genes do not possess complicated regulatory control and, therefore, only a functional transcription start site would be required for transcription initiation. However, because de novo genes might play important roles in development, the view that only a simple regulatory control mechanism is used remains open to speculation.
Several genome-wide studies have attempted to describe the characteristics of regulatory evolution [30, 31]. Frequent gain or loss events of TFBSs (TFBS turnover) have been identified as an important feature of regulatory evolution, and have been found to exhibit lineage specificity in transcriptional regulation [32–34]. A previous study showed that duplicated new genes inherit more than a third of the regulatory interactions from their ancestral genes. Moreover, the expression of duplicated genes often benefits from the preexisting regulatory mechanism. After gene duplication, positive selection on cis-regulatory motifs has been observed, leading to dramatically accelerated rates of cis-regulatory evolution compared with the orthologs. In S. cerevisiae, it has been shown that the number of shared TFBSs in duplicate genes decreased with evolution time whereas the total number remained unchanged, suggesting that there is a balance between the gain of functionally novel TFBSs and either the loss of preexisting TFBSs or the modification of preexisting TFBSs to new functions. In contrast, de novo genes evolve from non-coding sequences based on the cryptic presence of functional sites, including a transcriptional start site and upstream regulatory elements. The question of how de novo genes that have no parent gene obtain regulatory elements and further establish complex regulatory mechanisms has yet to be answered.
We conducted a genome-wide investigation of de novo genes in S. cerevisiae to investigate regulatory evolution in the initial stages of gene emergence. One of the challenges is that the conventional methods used for de novo gene identification are known to overestimate the number of such genes because they generate many false positives. Recently, Capra et al. developed a computational pipeline to identify de novo genes in yeast and to understand the evolution of protein interaction networks involving the novel genes. They identified 227 de novo genes that originated after whole-genome duplication (WGD), and found that initially the de novo genes had fewer interactions, but subsequently gained interactions more rapidly than duplicated new genes. Here, we modified their pipeline to identify S. cerevisiae-specific de novo genes that emerged after divergence from S. paradoxus, instead of after WGD. The stringent criteria that we used to identify de novo genes aided our observation of cis-regulatory element evolution during the initial stage of gene emergence. Using our modified method, we identified 34 de novo genes that were specific to S. cerevisiae (i.e., without either paralogous genes or orthologous genes in any other species). To analyze the cis-regulatory evolution of genes that had emerged from different origins and had different ages, we identified duplicated new genes (new genes with paralogous genes) and orthologous genes (well-conserved genes with orthologous genes in all seven yeast species) and compared the characteristics of cis-regulation in each. We found a higher number of TFBS gain events and higher evolution rates in the promoters of new genes (both de novo and duplicated new genes) than in the promoters of old (orthologous) genes. Our findings suggested that the promoters of new genes might experience adaptive evolution as their functions become established.
Furthermore, we investigated the nucleosome architecture in the promoter regions, which might be associated with transcriptional regulation and the evolution of eukaryotic genes [39–46]. Our results revealed significantly lower occupancy of proximal nucleosomes and lower enrichment of the TATA box in promoters of de novo genes compared with duplicated new genes and orthologous genes, suggesting that de novo genes might employ different regulatory strategies from duplicated genes. Finally, functional analyses revealed that de novo genes might play roles in reproduction-related functions.
Identification of de novo genes in S. cerevisiae
Identification of transcription factor binding sites
We retrieved 481 position frequency matrices from the MYBS database, which integrates ChIP-chip data and phylogenetic footprinting data in yeast. To remove redundant motifs, we integrated all the recorded motifs for each transcription factor (TF) using the STAMP web server, which calculates the similarity of various motifs and integrates them into a familial binding profile. A total of 175 familial binding profiles were generated and converted into position weight matrices (PWMs) by the PATSER software using the default settings. Putative TFBSs were obtained by scanning PWMs with a threshold p-value of <0.001 (TFBSs identified under different thresholds were also investigated to examine the robustness of our study in the Additional file 2: Supplementary Document). Next, putative TFBSs that were not documented in the curated YEASTRACT database, which documents 48,333 regulatory associations between TFs and their target genes, were excluded. We then characterized TFBSs based on whether they were newly gained (i.e., did not exist before gene origination) or were preexisting TFBSs (i.e., already existed before gene origination). The characterization entailed scanning the corresponding regions of S. paradoxus and S. mikatae, the two yeast species most closely related to S. cerevisiae, for each of the TFBSs that were identified in S. cerevisiae. The corresponding regions, defined as the regions that extended 25 bp upstream and downstream of the aligned region of a TFBS, were retrieved from multiz7way. A TFBS gain event was defined as a TFBS in S. cerevisiae that did not possess an occurrence of its motif within the corresponding regions in S. paradoxus and S. mikatae. A preexisting TFBS was defined as possessing occurrences of its motif within the corresponding regions in S. paradoxus, S. mikatae, or both. TFBS losses of de novo genes were not investigated because no ancient gene exists; that is, no functional TFBS existed before the de novo gene emerged.
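The gain versus preexisting classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the log-odds PWM is built from a toy count matrix rather than by PATSER, a fixed score cutoff stands in for the p-value threshold of <0.001, only the forward strand is scanned, and the function names, the example motif, and the example sequences are all hypothetical.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Turn a position frequency matrix (one {base: count} dict per column)
    into log2-odds scores against a uniform background (an assumption)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_score(pwm, sequence):
    """Highest motif score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip alignment gaps and ambiguous bases
        best = max(best, sum(col[base] for col, base in zip(pwm, window)))
    return best

def classify_tfbs(pwm, regions_by_species, threshold):
    """Label a S. cerevisiae TFBS as 'preexisting' if its motif occurs in the
    corresponding region of at least one related species, otherwise 'gain'."""
    found = any(best_score(pwm, seq) >= threshold
                for seq in regions_by_species.values())
    return "preexisting" if found else "gain"

# Hypothetical motif (consensus TGACTC) and hypothetical corresponding regions,
# which in the study would be the aligned site plus 25 bp on each side.
example_pwm = pwm_from_counts(
    [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}, {"T": 10}, {"C": 10}]
)
example_regions = {"S. paradoxus": "AAATGACTCAAA", "S. mikatae": "AAATGACGCAAA"}
print(classify_tfbs(example_pwm, example_regions, threshold=8.0))
```

A fixed score cutoff keeps the sketch short; a closer reproduction would derive the cutoff from a score distribution so that it corresponds to the stated p-value.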
Investigation of nucleosome occupancy and promoter architecture
In this study, we used an S. cerevisiae genome-wide reference map of nucleosome positions that integrated six high-resolution genome-wide maps from multiple laboratories and detection platforms. To exclude relatively depleted nucleosomes, only nucleosomes with >50% occupancy were considered. Tirosh et al. defined two gene categories according to different characteristics of the promoter nucleosomes and found that the two categories possessed different regulatory strategies. We modified the procedure proposed by Tirosh et al., and identified two categories according to the presence of nucleosomes in the TSS-proximal region (from TSS up to −100) and the TSS-distal region (from −300 to −400), as follows: (a) genes with a nucleosome in the TSS-proximal region but with none in the TSS-distal region, referred to as occupied proximal nucleosome (OPN) genes; and (b) genes without a nucleosome in the TSS-proximal region but with one in the TSS-distal region, referred to as depleted proximal nucleosome (DPN) genes.
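The OPN/DPN rule above can be written as a small Python function. This is a minimal sketch, not the authors' code: it assumes that each nucleosome is summarized by the position of its center relative to the TSS (negative values are upstream) and that a nucleosome counts as present in a region when its center falls inside it. The text does not state how presence was determined, so both the representation and the function name are hypothetical.

```python
def classify_promoter(nucleosomes, min_occupancy=0.5):
    """Classify a promoter as 'OPN', 'DPN', or 'unclassified'.

    nucleosomes: iterable of (center_relative_to_tss, occupancy) pairs,
    with occupancy expressed as a fraction between 0 and 1.
    """
    # Keep only well-positioned nucleosomes (>50% occupancy, as in the text).
    centers = [center for center, occupancy in nucleosomes
               if occupancy > min_occupancy]

    proximal = any(-100 <= c <= 0 for c in centers)   # TSS-proximal region
    distal = any(-400 <= c <= -300 for c in centers)  # TSS-distal region

    if proximal and not distal:
        return "OPN"  # occupied proximal nucleosome
    if distal and not proximal:
        return "DPN"  # depleted proximal nucleosome
    return "unclassified"  # both regions occupied, or neither


# Hypothetical promoters for illustration only.
print(classify_promoter([(-60, 0.9), (-350, 0.3)]))
print(classify_promoter([(-350, 0.8), (-220, 0.7)]))
```

Genes with nucleosomes in both regions, or in neither, fall outside the two categories and are returned as unclassified here.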
The Serial Pattern of Expression Levels Locator (SPELL) database was used to identify the potential functions of the S. cerevisiae de novo genes. SPELL is a query-driven search engine for large gene expression microarray compendia containing more than 2,400 experimental conditions. It has been used to identify the most informative expression data sets and to interpret relevant genes for a given set of query genes. We queried the SPELL database using the de novo genes and identified the top 100 relevant genes that were most similarly expressed across all data sets. SPELL then assigned the Gene Ontology (GO) terms from the identified genes to the queried de novo genes. Significance was tested using the Bonferroni-corrected Fisher’s exact test with the q-value set to <0.01. We also conducted TFBS enrichment analysis to identify TFs that might be responsible for the regulation of the de novo genes. The identification was based on a binomial test, in which the null hypothesis states that the probability of finding the TFBSs in de novo genes is smaller than or equal to that in all the other genes in the S. cerevisiae genome.
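The one-sided binomial test used for TFBS enrichment can be illustrated as follows. This is a sketch under assumptions: the background probability is estimated as the fraction of all other genes that carry a binding site for the TF, the p-value is the upper tail of the binomial distribution, and the function names and all counts in the usage example other than the 34 de novo genes are made up for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def tfbs_enrichment_pvalue(hits_in_set, set_size, hits_in_rest, rest_size):
    """One-sided test of H0: the probability of finding the TFBS in the gene
    set is smaller than or equal to that in the rest of the genome."""
    background = hits_in_rest / rest_size
    return binomial_upper_tail(hits_in_set, set_size, background)

# Hypothetical counts: 12 of 34 de novo genes carry a site for some TF,
# against 600 of 5,000 other genes.
p_value = tfbs_enrichment_pvalue(12, 34, 600, 5000)
print(f"p = {p_value:.3g}")
```

When many TFs are tested, the resulting p-values would still need a multiple-testing correction; the text does not say which correction was applied to this particular test.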
Evolutionary characteristics of TFBSs in de novo genes
Lower selection pressure in promoter regions of de novo genes
Nucleosome occupancy and TATA box in promoters of new genes
Another crucial architectural motif in the promoter is the TATA box. The expression of TATA-containing genes is highly regulated, responsive to stress, sensitive to chromatin regulators, and variable across different species [62, 63]. We found that the proportion of TATA-containing genes (consensus TATA(A/T)A(A/T)(A/G) within −50 to −200) was significantly lower in de novo genes (12.1%) than in the whole S. cerevisiae genome (23.3%) (one-sided two-sample proportion test, p = 0.0037). In contrast, the proportions of TATA-containing genes in the duplicated new genes (71.4%) and orthologous genes (35.9%) were significantly higher than in the whole S. cerevisiae genome (one-sided two-sample proportion test, p = 3.5×10^-7 and 1.1×10^-9, respectively) (Figure 5B). Overall, our findings indicated that de novo genes were dominated by DPN genes and included few TATA-containing genes, whereas duplicated new genes were dominated by OPN genes and TATA-containing genes. These results suggested that the two types of new genes may possess different regulatory strategies.
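The TATA box definition used above translates directly into a regular expression. The following Python sketch assumes that the promoter is supplied as a 5'-to-3' string whose last base sits at position −1 relative to the TSS, that only the given strand is searched, and that the whole 8-bp match must lie within −200 to −50; these conventions, the function name, and the test sequences are assumptions made for illustration.

```python
import re

# Consensus TATA(A/T)A(A/T)(A/G) written as a regular expression.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT][AG]")

def has_tata_box(upstream, far=200, near=50):
    """Return True if the consensus occurs between -far and -near.

    upstream: promoter sequence read 5'->3', ending at position -1, so the
    base at position -k is upstream[len(upstream) - k].
    """
    length = len(upstream)
    start = max(0, length - far)     # index of position -far
    end = max(0, length - near + 1)  # one past the index of position -near
    return TATA_PATTERN.search(upstream.upper(), start, end) is not None

# Hypothetical 200-bp promoters: a TATA box at -100..-93 and one at -20..-13.
inside_window = "C" * 100 + "TATAAAAG" + "C" * 92
outside_window = "C" * 180 + "TATAAAAG" + "C" * 12
print(has_tata_box(inside_window), has_tata_box(outside_window))
```

Applying such a function to every promoter in a gene set gives the proportion of TATA-containing genes, which can then be compared with the genome-wide proportion.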
Functional analyses of de novo genes
Predicted GO terms for 34 de novo genes by SPELL
sexual sporulation resulting in formation of a cellular spore
cellular process involved in reproduction
spore wall biogenesis
ascospore wall biogenesis
ascospore wall assembly
spore wall assembly
fungal-type cell wall assembly
cell wall assembly
sporulation resulting in formation of a cellular spore
Predicted GO terms for 56 de novo genes (including 22 de novo genes with short promoters or poor alignments in promoters) by SPELL
M phase of meiotic cell cycle
meiotic cell cycle
cellular process involved in reproduction
We investigated the emergence of cis-regulatory elements in de novo genes. Specifically, 56 de novo genes were identified as having emerged in S. cerevisiae since separation from S. paradoxus approximately 5 million years ago. It has been shown that different approaches for de novo gene identification may yield different results. For example, Capra et al. investigated all the de novo genes since WGD. This strategy ensured that the possibility of having orthologous genes in any species before WGD was avoided, but genes in the closely related species after WGD were allowed. Wu et al., on the other hand, considered only the de novo genes without any orthologous genes but with highly similar orthologous regions and frame-shifts in two closely related species. In short, Capra et al. discussed the evolution of de novo genes over a relatively large time-scale, whereas Wu et al. analyzed the characteristics of de novo genes that originated immediately by one-step mutations from closely related species. In this study, we attempted to understand the evolution of regulatory elements, which requires sufficient evolution time to accumulate mutations. Therefore, we considered a time-scale that fell between the time-scales of the above two studies. We did not focus on the de novo genes that immediately emerged one step away from non-coding regions as in Wu et al., because the promoters of these genes might not have experienced sufficient evolution time.
Our results showed that the promoters of new genes (of both de novo and duplicated origin) possessed numbers of regulatory TFs and TFBSs similar to those in orthologous genes. This finding suggested that TFBSs might be established rapidly after the emergence of a new gene and could be explained by the frequent occurrence of TFBS turnover, a well-documented phenomenon in eukaryote cis-regulation. For example, frequent TFBS gain events in duplicated genes were found to play a critical role in the regulatory evolution of the yeast genome. Papp et al. found that the numbers of TFBSs in the promoters of duplicated genes remained constant over evolutionary time, whereas the numbers of shared motifs from a preexisting gene decreased, perhaps because of a balance between the gain of new TFBSs and the loss of TFBSs from parent genes.
The promoters of de novo genes, which evolved from non-coding regions instead of being duplicated from the promoters of parent genes, might be expected to have a different frequency of TFBS gain events than those of duplicated genes. However, our analyses showed that the de novo and duplicated new genes exhibited similar numbers of TFBS gain events. A simple explanation could be that preexisting TFBSs in the promoters of the de novo genes were more plentiful than previously expected. Indeed, our results indicated that more than half of the TFBSs in the promoters of de novo genes were preexisting TFBSs, which supports this explanation. Together with the observation of high substitution rates in the promoters of de novo genes, our results further suggested that the promoters experienced adaptive evolution and frequent gain events. Both of these phenomena would rapidly increase the number of TFBSs in de novo genes to a level comparable with the number found in orthologous genes. In addition, the higher substitution rates in the promoters of de novo genes compared with those of neutral sequences (i.e., the four-fold degenerate sites) suggested that the new genes might experience positive selection during the establishment of cis-regulatory motifs. Our results agree with a previous study of protein interaction networks, which found that, although de novo genes initially had fewer functions and protein interactions than duplicated new genes, de novo genes rapidly gained functions and protein interactions until the numbers were comparable to those of duplicated new genes.
Research has shown that duplicated genes often inherit cis-regulatory elements from their parent genes, thereby benefiting from preexisting regulatory mechanisms [35, 36]. However, we found that de novo genes had a similar proportion of preexisting TFBSs in their promoters as duplicated new genes, and we propose three possible explanations for this observation. First, studies have shown that non-functional TFBSs reside throughout the intergenic regions in the genome; for example, it was reported that TFs can bind to substantial numbers of non-functional TFBSs regardless of their weak binding strength. Second, although we removed head-to-head genes that share core promoters, there still might be cases in which the promoters are shared. The promoter of a de novo gene may partially overlap with the distal promoter of a neighboring gene, especially in yeast, which has relatively short intergenic regions. Moreover, while non-functional TFBSs determined by documented regulatory associations in YEASTRACT have been removed (i.e., the pair of head-to-head genes would not have exactly the same set of TFBSs), some TFBSs may still be shared. These shared TFBSs could explain the unexpectedly high proportion of preexisting TFBSs in de novo genes. Third, there may be a number of false positives in the computational identification of the TFBSs. Although we filtered out non-functional TFBSs in S. cerevisiae according to the regulatory associations documented in the YEASTRACT database, similar information in the other yeast species is insufficient to eliminate all the potential false positives. Thus, the numbers of TFBSs in other yeast species and consequently the number of preexisting TFBSs might have been overestimated.
The promoter architecture of new genes is an intriguing issue to explore because it has been associated with the gene origination mechanisms. We found that duplicated new genes were enriched with OPN genes and TATA-containing genes, whereas most de novo genes were TATA-less and enriched with DPN genes. The association between DPN and TATA-less promoters in de novo genes is consistent with the report that TATA-less promoters usually have clearer nucleosome-free regions than TATA-containing promoters [45, 72]. Additionally, TATA box and OPN enrichment has been reported in the promoters of duplicated genes [44, 73]. OPN and TATA-containing genes are relatively adaptable to environmental changes and are associated with processes that require high expression variation, such as transcriptional plasticity, sensitivity to chromatin regulation and genetic perturbations, expression noise, and expression divergence. In addition, TATA-containing genes are often highly regulated and are associated with inducible responses to stress or biotic stimuli [45, 62, 63, 74]. DPN and TATA-less genes, on the other hand, display relatively low expression variation and constitutive expression, and TATA-less genes are lightly regulated by chromatin regulators, unresponsive to stress, and related to basic housekeeping functions in yeast and human [62, 63, 75]. The functions of TATA-less genes are enriched in basic processes such as cell growth and maintenance, protein biosynthesis, large ribosomal subunit, and mitochondrion, and these known functions are consistent with the results of our functional analyses of de novo genes. Furthermore, the promoters of the TATA-containing genes are TAF-independent and dominated by the Spt-Ada-Gcn5 acetyltransferase complex (SAGA), while the promoters of the TATA-less genes are TFIID-dominated and highly TAF-dependent, despite there being a common set of TAFs that are shared by SAGA and TFIID.
As a result, the difference in TATA enrichment and nucleosome occupancy (OPN or DPN) between the two types of new genes indicates that they employ distinct regulatory mechanisms. These findings agree with the suggestion by Capra et al. that the function and fate of new genes are associated with their origins. Our functional analysis using SPELL suggested that de novo genes might contribute to cellular processes that are involved in reproduction, such as sporulation and the formation of cellular spores and cell walls. Differences in sporulation patterns and sporulation efficiencies between S. cerevisiae and S. paradoxus have been observed. Also, germinating spores of S. cerevisiae show a higher preference for own-species mating than the spores of S. paradoxus. In addition, the enrichment of DPN genes and TATA-less genes that we found in the de novo genes agrees with the observation that the genes involved in sporulation and division are constitutively expressed.
We used SPELL to predict the functions of de novo genes because of the lack of functional annotations in de novo genes. However, SPELL has various limitations. Given a set of query genes, SPELL identifies the expression microarray datasets that are most informative for these genes. Then additional genes that have the most similar expression profiles to the query genes are identified in the datasets. According to the functions of the additional genes, SPELL generates hypothetical functions for the query genes. However, the assignment of the functions is for the most part limited to the microarray datasets and GO annotation. Moreover, correlations of the expression patterns among a set of co-functional genes might not always be significantly high, because the genes need not be co-expressed at all the experimental time points. Because of these limitations, the functions assigned by SPELL may reveal only partial, and sometimes inaccurate, roles of de novo genes.
In addition to the SPELL functional predictions, we provided further support for the predicted de novo gene function by examining the function of their TFs. We identified BAS1, GCN4 and GCR1 as regulators of de novo genes. Interestingly, studies suggest that all three of these TFs are related to meiotic recombination, a process in reproduction: mutations in BAS1 affect the frequency of aberrant segregation of the recombination hotspot at the HIS4 locus, lessen the recombination distance, and alter the frequency of meiosis-specific double-strand DNA breaks [65, 66]; deletion or constitutive expression of GCN4 affects the frequency of gene conversion and crossing-over at the HIS4 locus; and removal of GCR1-binding sites reduces the expression of REC102, a gene required for the initiation of meiotic recombination. Based on previous studies and the findings in this study, we propose that de novo genes may play an important role in reproduction.
Although the functions of most de novo genes have not been well investigated, some of their specific roles have been addressed [1, 2, 27]. For example, Wu et al. analyzed the transcriptome of numerous human tissues and found that de novo genes are highly expressed in the testes and in the cerebral cortex, which plays key roles in cognitive abilities. The authors suggested that the de novo genes might contribute to phenotypic traits that are unique to humans. Our results also suggest that new genes from different origins may play distinct roles in the evolutionary process. While duplicated new genes have been shown to be involved in environmental adaptation, we hypothesized that de novo genes might contribute to evolutionary innovation in reproduction processes such as sporulation efficiency. Further studies are required to examine this hypothesis; nevertheless, the computational approaches that were used in this study shed some light on the evolution of cis-regulation in de novo genes.
Our study showed that the number of TFBSs in de novo genes increased rapidly after gene emergence, so that de novo genes soon had a number of TFBSs comparable to that of the orthologous genes. We suggested that frequent TFBS gain events, an unexpectedly high number of preexisting TFBSs, and the lower selection pressure experienced in the promoters of de novo genes compared with orthologous genes could be the major reasons for this finding. Moreover, we found that new genes from different origins (de novo or duplication) have distinct regulatory characteristics (de novo genes were dominated by DPN and TATA-less genes; duplicated new genes were dominated by OPN and TATA-containing genes). Furthermore, we found that the predicted GO terms related to reproduction processes were enriched in de novo genes. Taking all of our results together, we concluded that de novo genes and duplicated new genes might play distinct roles in evolution.
This work was supported by the National Science Council of Taiwan (Grant No: 99-2621-B-001-005-MY2 and NSC100-2628-E-001-006-MY3).
- Kaessmann H: Origins, evolution, and phenotypic impact of new genes. Genome Res. 2010, 20: 1313-1326. 10.1101/gr.101386.109.PubMed CentralView ArticlePubMed
- Long M, Betrán E, Thornton K, Wang W: The origin of new genes: glimpses from the young and old. Nat Rev Genet. 2003, 4: 865-875.View ArticlePubMed
- Cai J, Zhao R, Jiang H, Wang W: De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008, 179: 487-496. 10.1534/genetics.107.084491.PubMed CentralView ArticlePubMed
- Krylov DM, Wolf YI, Rogozin IB, Koonin EV: Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 2003, 13: 2229-2235. 10.1101/gr.1589103.PubMed CentralView ArticlePubMed
- Conant GC, Wolfe KH: Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008, 9: 938-950. 10.1038/nrg2482.View ArticlePubMed
- Demuth JP, Hahn MW: The life and death of gene families. BioEssays. 2009, 31: 29-39. 10.1002/bies.080085.View ArticlePubMed
- Emes RD: Comparison of the genomes of human and mouse lays the foundation of genome zoology. Hum Mol Genet. 2003, 12: 701-709. 10.1093/hmg/ddg078.View ArticlePubMed
- Chen S, Zhang YE, Long M: New genes in Drosophila quickly become essential. Science. 2010, 330: 1682-1685. 10.1126/science.1196380.View ArticlePubMed
- Lynch M, O’Hely M, Walsh B, Force A: The probability of preservation of a newly arisen gene duplicate. Genetics. 2001, 159: 1789-1804.PubMed CentralPubMed
- Wapinski I, Pfiffner J, French C, Socha A, Thompson DA, Regev A: Gene duplication and the evolution of ribosomal protein gene regulation in yeast. Proc Natl Acad Sci USA. 2010, 107: 5505-5510. 10.1073/pnas.0911905107.PubMed CentralView ArticlePubMed
- MacCarthy T, Bergman A: The limits of subfunctionalization. BMC Evol Biol. 2007, 7: 213-10.1186/1471-2148-7-213.PubMed CentralView ArticlePubMed
- Papp B, Pal C, Hurst LD: Evolution of cis-regulatory elements in duplicated genes of yeast. Trends Genet. 2003, 19: 417-422. 10.1016/S0168-9525(03)00174-4.View ArticlePubMed
- Ohno S: Evolution by gene duplication. 1970, Springer-Verlag, New YorkView Article
- Jacob F: Evolution and tinkering. Science. 1977, 196: 1161-1166. 10.1126/science.860134.View ArticlePubMed
- Johnson ME, Viggiano L, Abdul-Rauf M, Goodwin G, Rocchi M, Eichler EE, Bailey J a: Positive selection of a gene family during the emergence of humans and African apes. Nature. 2001, 413: 514-519. 10.1038/35097067.View ArticlePubMed
- Levine MT, Jones CD, Kern AD, Begun DJ, Lindfors H a: Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci USA. 2006, 103: 9935-9939. 10.1073/pnas.0509809103.PubMed CentralView ArticlePubMed
- Staubach F, Häming D, Tautz D, Heinen TJ a J: Emergence of a new gene from an intergenic region. Curr Biol. 2009, 19: 1527-1531. 10.1016/j.cub.2009.07.049.View ArticlePubMed
- Knowles DG, McLysaght A: Recent de novo origin of human protein-coding genes. Genome Res. 2009, 19: 1752-1759. 10.1101/gr.095026.109.PubMed CentralView ArticlePubMed
- Li C-Y, Zhang Y, Wang Z, Zhang Y, Cao C, Zhang P-W, Lu S-J, Li X-M, Yu Q, Zheng X, Du Q, Uhl GR, Liu Q-R, Wei L: A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput Biol. 2010, 6: e1000734-10.1371/journal.pcbi.1000734.PubMed CentralView ArticlePubMed
- Li D, Dong Y, Jiang Y, Jiang H, Cai J, Wang W: A de novo originated gene depresses budding yeast mating pathway and is repressed by the protein encoded by its antisense strand. Cell Res. 2010, 20: 408-420. 10.1038/cr.2010.31.View ArticlePubMed
- Xiao W, Liu H, Li Y, Li X, Xu C, Long M, Wang S: A rice gene of de novo origin negatively regulates pathogen-induced defense response. PLoS One. 2009, 4: e4603. 10.1371/journal.pone.0004603.
- Gontijo AM, Miguela V, Whiting MF, Woodruff RC, Dominguez M: Intron retention in the Drosophila melanogaster Rieske Iron Sulphur Protein gene generated a new protein. Nat Commun. 2011, 2: 323.
- Begun DJ, Kern AD, Jones CD, Lindfors HA: Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics. 2007, 176: 1131-1137.
- Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, Li X, Ding Y, Yang S, Wang W: On the origin of new genes in Drosophila. Genome Res. 2008, 18: 1446-1455. 10.1101/gr.076588.108.
- Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, Albà MM: Origin of primate orphan genes: a comparative genomics approach. Mol Biol Evol. 2009, 26: 603-612.
- Ekman D, Elofsson A: Identifying and quantifying orphan protein sequences in fungi. J Mol Biol. 2010, 396: 396-405. 10.1016/j.jmb.2009.11.053.
- Wu D-D, Irwin DM, Zhang Y-P: De Novo Origin of Human Protein-Coding Genes. PLoS Genet. 2011, 7: e1002379. 10.1371/journal.pgen.1002379.
- Yang Z, Huang J: De novo origin of new genes with introns in Plasmodium vivax. FEBS Lett. 2011, 585: 641-644. 10.1016/j.febslet.2011.01.017.
- Tautz D, Domazet-Lošo T: The evolutionary origin of orphan genes. Nat Rev Genet. 2011, 12: 692-702.
- Dermitzakis ET, Clark AG: Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002, 19: 1114-1121. 10.1093/oxfordjournals.molbev.a004169.
- Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, Kutter C, Watt S, Martinez-Jimenez CP, Mackay S, Talianidis I, Flicek P, Odom DT: Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science. 2010, 328: 1036-1040. 10.1126/science.1186176.
- Doniger SW, Fay JC: Frequent gain and loss of functional transcription factor binding sites. PLoS Comput Biol. 2007, 3: e99. 10.1371/journal.pcbi.0030099.
- Venkataram S, Fay JC: Is transcription factor binding site turnover a sufficient explanation for cis-regulatory sequence divergence? Genome Biol Evol. 2010, 2: 851-858. 10.1093/gbe/evq066.
- He BZ, Holloway AK, Maerkl SJ, Kreitman M: Does positive selection drive transcription factor binding site turnover? A test with Drosophila cis-regulatory modules. PLoS Genet. 2011, 7: e1002053. 10.1371/journal.pgen.1002053.
- Babu MM, Teichmann SA: Gene regulatory network growth by duplication. Nat Genet. 2004, 36: 492-496. 10.1038/ng1340.
- Kaessmann H, Vinckenbosch N, Long M: RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 2009, 10: 19-31.
- Castillo-Davis CI, Hartl DL, Achaz G: cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res. 2004, 14: 1530-1536. 10.1101/gr.2662504.
- Pollard KS, Singh M, Capra JA: Novel genes exhibit distinct patterns of function acquisition and network integration. Genome Biol. 2010, 11: R127. 10.1186/gb-2010-11-12-r127.
- Swamy KBS, Chu W-Y, Wang C-Y, Tsai H-K, Wang D: Evidence of association between Nucleosome Occupancy and the Evolution of Transcription Factor Binding Sites in Yeast. BMC Evol Biol. 2011, 11: 150. 10.1186/1471-2148-11-150.
- Zaugg JB, Luscombe NM: A genomic model of condition-specific nucleosome behavior explains transcriptional activity in yeast. Genome Res. 2012, 22: 84-94. 10.1101/gr.124099.111.
- Tsankov AM, Thompson DA, Socha A, Regev A, Rando OJ: The role of nucleosome positioning in the evolution of gene regulation. PLoS Biol. 2010, 8: e1000414. 10.1371/journal.pbio.1000414.
- Warnecke T, Batada NN, Hurst LD: The impact of the nucleosome code on protein-coding sequence evolution in yeast. PLoS Genet. 2008, 4: e1000250. 10.1371/journal.pgen.1000250.
- Washietl S, Machné R, Goldman N: Evolutionary footprints of nucleosome positions in yeast. Trends Genet. 2008, 24: 583-587. 10.1016/j.tig.2008.09.003.
- Kim Y, Lee JH, Babbitt GA: The enrichment of TATA box and the scarcity of depleted proximal nucleosome in the promoters of duplicated yeast genes. J Mol Evol. 2010, 70: 69-73. 10.1007/s00239-009-9309-3.
- Tirosh I, Barkai N: Two strategies for gene regulation by promoter nucleosomes. Genome Res. 2008, 18: 1084-1091. 10.1101/gr.076059.108.
- Field Y, Fondufe-Mittendorf Y, Moore IK, Mieczkowski P, Kaplan N, Lubling Y, Lieb JD, Widom J, Segal E: Gene expression divergence in yeast is coupled to evolution of DNA-encoded nucleosome organization. Nat Genet. 2009, 41: 438-445. 10.1038/ng.324.
- Cherry JM, Ball C, Weng S, Juvik G, Schmidt R, Dunn B, Dwight S, Riles L, Mortimer RK: Genetic and physical maps of Saccharomyces cerevisiae. Nature. 1997, 387: 67-73. 10.1038/387067a0.
- The UniProt Consortium: Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40: D71-D75.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344-1349. 10.1126/science.1158441.
- Gordon JL, Armisén D, Proux-Wéra E, Óhéigeartaigh SS, Byrne KP, Wolfe KH: Evolutionary erosion of yeast sex chromosomes by mating-type switching accidents. Proc Natl Acad Sci USA. 2011, 108: 20024-20029. 10.1073/pnas.1112808108.
- Chen F, Mackey AJ, Stoeckert CJ, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006, 34: D363-D368. 10.1093/nar/gkj123.
- Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ, Fujita PA, Harte RA: The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2011, 39: D876-D882. 10.1093/nar/gkq963.
- Tsai H-K, Chou M-Y, Shih C-H, Huang GT-W, Chang T-H, Li W-H: MYBS: a comprehensive web server for mining transcription factor binding sites in yeast. Nucleic Acids Res. 2007, 35: W221-W226. 10.1093/nar/gkm379.
- Mahony S, Benos PV: STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 2007, 35: W253-W258. 10.1093/nar/gkm272.
- Thomas-Chollier M, Defrance M, Sand O, Herrmann C, Thieffry D, van Helden J, Medina-Rivera A: RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res. 2011, 39: W86-W91. 10.1093/nar/gkr377.
- Turatsinze J-V, Thomas-Chollier M, Defrance M, van Helden J: Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc. 2008, 3: 1578-1588. 10.1038/nprot.2008.97.
- Abdulrehman D, Monteiro PT, Teixeira MC, Mira NP, Lourenço AB, dos Santos SC, Cabrito TR, Francisco AP, Madeira SC, Aires RS, Oliveira AL, Sá-Correia I, Freitas AT: YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface. Nucleic Acids Res. 2011, 39: D136-D140. 10.1093/nar/gkq964.
- Jiang C, Pugh BF: A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome. Genome Biol. 2009, 10: R109. 10.1186/gb-2009-10-10-r109.
- Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG, Hibbs MA: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics. 2007, 23: 2692-2699. 10.1093/bioinformatics/btm403.
- Li Y-D, Liang H, Gu Z, Lin Z, Guan W, Zhou L, Li Y-Q, Li W-H: Detecting positive selection in the budding yeast genome. J Evol Biol. 2009, 22: 2430-2437. 10.1111/j.1420-9101.2009.01851.x.
- Tirosh I, Barkai N: Inferring regulatory mechanisms from patterns of evolutionary divergence. Mol Syst Biol. 2011, 7: 1-10.
- Tirosh I, Weinberger A, Carmi M, Barkai N: A genetic signature of interspecies variations in gene expression. Nat Genet. 2006, 38: 830-834. 10.1038/ng1819.
- Basehoar AD, Zanton SJ, Pugh BF: Identification and distinct regulation of yeast TATA box-containing genes. Cell. 2004, 116: 699-709. 10.1016/S0092-8674(04)00205-3.
- Robinson MD, Grigull J, Mohammad N, Hughes TR: FunSpec: a web-based cluster interpreter for yeast. BMC Bioinforma. 2002, 3: 35. 10.1186/1471-2105-3-35.
- Dominska M, Petes TD, White MA: Transcription factors are required for the meiotic recombination hotspot at the HIS4 locus in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 1993, 90: 6621-6625. 10.1073/pnas.90.14.6621.
- Mieczkowski PA, Dominska M, Buck MJ, Gerton JL, Lieb JD, Petes TD: Global Analysis of the Relationship between the Binding of the Bas1p Transcription Factor and Meiosis-Specific Double-Strand DNA Breaks in Saccharomyces cerevisiae. Mol Cell Biol. 2006, 26: 1014-1027. 10.1128/MCB.26.3.1014-1027.2006.
- Abdullah MF, Borts RH: Meiotic recombination frequencies are affected by nutritional states in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2001, 98: 14524-14529. 10.1073/pnas.201529598.
- Jiao K, Nau JJ, Cool M, Gray WM, Fassler JS, Malone RE: Phylogenetic footprinting reveals multiple regulatory elements involved in control of the meiotic recombination gene, REC102. Yeast. 2002, 19: 99-114. 10.1002/yea.800.
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644.
- Chan ET, Berger MF, Stottmann R, Hughes TR, Bulyk ML, Jaeger SA: Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics. 2010, 95: 185-195. 10.1016/j.ygeno.2010.01.002.
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004, 5: 276-287. 10.1038/nrg1315.
- Erb I, van Nimwegen E: Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PLoS One. 2011, 6: e24279. 10.1371/journal.pone.0024279.
- Zou Y, Huang W, Gu Z, Gu X: Predominant gain of promoter TATA box after gene duplication associated with stress responses. Mol Biol Evol. 2011, 28: 2893-2904. 10.1093/molbev/msr116.
- Field Y, Kaplan N, Fondufe-Mittendorf Y, Moore IK, Sharon E, Lubling Y, Widom J, Segal E: Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS Comput Biol. 2008, 4: e1000216. 10.1371/journal.pcbi.1000216.
- Yang C, Bolotin E, Jiang T, Sladek FM, Martinez E: Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters. Gene. 2007, 389: 52-65. 10.1016/j.gene.2006.09.029.
- Huisinga KL, Pugh BF: A genome-wide housekeeping role for TFIID and a highly regulated stress-related role for SAGA in Saccharomyces cerevisiae. Mol Cell. 2004, 13: 573-585. 10.1016/S1097-2765(04)00087-5.
- Piccirillo S, Honigberg SM: Sporulation patterning and invasive growth in wild and domesticated yeast colonies. Res Microbiol. 2010, 161: 390-398. 10.1016/j.resmic.2010.04.001.
- Maclean CJ, Greig D: Prezygotic reproductive isolation between Saccharomyces cerevisiae and Saccharomyces paradoxus. BMC Evol Biol. 2008, 8: 1. 10.1186/1471-2148-8-1.
- Ng YK, Hewavitharana AK, Webb R, Shaw PN, Fuerst JA: Developmental cycle and pharmaceutically relevant compounds of Salinispora actinobacteria isolated from Great Barrier Reef marine sponges. Appl Microbiol Biotechnol. 2012, E-pub ahead of print.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.