Gene expression and fractionation resistance
© Chen and Sankoff; licensee BioMed Central Ltd. 2014
Published: 17 October 2014
Previous work on whole genome doubling in plants established the importance of gene functional category in provoking or suppressing duplicate gene loss, or fractionation. Other studies, particularly in Paramecium, have correlated levels of gene expression with vulnerability or resistance to duplicate loss.
Here we analyze the simultaneous effect of functional category and expression in two plant data sets, rosids and asterids.
We demonstrate that functional category and expression level have independent effects, though expression does not play the dominant role it does in Paramecium.
Whole genome doubling (WGD) is a special case of gene duplication in that everything in the genome, including the genes, regulatory elements, and repetitive regions, is doubled or tripled. This process is more common in plant lineages than in other evolutionary domains [1, 2] and is an important source of gene innovations, contributing to diverse morphological and functional complexities in modern plants [3, 4]. The duplicated genes are very vulnerable to loss after the WGD event, via excision of chromosomal segments or pseudogenization. These losses are collectively referred to as fractionation. Various models have been proposed to explain the details of this process, such as the Gene Dosage Hypothesis [5, 6] and the Gene Balance Hypothesis. These models try to explain differences in duplicate gene retention patterns based on the traditional models of gene fate (neofunctionalization, subfunctionalization, and pseudogenization) and on observations of duplicate gene retention following WGD.
We have shown in several groups of plants (rosids, asterids, and monocots) that the functional category of a gene is a major determinant of fractionation resistance, with metabolic genes being fractionation prone and "response to stimulus" genes being fractionation resistant [8, 9].
A recurrent theme in work on fractionation is the effect of gene expression. In a comprehensive study of fractionation in Paramecium, Gout et al. identify a clear relationship between high WGD duplicate gene retention rates and high expression level. They also find that within each major gene functional class, higher expression correlates with higher duplicate retention rates, even though the expression levels of the major functional classes differ from each other. They conclude that expression level is the best discriminator for explaining variable resistance to fractionation.
The Gout et al. paper is the primary inspiration for this study, where we explore the relationship between functional class and expression in fractionation resistance in plants. Because we have previously shown that functional class can itself influence the fractionation resistance of duplicates [8, 9], we wish to consolidate these two kinds of findings into one unified framework.
Because high-quality expression data are still scarce, we make use of RNA-seq data from grape to represent expression in the rosids and RNA-seq data from tomato to represent expression in the asterids. Tomato gene expression values are about three to four times higher than grape values because of different technology platforms and depth of sequencing. This prevents meaningful comparison of absolute gene expression values between grape and tomato. Normalized comparisons, however, are valid.
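The text does not state which normalization is used. As a minimal sketch of why a within-species normalization removes a platform-wide scale difference, the following assumes each data set is divided by its own median; the function name, the gene identifiers, and the toy values are all hypothetical.

```python
from statistics import median

def normalize_by_median(expression):
    """Scale each gene's expression by the median of its own data set.

    Assumption: median scaling stands in for whatever normalization
    the analysis actually uses; the source does not specify one.
    """
    m = median(expression.values())
    if m == 0:
        raise ValueError("median expression is zero; choose another scale")
    return {gene: value / m for gene, value in expression.items()}

# Hypothetical toy values: the tomato numbers are 3.5 times the grape numbers,
# mimicking a global difference in platform and sequencing depth.
grape = {"g1": 2.0, "g2": 4.0, "g3": 10.0}
tomato = {"t1": 7.0, "t2": 14.0, "t3": 35.0}

grape_norm = normalize_by_median(grape)
tomato_norm = normalize_by_median(tomato)
```

After scaling, the two toy profiles coincide, so relative statements (for example, "this class is expressed above the species median") can be compared across species even though absolute values cannot.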
We are interested in comparing functional categories, expression levels, and fractionation among thousands of genes, but the inclusion of a few extremely highly expressed genes could swamp some of these comparisons. Thus we filter out the genes in the top 1% of expression levels. The filter is most pertinent to the more specific GO categories, such as individual enzyme classes, where the number of genes in each class may be small. Filtering is not necessary for the top-level categories, since they contain thousands of genes, but for consistency we keep the filter for all the expression analyses.
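The top 1% filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the number of genes to drop is rounded down, and ties at the cutoff are broken by sort order.

```python
def drop_top_fraction(expression, fraction=0.01):
    """Return a copy of `expression` without the most highly expressed genes.

    Assumptions: `expression` maps gene id -> expression value, and the
    number of genes removed is len(expression) * fraction, rounded down.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_drop = int(len(ranked) * fraction)
    dropped = set(ranked[:n_drop])
    return {g: v for g, v in expression.items() if g not in dropped}

# Hypothetical toy data: 200 genes with expression values 1.0 .. 200.0.
expr = {f"gene{i}": float(i) for i in range(1, 201)}
filtered = drop_top_fraction(expr)  # removes the 2 highest-expressed genes
```

In a small GO category, a single gene from the removed tail could dominate the category mean, which is why the filter matters most there.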
To take into account that different plant tissues have different expression profiles, and that plants respond differently to different environments and stimuli, we use the highest reported expression value for any given gene rather than the median or the mean. The rationale is that the RNA-seq data we are using distinguish different expression levels in different tissues, as opposed to responses to particular stimuli. Many genes are only expressed in specific tissues, so the maximum expression level of a gene is a better indication of its importance in the organism.
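A short sketch of the per-gene summary described above, contrasted with the mean. The tissue names, gene identifiers, and values are hypothetical; only the choice of the maximum comes from the text.

```python
from statistics import mean

def max_expression(per_tissue):
    """Map each gene to its highest expression value across tissues."""
    return {gene: max(values.values())
            for gene, values in per_tissue.items() if values}

def mean_expression(per_tissue):
    """Map each gene to its mean expression across tissues (for contrast)."""
    return {gene: mean(values.values())
            for gene, values in per_tissue.items() if values}

# Hypothetical toy data: geneA is essentially root-specific.
per_tissue = {
    "geneA": {"leaf": 1.0, "root": 30.5, "berry": 0.0},
    "geneB": {"leaf": 8.0, "root": 7.5, "berry": 9.1},
}

by_max = max_expression(per_tissue)
by_mean = mean_expression(per_tissue)
```

For a tissue-specific gene such as the toy geneA, the mean is pulled down by tissues where the gene is silent, whereas the maximum reflects its level where it is actually used.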
Where the sum is taken over all categories C' in the top-level domain including C. As we plot P(F, C) against F, we will deem C to be fractionation resistant if P(F, C) increases with increasing F. We deem C to be fractionation prone if the reverse is observed, where P(F, C) decreases with increasing F.
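The defining formula for P(F, C) is not present in this text, so the following is only a sketch under an explicit assumption: P(F, C) is taken to be the count of homology sets at level F annotated with C, divided by the sum of such counts over all categories C' in the same top-level domain. The trend is classified by the sign of a least-squares slope, which is a simplification swapped in for inspecting the plot. All names and data are hypothetical.

```python
from collections import Counter

def proportions(observations, domain_categories):
    """Return {F: {C: P(F, C)}} from (F, category) observations.

    Assumption: P(F, C) = n(F, C) / sum over C' of n(F, C'), with C'
    ranging over `domain_categories` (one top-level domain).
    """
    counts = Counter((f, c) for f, c in observations if c in domain_categories)
    totals = Counter()
    for (f, _c), n in counts.items():
        totals[f] += n
    return {f: {c: counts[(f, c)] / totals[f] for c in domain_categories}
            for f in sorted(totals)}

def trend(p_by_f, category):
    """Classify a category by the sign of the slope of P(F, C) against F."""
    xs = list(p_by_f)
    ys = [p_by_f[f][category] for f in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if covariance > 0:
        return "resistant"
    if covariance < 0:
        return "prone"
    return "flat"

# Hypothetical toy observations: (F, category) for each annotated set.
domain = ["metabolic process", "response to stimulus"]
obs = ([(1, "metabolic process")] * 6 + [(1, "response to stimulus")] * 2
       + [(2, "metabolic process")] * 4 + [(2, "response to stimulus")] * 4
       + [(3, "metabolic process")] * 2 + [(3, "response to stimulus")] * 6)

p = proportions(obs, domain)
labels = {c: trend(p, c) for c in domain}
```

With these toy counts, the share of "response to stimulus" rises with F while the share of "metabolic process" falls, matching the resistant and prone definitions given above.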
However, the change in expression levels of sets from the least fractionation resistant to the most fractionation resistant is smaller in our rosid and asterid data sets than in Paramecium. Functional classes with fewer genes ("nucleic acid binding transcription factor activity" has 411 genes in grape and 552 genes in tomato) are still suggestive of the trend but are no longer statistically significant in either data set (p > 0.05). These differences may be due to sample size rather than the differences between the protist Paramecium and plants.
On the other hand, functional class appears to have greater influence than expression in determining whether a homology set is fractionation resistant or not. The GO terms "metabolic process" and "catalytic activity", which are reported to be very fractionation prone, both have expression levels similar to "response to stimulus", a very fractionation resistant GO term. The GO term "biological regulation", one of the most fractionation resistant terms, has on average lower expression levels than any of the above-mentioned terms in both the rosid and the asterid data sets.
Discussion and conclusion
How can we reconcile the relatively small effect of gene expression on fractionation resistance with the claim that gene expression levels are fundamental to copy number variations and fractionation resistance [5, 6, 10, 25, 27]?
One of the more plausible explanations is that fractionation resistance is controlled not just by the fitness cost of gene expression but, to a greater extent, by the fitness cost of disrupting the intended function of the gene or the gene network. Highly connected genes have been reported to be preferentially retained [28, 29] and are predicted to be more often retained by both the Gene Balance Hypothesis and the Gene Dosage Hypothesis [5, 6]. As such, genes in a functional class that generally has low expression levels may still have a high level of fractionation resistance because of the importance of the function or the functional network.
It should be noted that many other factors have been proposed to explain variable fractionation rates. Moghe et al. showed that, in addition to functional class, gene sequence features such as longer amino acid length and higher GC3 level (GC content at the third codon position, the wobble position) contribute to fractionation resistance in Raphanus raphanistrum, Arabidopsis thaliana, Arabidopsis lyrata, and Brassica rapa. They also report that in different WGD events the degree of enrichment from gene sequence features and functional classes may vary, though the directionality of enrichments (be they contributing to fractionation resistance or fractionation proneness) remains mostly the same.
Of interest, a recent study by Makino et al. reports the effect of fractionation from ancient vertebrate WGD on the biased distribution of genes with copy-number variations in humans. This paper claims that retained duplicates suppress changes in copy number in their vicinity.
In conclusion, our results agree with current models in that expression does play a role in fractionation resistance, although by itself it cannot explain the enrichments of functional classes. It is likely that systematic analysis of more genomes will be needed to clarify the role of expression and other sequence features in explaining fractionation. At the present time, a good predictor of fractionation resistance should still include both expression and functional class, and may even include how connected a gene is in the genome.
Research supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. DS holds the Canada Research Chair in Mathematical Genomics. The authors thank Aoife McLysaght for her interest and for drawing our attention to the work of Gout et al.
The publication charges for this article were funded by the Canada Research Chair in Mathematical Genomics, and by the University of Ottawa.
This article has been published as part of BMC Genomics Volume 15 Supplement 6, 2014: Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S6.
- Brett D, Pospisil H, Valcárcel J, Reich J, Bork P: Alternative splicing and genome complexity. Nature Genetics. 2002, 30 (1): 29-30. 10.1038/ng803.
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research. 2003, 31 (19): 5654-5666. 10.1093/nar/gkg770.
- Crow KD, Wagner GP: What is the role of genome duplication in the evolution of complexity and diversity?. Molecular Biology and Evolution. 2006, 23 (5): 887-892. 10.1093/molbev/msj083.
- Sémon M, Wolfe KH: Consequences of genome duplication. Current Opinion in Genetics & Development. 2007, 17 (6): 505-512. 10.1016/j.gde.2007.09.007.
- Papp B, Pal C, Hurst LD: Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003, 424: 194-197. 10.1038/nature01771.
- Schnable JC, Wang X, Pires JC, Freeling M: Escape from preferential retention following repeated whole genome duplication in plants. Frontiers in Plant Science. 2012, 3 (94): 10.3389/fpls.2012.00094.
- Birchler JA, Veitia RA: Gene balance hypothesis: Connecting issues of dosage sensitivity across biological disciplines. Proceedings of the National Academy of Sciences. 2012, 109 (37): 14746-14753. 10.1073/pnas.1207726109.
- Zheng C, Chen E, Albert VA, Lyons E, Sankoff D: Ancient eudicot hexaploidy meets ancestral eurosid gene order. BMC Genomics. 2013, 14 (Suppl 7): 3. 10.1186/1471-2164-14-S7-S3.
- Chen ECH, Najar CBA, Zheng C, Brandts A, Lyons E, Tang H, Carretero-Paulet L, Albert VA, Sankoff D: The dynamics of functional classes of plant genes in rediploidized ancient polyploids. BMC Bioinformatics. 2013, 14 (S-15): 19.
- Gout JF, Kahn D, Duret L: Paramecium Post-Genomics Consortium: The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 2010, 6 (5): 1000944. 10.1371/journal.pgen.1000944.
- Jung S, Cestaro A, Troggio M, Main D, Zheng P, Cho I, Folta KM, Sosinski BAA, Celton JM, Arús P, Shulaev V, Verde I, Morgante M, Rokhsar DS, Velasco R, Sargent DJ: Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between rosaceous subfamilies. BMC Genomics. 2012, 13 (129).
- Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pè ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quétier F, Wincker P: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-467. 10.1038/nature06148.
- Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M, Murat F, Fouet O, Poulain J, Ruiz M, Roguet Y, Rodier-Goud M, Barbosa-Neto JF, Sabot F, Kudrna D, Ammiraju JS, Schuster SC, Carlson JE, Sallet E, Schiex T, Dievart A, Kramer M, Gelley L, Shi Z, Bérard A, Viot C, Boccara M, Risterucci A, Guignon V, Sabau X, Axtell MJ, Ma Z, Zhang Y, Brown S, Bourge M, Golser W, Song X, Clement D, Rivallan R, Tahi M, Akaza JM, Pitollat B, Gramacho K, D'Hont A, Brunel D, Infante D, Kebe I, Costet P, Wing R, McCombie WR, Guiderdoni E, Quétier F, Panaud O, Wincker P, Bocs S, Lanaud C: The genome of Theobroma cacao. Nature Genetics. 2011, 43: 101-108. 10.1038/ng.736.
- Tomato Genome Consortium: The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012, 485: 635-641. 10.1038/nature11119.
- Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang TH, Lan T, Welch AJ, Juárez MJ, Simpson J, Fernández-Cortés A, Arteaga-Vázquez M, Góngora-Castillo E, Acevedo-Hernández G, Schuster SC, Himmelbauer H, Minoche AE, Xu S, Lynch M, Oropeza-Aburto A, Cervantes-Pérez SA, de Jesús Ortega-Estrada M, Cervantes-Luevano JI, Michael TP, Mockler T, Bryant D, Herrera-Estrella A, Albert VA, Herrera-Estrella L: Architecture and evolution of a minute plant genome. Nature. 2013, 498: 94-98. 10.1038/nature12132.
- US Department of Energy, J.G.I: Mimulus Version 1. [http://www.phytozome.net/mimulus]
- Abrouk M, Murat F, Pont C, Messing J, Jackson S, Faraut T, Tannier E, Plomion C, Cooke R, Feuillet C, et al: Palaeogenomics of plants: synteny-based modelling of extinct ancestors. Trends in Plant Science. 2010, 15 (9): 479-487. 10.1016/j.tplants.2010.06.001.
- Vitulo N, Forcato C, Carpinelli E, Telatin A, Campagna D, D'Angelo M, Zimbello R, Corso M, Vannozzi A, Bonghi C, Lucchin M, Valle G: A deep survey of alternative splicing in grape reveals changes in the splicing machinery related to tissue, stress condition and genotype. BMC Plant Biology. 2014, 14 (1): 99. 10.1186/1471-2229-14-99.
- Zheng C, Sankoff D: Practical aliquoting of flowering plant genomes. BMC Bioinformatics. 2013, 14 (S-15): 8.
- Zheng C, Swenson K, Lyons E, Sankoff D: OMG! orthologs in multiple genomes competing graph-theoretical formulations. Algorithms in Bioinformatics. Edited by: Przytycka, T., Sagot, M.-F. 2011, 364-375. WABI 2011, 11th Workshop on Algorithms in Bioinformatics.
- Lyons E, Pedersen B, Kane J, Freeling M: The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates rosids. Tropical Plant Biology. 2008, 1 (3-4): 181-190. 10.1007/s12042-008-9017-y.
- Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M: Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar and grape: CoGe with rosids. Plant Physiology. 2008, 148: 1772-1781. 10.1104/pp.108.124867.
- The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nature Genetics. 2000, 25 (1): 25-29. 10.1038/75556. Data Version 2012-04-20.
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Schnable JC, Pedersen BS, Subramaniam S, Freeling M: Dose-sensitivity, conserved noncoding sequences and duplicate gene retention through multiple tetraploidies in the grasses. Frontiers in Plant Science. 2011, 2 (2): 10.3389/fpls.2011.00002.
- Makino T, McLysaght A, Kawata M: Genome-wide deserts for copy number variation in vertebrates. Nature Communications. 2013, 4: 10.1038/ncomms3283.
- Garsmeur O, Schnable JC, Almeida A, Jourda C, D'Hont A, Freeling M: Two evolutionarily distinct classes of paleopolyploidy. Molecular Biology and Evolution. 2014, 31 (2): 448-454. 10.1093/molbev/mst230.
- Thomas BC, Pedersen B, Freeling M: Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Research. 2006, 16 (7): 934-946. 10.1101/gr.4708406.
- Lou P, Wu J, Cheng F, Cressman LG, Wang X, McClung CR: Preferential retention of circadian clock genes during diploidization following whole genome triplication in Brassica rapa. The Plant Cell Online. 2012, 24 (6): 2415-2426. 10.1105/tpc.112.099499.
- Moghe GD, Hufnagel DE, Tang H, Xiao Y, Dworkin I, Town CD, Conner JK, Shiu SH: Consequences of whole-genome triplication as revealed by comparative genomic analyses of the wild radish Raphanus raphanistrum and three other Brassicaceae species. The Plant Cell Online. 2014, 26 (5): 1925-1937. 10.1105/tpc.114.124297.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.