Retained duplicate genes in green alga Chlamydomonas reinhardtii tend to be stress responsive and experience frequent response gains
© Wu et al.; licensee BioMed Central. 2015
Received: 3 June 2014
Accepted: 9 February 2015
Published: 4 March 2015
Green algae belong to a group of photosynthetic organisms that occupy diverse habitats, are closely related to land plants, and have been studied as sources of food and biofuel. Although multiple green algal genomes are available, a global comparative study of algal gene families has not been carried out. To investigate how gene families and gene expression have evolved, particularly in the context of stress response that have been shown to correlate with gene family expansion in multiple eukaryotes, we characterized the expansion patterns of gene families in nine green algal species, and examined evolution of stress response among gene duplicates in Chlamydomonas reinhardtii.
Substantial variation in domain family sizes exists among green algal species. Lineage-specific expansion of families occurred throughout the green algal lineage but inferred gene losses occurred more often than gene gains, suggesting a continuous reduction of algal gene repertoire. Retained duplicates tend to be involved in stress response, similar to land plant species. However, stress responsive genes tend to be pseudogenized as well. When comparing ancestral and extant gene stress response state, we found that response gains occur in 13% of duplicate gene branches, much higher than 6% in Arabidopsis thaliana.
The frequent gains of stress response among green algal duplicates potentially reflect a high rate of innovation, resulting in a species-specific gene repertoire that contributed to adaptive response to stress. This could be further explored towards deciphering the mechanism of stress response, and identifying suitable green algal species for oil production.
Green algae are a group of photosynthetic organisms that are more closely related to land plants than to other major eukaryotic groups . A number of micro-green-algal species are suitable for biofuel production, and the lipid content of these algae increases significantly under various stress conditions . For example, in the green algal model Chlamydomonas reinhardtii, lipid droplets rich in triacylglycerol (TAG) form after nitrogen (N) deprivation [3,4]. Other stress conditions, such as salt stress and sulfur deprivation, also lead to increased TAG content in C. reinhardtii [5,6]. In addition, stress response and subsequent lipid accumulation exhibit tremendous diversity in green algae . For example, in response to N deficiency several green algae showed substantial differences in the tradeoff between growth and lipid content, the lipid accumulation time-course, and the stress level required to stimulate lipid formation . Thus, a better understanding of the mechanistic details of stress response in green algae will not only contribute to our knowledge about adaptation to stressful environments but also will have the potential to improve microalgal biofuel production.
To better understand how green algae respond to stress on a genomic level, we focused on the retention and loss of gene family members. This is because gene duplications lead to raw materials for evolution to act on [8,9] and they are a common feature of most eukaryotic species . Genes involved in stress response tend to be retained in various eukaryotes [11,12]. Thus a thorough look at the retention and loss patterns among green algal gene families may provide a new evolutionary perspective on stress response in green algal lineage. In general, the majority of gene duplicates are rapidly lost following the duplication events, but a significant number of duplicates are retained [10,13,14], contributing to organismal and regulatory complexity [11,15]. Gene retention occurs in a lineage-specific manner in a wide range of organisms [11,12,16] and a significant functional bias exists. In fungi, stress-related genes tend to undergo many duplications and losses . Similarly, in Arabidopsis thaliana, stress responsive genes tend be retained  but also tend to be pseudogenized . In a study of duplicates in yeasts, nematode, fruit fly, and A. thaliana, genes involved in response to environmental stress are prone to be retained in a lineage-specific manner . There is not yet a global study summarizing the gene gain and loss patterns of known gene families in green algae. In addition, it is not known if duplicate retention is correlated with their stress responsiveness in green algae.
After duplication, gene duplicates may acquire a novel function that contributes to adaptation . Such neo-functionalization can be an important source of inter-specific differences in stress response that could lay the foundation for diversity in stress-induced oil production in algae. An earlier study in A. thaliana showed that, although the predominant fate of gene duplicate was retention or loss of stress response, in around 6% of the cases there was evidence of stress response gain . Examining evolution in gene expression among duplicates could thus further elucidate how gene duplication and subsequent functional innovation shaped the gene repertoire involved in stress response, and likely contributed to the diversity in stress response in green algae. There is no global study on functional evolution of duplicate genes in green algae.
In this study, we integrated genomic and transcriptomic data to find out how gene families and gene expression have evolved in the green algal lineage in the context of stress. We first examined the variation of domain family composition in nine green algal species compared to land plants. Then we investigated how gene gain and loss events occurred in the green algal lineage and examined the functional bias in retained duplicates in C. reinhardtii using phylogenetic approaches. We also examined the pseudogenization of gene duplicates and their functional bias. Finally, we characterized the evolution of gene expression after duplication events to find out how gene function evolved in the context of stress response. Our study reveals the evolutionary trajectory of stress responsive gene families in the green algal lineage, and because lipid production is sharply induced by stress , our results could help pinpointing candidate genes for further study on lipid metabolism.
Results and discussion
Variation in domain family sizes among land plants and green algae
Among 5,441 domain families found in nine green algae and eight land plants (see Methods), 3,897 (71.6%) are shared between green algae and land plants, 759 (13.9%) are green alga specific, and 785 (14.4%) are land plant specific. As expected, more closely related species tend to have more highly correlated domain family sizes and there are clearly two distinct clusters, one for green algae and the other for land plants (Figure 1B). Nonetheless, there is large variation in family sizes among green algal species. This is likely the results of lineage-specific gene gain and loss events. Thus we next asked how lineage-specific gene gains and losses have contributed to the extant domain family sizes in the green algal lineage.
Gene gains and losses throughout green algal lineage
Another reason for an excess of loss events could be because there were errors in phylogenetic inferences. To ascertain that our phylogenetic inference of orthologous groups was robust, we conducted two tests. First, we randomly picked 1,500 gene families for bootstrap studies. Among 50,690 branches, 46.5% have a bootstrap value of ≥ 80% (Additional file 4: Figure S3A). We found that 89.4% of the Volvocales branches (C. reinhardtii and V. carteri) have a bootstrap value of ≥ 80% but only 47.1% in the Ostreococcus branches (O. sp.RCC809 and O. lucimarinus). Our results showed that branch lengths are positively correlated with bootstrap values (Spearman’s rank correlation coefficient of 0.634, p-value < 2.2e-16) (Additional file 4: Figure S3B). Second, we examined 129 domain families with only one copy in each of the 9 green algal and 2 land plant species assuming that this group of domain families has not undergone gene gain and loss. Thus, their gene trees should be identical to the species tree in topology. Because the alternative scenario is possible (gene gains and losses have occurred) in these families, this is a conservative estimate of the orthology inference accuracy. Our results showed a varying degree of consistency in branching between gene trees and the species tree on different branches on the species tree, ranging from 37.2% to 98.3% (Additional file 5: Figure S4). Longer branches on the species tree, indicating greater evolutionary distance, correspond with higher consistency with gene trees. We should emphasize that in branches with high consistency, for example the branch leading to the split of Micromonas and Ostreococcus with 84% consistent trees, the gene loss to gene gain ratio was comparable to that in branches with low consistency, indicating that green algae have experienced extensive gene loss.
Several other observations are consistent with the extensive gene loss phenomenon we have inferred in green algal species. First, a substantial number of gene families were lost on lineages leading to C. reinhardtii and O. lucimarinus after the split from land plants . Second, in Ostreococcus, downsizing of many gene families and gene losses were observed [26,27]. Third, a comparative analysis including Micromonas and Ostreococcus revealed that the common ancestor of Mamiellales had already experienced genome reduction . Therefore, although lineage-specific gene gains took place, the lineage-specific losses contributed more significantly to the species-specific gene repertoire in green algae.
Pseudogenes in green algae
Functions of retained C. reinhardtii genes duplicated after the C. reinhardtii-V. carteri split
Although gene losses appear to be more frequent, there are abundant retained duplicates throughout the green algal lineage. However, it is not clear if there is a functional bias among retained duplicates. To determine such a bias, we focused on C. reinhardtii since it is a green algal model organism with a relatively well-annotated gene set . After annotating the C. reinhardtii proteome with GO categories based on sequence similarity (see Methods), 5,725 of 17,114 proteins (33.5%) are in ≥1 GO categories. During reconciliation of domain and species trees, the branches in the species tree where duplications took place were inferred as well. Thus we can examine functional biases of retained duplicates in each of the branches leading to C. reinhardtii. This information allows us to ask if functions of retained genes were consistent over the course of C. reinhardtii evolution. Focusing on the C. reinhardtii lineage after its split from the V. carteri lineage, 1,817 duplication events (involving 2,682 retained duplicates) took place in the C. reinhardtii lineage. Among 13 categories enriched in retained duplicates, they can be classified into the following three types.
Consistency of functional biases over the course of C. reinhardtii evolution
The above discussion was on the C. reinhardtii lineage after its split from the V. carteri lineage. In the process of green algal evolution, some duplicates may be preferentially retained throughout while others may be period-specific due to the ever-changing environment. In addition, different kinds of genes may have different longevities thus resulting in differences in enriched categories over time. To distinguish these possibilities, the timing of duplication of each retained gene was pinpointed to an internal or external branch that led to C. reinhardtii. The functions of ancestral genes at the time of duplication were assumed to be the same as those of their descendants. First we focus on categories that are branch specific. The branches are numbered as in Figures 2 and 4A. Stress response is enriched in just branch 4, while nucleosome assembly is enriched in 3 and 4. The expansion of stress related gene families in just the youngest branch may be because environmental condition would constantly change in the evolutionary history of the C. reinhardtii lineage. Thus previously retained duplicates conferring selective advantage may not be adaptive due to exposure of green algae to ever-changing environments.
In contrast to stress, we found that signaling categories are consistently enriched for retained duplicates over time. Further, retained duplicates in branch 3 and 4 tend to play markedly similar roles as branch 1, particularly those in ion transport, signaling and cyclic nucleotide synthesis. Retained duplicates in branch 2 were associated with several types of light response and signal transduction. (Figure 4A, Additional file 6: Table S4). This result suggests that gene families involved in signaling and to a lesser degree transport are constantly expanded in a lineage-specific manner, and more generally indicates consistent innovation in interaction with the environment. Further, these enrichments are consistent with other eukaryotes [11,12]. Three categories involved in double fertilization, pollen tube growth, and embryo development are not relevant to the single-cellular C. reinhardtii. They are likely annotated by the sequence similarity to plant proteins by Blast2go . To test the robustness of our results, we tested the GO enrichment in the subset of domain families with ≥ 75% genes of a family only having one domain. The GO categories enriched in C. reinhardtii retained duplicates are similar to the results of all domain families (Additional file 7: Table S5). This illustrates that our results achieved using domain families instead of gene families are robust.
Functional categories enriched in conserved genes and genes associated with pseudogenes
In addition to retained duplicates, we defined “conserved genes” as those with the same copy number (1 to 3) in green algal species. The categories enriched in conserved genes in C. reinhardtii were few and largely involved in housekeeping functions, including GO terms such as translation and two other ribosome-related terms. Other terms enriched in conserved genes included tetrapyrrole synthesis, and photosystem I reaction center (Additional file 8: Table S3). In addition to functional bias in retention and conservation, we would also like to examine functional bias in pseudogenization. To find out if there is such bias, we tested GO enrichment in genes associated with pseudogenes, defined as genes that are the closest relatives (by sequence similarity) to pseudogenized duplicates. Stress response was the only enriched biological process we found in genes associated with pseudogenes, indicating that, in addition to their high rate of retention, these genes have frequently undergone gene loss, a finding similar to a pseudogene study in A. thaliana . This coupling between a high birth as well as high death rate is likely because genes involved in responding to specific stress conditions could become unnecessary when that condition does not persist.
Retained duplicates and their stress responsiveness
Multiple stress conditions lead to increased oil content in microalgae . In addition, we showed that stress response categories were enriched in retained duplicates in the C. reinhardtii lineage. However, GO annotation in C. reinhardtii is established solely based on computational approaches and has only 33.5% of genes annotated. To address these issues, we asked if retained genes tend to be stress responsive compared to singletons under six conditions: deficiency in N, sulfur, iron, copper, and CO2, and oxidative stress [37-42]. In all stress conditions except sulfur deprivation, stress responsiveness tends to be over-represented among retained genes (p-value ≤0.01), (Figure 4B, Additional file 9: Table S2). When combining all six RNA-seq datasets and defining stress responsiveness of a gene as being responsive in ≥1 conditions, retained duplicates still tend to be stress responsive (Figure 4B, Additional file 9: Table S2). This result corroborates our conclusion that retained duplicates tend to be in the stress response functional categories.
According to functional category analysis, stress response was not enriched among retained duplicates except in branch 4, which is the most recent branch (Figure 4A). Consistent with this finding, analysis of stress expression data also revealed that retained genes tend to be stress responsive in more conditions if they were duplicated more recently (Figure 4B), with 5 conditions in branch 4 and gradually declining to none in branch 1. This is also consistent with our finding that the closest relatives of pseudogenes tend to be involved in stress response, again reinforcing the idea that, although stress response genes tend to be retained at a higher rate compared to genome average, they tend to be more short-lived.
Stress response evolution post duplication in the C. reinhardtii lineage
Retained duplicates tend to be stress responsive (Figure 4A and B), suggesting that gene duplication might provide a source for innovations in response to stress. This possibility then leads to the question how often innovation, defined as gain of stress responsiveness from a non-responsive ancestral gene, occurred in green algae. To answer this question, we first integrated phylogenetic data and stress related expression datasets to infer the ancestral response state (Additional file 2: Figure S1C). With ancestral states, the gain and loss events of duplicated genes can then be distinguished . Stress response states are defined as U (up-regulated by ≥2 fold, false discovery rate ≤ 5%), D (down-regulated by ≥2 fold, false discovery rate ≤ 5%), and N (not significantly changed). Only the ancestral nodes leading immediately to extant genes were included in our analysis to avoid complications from predicting responses in nested branches .
Nonetheless, compared to the results of a similar study in A. thaliana , C. reinhardtii has comparable rates in response retention, loss, and switch, but a much higher rate of functional gain (13% versus 6%). Note that we examined only six abiotic conditions in C. reinhardtii compared to the 16 conditions encompassing biotic and abiotic stress environments in the A. thaliana study . Considering that the relative abundance of the evolutionary scenarios is similar across divergent conditions , having more data will likely not contribute to significant changes in the gain rate estimate in either direction. We should also emphasize that, regardless of the Ks value, C. reinhardtii consistently has a higher rate of functional gain (Figure 5D) when compared to A. thaliana . Taken together, our findings indicate that innovation occurs more often in C. reinhardtii than in A. thaliana in the context of stress response. The excess gain events could be due to the fact that C. reinhardtii is a single cellular organism and has a shorter life cycle while encountering more diverse environmental conditions, as one would expect that a shorter life cycle would lead to a higher number of mutations per unit time compared to species with a longer generation time, thus providing more raw material for adaptation to occur upon.
Our analysis of gene family evolution, functional evolution and pseudogenization in the green algal lineage complement previous studies in other eukaryotes, reinforcing that the association between lineage-specific evolution and stress response is a common feature of eukaryotes. This association is likely due to the selective pressure under ever-changing environment. In this scenario, stress gene duplicates were frequently under positive selection. In addition, the high rate of innovation in acquiring abilities to respond to stress in C. reinhardtii duplicates contributes to a highly diverse stress responsive gene repertoire that can potentially be adaptive.
The model organism C. reinhardtii is used to study the mechanism of stress response in green algae although it is not a direct candidate for biofuel production . As the general metabolic changes under stress might be similar across divergent micro-algal species , the particular genes involved in stress response could be quite different as they are shaped by lineage-specific family expansion and subsequent gain-of-function events. Such species-specificity cannot be deciphered when focusing on one model organism. Therefore, in addition to focusing on the well-established model organism of C. reinhardtii to discover the general biology of stress response in green algae, it is necessary to investigate diverse green algal species to discover their uniqueness in stress response, towards the ultimate goal of finding the perfect alga for biofuel production.
Protein domains in green algal species
Genome and protein sequences of nine green algal species (see Background) and eight land plant species were obtained from the US Department of Energy Joint Genome Institute (www.jgi.doe.gov) and Phytozome (www.phytozome.net, version 7.0). The nine green algal species included are Micromonas pusilla RCC299 , Micromonas pusilla CCMP1545 , Ostreococcus sp.RCC809 (www.jgi.doe.gov, unpublished, restrictions of publishing lifted), Ostreococcus tauri , Ostreococcus lucimarinus , Chlorella NC64A , Coccomyxa sp.C169 , C. reinhardtii , and Volvox carteri . The land plant species are Populus trichocarpa , Glycine max , A. thaliana , Vitis vinifera , Mimulus guttatus , Zea mays , Oryza sativa , and P. patens . Some of these sequence data were produced by the DOE JGI in collaboration with the user community. HMMER  was used with trusted cutoff to scan algal and plant protein sequences for Pfam domains . Fisher’s exact test was used to test the enrichment of GO categories in conserved domain families in green algae and expanded domain families in green algae using GO annotation on Pfam domains (ftp://ftp.ebi.ac.uk/).
To assess the completeness of gene annotation in green algae, we identified intergenic regions with coding potential and domain presence. Green algal protein sequences were aligned to green algal genome sequences with BLAST (tblastn, E-value threshold ≤ 1e-5, ). Matches with ≥30 amino acids long and ≥40% identity were kept and the matching genomic sequences were translated into peptide sequences using Genewise . After filtering out sequences with frameshifts and identical sequences, the rest were consolidated by concatenating sequences with ≥5 amino acid overlaps. Protein domains in the concatenated sequences were identified with HMMER using trusted cutoff. Domains overlapping with domains identified in annotated genes or pseudogenes (see later section) were eliminated. Overlapping domain sequences were merged and identical sequences removed (Additional file 2: Figure S1A).
Identification of missing domains in green algal genomes with variable annotation qualities
The robustness of our domain family analysis is fundamentally dependent on the quality of gene annotation in green algae. For example, an un-annotated domain would lead to a false prediction of gene loss. Most of the green algal genome annotations are automatically generated using computational approaches with various degrees of manual intervention. Thus, the quality of the annotation is likely highly variable and some genes and thus protein domains may not be annotated. Given our goal is to evaluate the gain and loss patterns of algal domain families, these potentially missing domains are false negatives that can have a significant impact on our subsequent studies. Thus, to identify domains in the genomes that are missed by current annotation, we aligned all the protein sequences of nine green algal species to their genomes to identify all sequences with coding potential.
A total of 810,578 matches were identified. The matching genomic sequences were translated into peptide sequences using Genewise . After removing redundant sequences that were identical in their entirety and merging overlapping sequences with identical amino acids in the overlapping region, 474,350 sequences remained. Overlapping sequences that were not identical in the overlapping regions were not removed since they might contain different domains. We identified 238,446 domains from these non-redundant sequences. After removing identical domain sequences, merging overlapping domain sequences, removing domain sequences overlapping with domains in annotated genes and pseudogenes identified using a published pipeline , 6,432 domain sequences remained and are referred to as domains in un-annotated regions, of which one-third are in V. Carteri (Additional file 10: Figure S2B). They are likely to be domains residing partially or completely in intronic or intergenic regions. Out of the 1,985 V. carteri domains, 862 are retrotransposon related, while a total of 223 domains in all eight other species are retrotransposon related. Most of the domains in un-annotated regions are shorter than the average of annotated domains (Additional file 10: Figure S2A), and the number is very small compared to the annotated domains in each species (Additional file 10: Figure S2B). For these reasons, we conclude that despite the automated nature of the current green algal annotations they contain most of the known Pfam domains in green algae and further analysis was done only using the annotated domains.
Defining green algal orthologous groups (OGs) and lineage-specific expansions (LSEs)
To build a phylogeny for each protein domain X, sequences of domain X were extracted from full length protein sequences of the green algae and land plants and aligned using MAFFT . Using the alignments generated, the phylogeny of each domain family was inferred using RAxML  with parameters -f d -m PROTGAMMAJTT. Domain family trees were reconciled with a previously published species tree  using NOTUNG . Notung compares a gene tree to a species tree to infer whether a node in a gene tree leads to bifurcating branches due to speciation or duplication. From the speciation and duplication node classifications, we inferred orthologous and paralogous relationships as well as gene gain and loss events for further analyses. We have also used the 18 s rRNA sequences of two land plants and all nine green algal species to built a species phylogeny using RAxML  with parameters -f a -x 12345 -p 12345 -# 1000 -m GTRGAMMA. Given the topology is highly similar, only the previously published phylogeny was used. For large domain families that RAxML run didn’t finish in 160 hours, a neighbor joining tree was built with PHYLIP  and the distance trees were broken down to smaller sub-clusters with ≥ 4 genes and ≤ 300 genes, and distance to root ≥ 0.05. Each sub-cluster was regarded as a “sub-family”. Domain sequences of sub-family members were used to identify orthologous groups with the same approach as above (Additional file 2: Figure S1B). To test the robustness of our orthology inference, bootstrap values were acquired on 1,500 randomly chosen domain families using RAxML (−f a -# 400 -m PROTGAMMAJTT -x 12345). Bootstrap values and branch length are plotted on Additional file 4: Figure S3B.
To functionally annotate the C. reinhardtii genome, protein sequences were first aligned to the nr protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) to identify putative homologs with an e-value cut-off of 1e-5. GO annotation was then inferred using blast2go based on genes with GO entries in the nr database . Fisher’s exact test was used to test the enrichment of retained duplicates in GO categories and a false discovery rate  adjusted p-value threshold of 0.05 was used.
Pseudogenes in green algae
Pseudogenes of green algae were predicted using a previously defined pipeline  with some modifications: 1) we used the whole genomes for the BLAST search and filtered out hits overlapping with genes as opposed to using the intergenic sequences only; 2) we made the pseudoexon merging recursive so that multiple pseudogenes can be derived from one pseudoexon cluster provided that the merged pseudogenes do not overlap with each other; 3) RepeatMasker (ver. 3.3.0) was run after the pseudogene pipeline on pseudogene sequences using Viridiplantae repeats (Cutoff = 300, Divergence = 30) and pseudogenes with hits other than “Simple_repeat”, “Low_complexity” and “Satellite” were removed from the dataset; 4) to control for false positives due to proteins being split between contigs, we removed pseudogenes that were within the 95th percentile genomic intron length from the end of a contig if they do not have disabling mutations, defined as a frame shift or premature stop codon. A “high confidence pseudogene” has one or more disabling mutations and has the following conditions satisfied: the distance between the pseudogene and the ends of the contig larger than the distance between the matching region and the end of the protein plus 95th percentile genomic intron length on both ends.
Inferring ancestral stress response state
To identify genes responsive to various stress conditions, C. reinhardtii RNA-seq datasets of N , S , Fe , Cu , and CO2 deficiency , and oxidative stress , were obtained from the Sequence Read Archive at NCBI (http://www.ncbi.nlm.nih.gov/sra). Reads were aligned to C. reinhardtii genome using Tophat , with following options: −i 13 -I 8712 -g 1. For each dataset, differential expression was determined using EdgeR  with a threshold of fold change ≥ 2 and false discovery rate ≤ 5%. Domain family phylogenies and extant stress response states were combined to infer the ancestral stress response states of C. reinhardtii genes using BayesTraits . The ancestral response state was inferred with maximum likelihood under the assumption that the probability of response change is proportional to the branch length on the domain family phylogeny. Three discrete functional states were defined as 1) up-regulation (u, by ≥ 2 fold and FDR ≤ 0.05), 2) down-regulation (d, by ≥ 2 fold and FDR ≤ 0.05), and 3) no-regulation (n). The ancestral states were estimated using multistate, maximum likelihood and most recent common ancestor (MRCA) options for each family phylogeny and each stress condition. Only ancestral states with a posterior probability > 0.5 were used for subsequent analysis. BayesTraits cannot be used in cases that all genes in one tree had same state, and we assumed in that case all ancestral genes had the same state as the extant ones. Ancestral gene response states were compared to extant gene response states to infer innovations and losses in stress response (Additional file 2: Figure S1C).
We would like to thank Andreas P.M. Weber for reading the manuscript and John Johnston for IT support. This project was supported by National Science Foundation grants IOS-1126998 and MCB-0929100 to SHS, Michigan State University College of Natural Science Dissertation Continuation Fellowship to GW, and Deutsche Forschungsgemeinschaft grant IRTG-1525 to AKD. This study does not involve human subjects, human material, human data, vertebrates or any regulated invertebrates.
Stress responsive genes tend to be species-specific in green algae that may contribute to species specific adaptation and variation in stress induced oil accumulation.
- Graham L, Graham J, Wilcox L. Algae. 2nd ed. United States: Cummings; 2009.Google Scholar
- Hu Q, Sommerfeld M, Jarvis E, Ghirardi M, Posewitz M, Seibert M, et al. Microalgal triacylglycerols as feedstocks for biofuel production: perspectives and advances. Plant J Cell Mol Biol. 2008;54:621–39.View ArticleGoogle Scholar
- Wang ZT, Ullrich N, Joo S, Waffenschmidt S, Goodenough U. Algal Lipid Bodies: Stress Induction, Purification, and Biochemical Characterization in Wild-Type and Starchless Chlamydomonas reinhardtii. Eukaryot Cell. 2009;8:1856–68.View ArticlePubMed CentralPubMedGoogle Scholar
- Moellering ER, Benning C. RNA Interference Silencing of a Major Lipid Droplet Protein Affects Lipid Droplet Size in Chlamydomonas reinhardtii. Eukaryot Cell. 2010;9:97–106.View ArticlePubMed CentralPubMedGoogle Scholar
- Siaut M, Cuine S, Cagnon C, Fessler B, Nguyen M, Carrier P, et al. Oil accumulation in the model green alga Chlamydomonas reinhardtii: characterization, variability between common laboratory strains and relationship with starch reserves. BMC Biotechnol. 2011;11:7.View ArticlePubMed CentralPubMedGoogle Scholar
- Cakmak T, Angun P, Demiray YE, Ozkan AD, Elibol Z, Tekinay T. Differential effects of nitrogen and sulfur deprivation on growth and biodiesel feedstock production of Chlamydomonas reinhardtii. Biotechnol Bioeng. 2012;109:1947–57.View ArticlePubMedGoogle Scholar
- Adams C, Godfrey V, Wahlen B, Seefeldt L, Bugbee B. Understanding precision nitrogen stress to optimize the growth and lipid content tradeoff in oleaginous green microalgae. Bioresour Technol. 2013;131:188–94.View ArticlePubMedGoogle Scholar
- Ohno S. Evolution by Gene Duplication. Germany: Springer-Verlag; 1970.View ArticleGoogle Scholar
- Force A, Lynch M, Pickett FB, Amores A, Yan Y, Postlethwait J. Preservation of Duplicate Genes by Complementary, Degenerative Mutations. Genetics. 1999;151:1531–45.PubMed CentralPubMedGoogle Scholar
- Lynch M, O’Hely M, Walsh B, Force A. The Probability of Preservation of a Newly Arisen Gene Duplicate. Genetics. 2001;159:1789–804.PubMed CentralPubMedGoogle Scholar
- Lespinet O, Wolf YI, Koonin EV, Aravind L. The Role of Lineage-Specific Gene Family Expansion in the Evolution of Eukaryotes. Genome Res. 2002;12:1048–59.View ArticlePubMed CentralPubMedGoogle Scholar
- Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu S-H. Importance of Lineage-Specific Expansion of Plant Tandem Duplicates in the Adaptive Response to Environmental Stimuli. Plant Physiol. 2008;148:993–1003.View ArticlePubMed CentralPubMedGoogle Scholar
- Moore RC, Purugganan MD. The early stages of duplicate gene evolution. Proc Natl Acad Sci. 2003;100:15682–7.View ArticlePubMed CentralPubMedGoogle Scholar
- Moore RC, Purugganan MD. The evolutionary dynamics of plant duplicate genes. Curr Opin Plant Biol. 2005;8:122–8.View ArticlePubMedGoogle Scholar
- Vogel C, Chothia C. Protein Family Expansions and Biological Complexity. PLoS Comput Biol. 2006;2:e48.View ArticlePubMed CentralPubMedGoogle Scholar
- Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res. 2001;11:555–65.View ArticlePubMed CentralPubMedGoogle Scholar
- Wapinski I, Pfeffer A, Friedman N, Regev A. Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007;449:54–61.View ArticlePubMedGoogle Scholar
- Zou C, Lehti-Shiu MD, Thibaud-Nissen F, Prakash T, Buell CR, Shiu S-H. Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice. Plant Physiol. 2009;151:3–15.View ArticlePubMed CentralPubMedGoogle Scholar
- Zou C, Lehti-Shiu MD, Thomashow M, Shiu S-H. Evolution of Stress-Regulated Gene Expression in Duplicate Genes of Arabidopsis thaliana. PLoS Genet. 2009;5:e1000581.View ArticlePubMed CentralPubMedGoogle Scholar
- Blanc G, Agarkova I, Grimwood J, Kuo A, Brueggeman A, Dunigan DD, et al. The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation. Genome Biol. 2012;13:R39.View ArticlePubMed CentralPubMedGoogle Scholar
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38 suppl 1:D211–22.View ArticlePubMed CentralPubMedGoogle Scholar
- Li W-H. Molecular Evolution. Sunderland: Mass, Sinauer Associates; 1997.Google Scholar
- Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–96.View ArticlePubMed CentralPubMedGoogle Scholar
- Moreau H, Verhelst B, Couloux A, Derelle E, Rombauts S, Grimsley N, et al. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 2012;13:R74.View ArticlePubMed CentralPubMedGoogle Scholar
- Cock JM, Sterck L, Rouzé P, Scornet D, Allen AE, Amoutzias G, et al. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature. 2010;465:617–21.View ArticlePubMedGoogle Scholar
- Derelle E, Ferraz C, Rombauts S, Rouzé P, Worden AZ, Robbens S, et al. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci. 2006;103:11647–52.View ArticlePubMed CentralPubMedGoogle Scholar
- Palenik B, Grimwood J, Aerts A, Rouzé P, Salamov A, Putnam N, et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci. 2007;104:7705–10.View ArticlePubMed CentralPubMedGoogle Scholar
- Worden AZ, Lee J-H, Mock T, Rouzé P, Simmons MP, Aerts AL, et al. Green Evolution and Dynamic Adaptations Revealed by Genomes of the Marine Picoeukaryotes Micromonas. Science. 2009;324:268–72.View ArticlePubMedGoogle Scholar
- Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman GB, et al. The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions. Science. 2007;318:245–50.View ArticlePubMed CentralPubMedGoogle Scholar
- McDonald SM, Plant JN, Worden AZ. The Mixed Lineage Nature of Nitrogen Transport and Assimilation in Marine Eukaryotic Phytoplankton: A Case Study of Micromonas. Mol Biol Evol. 2010;27:2268–83.View ArticlePubMed CentralPubMedGoogle Scholar
- Hou S-Y, Govorunova EG, Ntefidou M, Lane CE, Spudich EN, Sineshchekov OA, et al. Diversity of Chlamydomonas Channelrhodopsins. Photochem Photobiol. 2012;88:119–28.View ArticlePubMed CentralPubMedGoogle Scholar
- Hasegawa E, Hayashi H, Asakura S, Kamiya R. Stimulation of in vitro motility of Chlamydomonas axonemes by inhibition of cAMP-dependent phosphorylation. Cell Motil Cytoskeleton. 1987;8:302–11.View ArticlePubMedGoogle Scholar
- Gaillard AR, Fox LA, Rhea JM, Craige B, Sale WS. Disruption of the A-Kinase Anchoring Domain in Flagellar Radial Spoke Protein 3 Results in Unregulated Axonemal cAMP-dependent Protein Kinase Activity and Abnormal Flagellar Motility. Mol Biol Cell. 2006;17:2626–35.View ArticlePubMed CentralPubMedGoogle Scholar
- Ward JM, Mäser P, Schroeder JI. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses. Annu Rev Physiol. 2009;71:59–82.View ArticlePubMedGoogle Scholar
- Turner BM. The adjustable nucleosome: an epigenetic signaling module. Trends Genet. 2012;28:436–44.View ArticlePubMedGoogle Scholar
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6.View ArticlePubMedGoogle Scholar
- Miller R, Wu G, Deshpande RR, Vieler A, Gärtner K, Li X, et al. Changes in Transcript Abundance in Chlamydomonas reinhardtii following Nitrogen Deprivation Predict Diversion of Metabolism. Plant Physiol. 2010;154:1737–52.View ArticlePubMed CentralPubMedGoogle Scholar
- González-Ballester D, Casero D, Cokus S, Pellegrini M, Merchant SS, Grossman AR. RNA-Seq Analysis of Sulfur-Deprived Chlamydomonas Cells Reveals Aspects of Acclimation Critical for Cell Survival. Plant Cell Online. 2010;22:2058–84.View ArticleGoogle Scholar
- Page MD, Allen MD, Kropat J, Urzica EI, Karpowicz SJ, Hsieh SI, et al. Fe Sparing and Fe Recycling Contribute to Increased Superoxide Dismutase Capacity in Iron-Starved Chlamydomonas reinhardtii. Plant Cell Online. 2012;24:2649–65.View ArticleGoogle Scholar
- Castruita M, Casero D, Karpowicz SJ, Kropat J, Vieler A, Hsieh SI, et al. Systems Biology Approach in Chlamydomonas Reveals Connections between Copper Nutrition and Multiple Metabolic Steps. Plant Cell Online. 2011;23:1273–92.View ArticleGoogle Scholar
- Fang W, Si Y, Douglass S, Casero D, Merchant SS, Pellegrini M, et al. Transcriptome-Wide Changes in Chlamydomonas reinhardtii Gene Expression Regulated by Carbon Dioxide and the CO2-Concentrating Mechanism Regulator CIA5/CCM1. Plant Cell Online. 2012;24:1876–93.View ArticleGoogle Scholar
- Urzica EI, Adler LN, Page MD, Linster CL, Arbing MA, Casero D, et al. Impact of Oxidative Stress on Ascorbate Biosynthesis in Chlamydomonas via Regulation of the VTC2 Gene Encoding a GDP-l-galactose Phosphorylase. J Biol Chem. 2012;287:14234–45.View ArticlePubMed CentralPubMedGoogle Scholar
- Vieler A, Wu G, Tsai C-H, Bullard B, Cornish AJ, Harvey C, et al. Genome, Functional Gene Annotation, and Nuclear Transformation of the Heterokont Oleaginous Alga Nannochloropsis oceanica CCMP1779. PLoS Genet. 2012;8:e1003064.View ArticlePubMed CentralPubMedGoogle Scholar
- Blanc G, Duncan G, Agarkova I, Borodovsky M, Gurnon J, Kuo A, et al. The Chlorella variabilis NC64A Genome Reveals Adaptation to Photosymbiosis, Coevolution with Viruses, and Cryptic Sex. Plant Cell Online. 2010;22:2943–55.View ArticleGoogle Scholar
- Prochnik SE, Umen J, Nedelcu AM, Hallmann A, Miller SM, Nishii I, et al. Genomic Analysis of Organismal Complexity in the Multicellular Green Alga Volvox carteri. Science. 2010;329:223–6.View ArticlePubMed CentralPubMedGoogle Scholar
- Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313:1596–604.View ArticlePubMedGoogle Scholar
- Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–83.View ArticlePubMedGoogle Scholar
- Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815.View ArticleGoogle Scholar
- Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, Casagrande A, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–7.View ArticlePubMedGoogle Scholar
- Hellsten U, Wright KM, Jenkins J, Shu S, Yuan Y, Wessler SR, et al. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc Natl Acad Sci. 2013;110:19478–82.View ArticlePubMed CentralPubMedGoogle Scholar
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science. 2009;326:1112–5.View ArticlePubMedGoogle Scholar
- Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, et al. A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica). Science. 2002;296:92–100.View ArticlePubMedGoogle Scholar
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319:64–9.View ArticlePubMedGoogle Scholar
- Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39 suppl 2:W29–37.View ArticlePubMed CentralPubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.View ArticlePubMedGoogle Scholar
- Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–95.View ArticlePubMed CentralPubMedGoogle Scholar
- Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.View ArticlePubMed CentralPubMedGoogle Scholar
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90.View ArticlePubMedGoogle Scholar
- Chen K, Durand D, Farach-Colton M. NOTUNG: A Program for Dating Gene Duplications and Optimizing Gene Family Trees. J Comput Biol. 2000;7:429–47.View ArticlePubMedGoogle Scholar
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle. 2005Google Scholar
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc. 1995;57:289–300.Google Scholar
- Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–11.View ArticlePubMed CentralPubMedGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.View ArticlePubMed CentralPubMedGoogle Scholar
- Pagel M. The Maximum Likelihood Approach to Reconstructing Ancestral Character States of Discrete Characters on Phylogenies. Syst Biol. 1999;48:612–22.View ArticleGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.