Diel rewiring and positive selection of ancient plant proteins enabled evolution of CAM photosynthesis in Agave
BMC Genomics volume 19, Article number: 588 (2018)
Crassulacean acid metabolism (CAM) enhances plant water-use efficiency through an inverse day/night pattern of stomatal closure/opening that facilitates nocturnal CO2 uptake. CAM has evolved independently in over 35 plant lineages, accounting for ~ 6% of all higher plants. Agave species are highly heat- and drought-tolerant, and have been domesticated as model CAM crops for beverage, fiber, and biofuel production in semi-arid and arid regions. However, the genomic basis of evolutionary innovation of CAM in genus Agave is largely unknown.
Using an approach that integrated genomics, gene co-expression networks, comparative genomics and protein structure analyses, we investigated the molecular evolution of CAM as exemplified in Agave. Comparative genomics analyses among C3, C4 and CAM species revealed that core metabolic components required for CAM have ancient genomic origins traceable to non-vascular plants while regulatory proteins required for diel re-programming of metabolism have a more recent origin shared among C3, C4 and CAM species. We showed that accelerated evolution of key functional domains in proteins responsible for primary metabolism and signaling, together with a diel re-programming of the transcription of genes involved in carbon fixation, carbohydrate processing, redox homeostasis, and circadian control is required for the evolution of CAM in Agave. Furthermore, we highlighted the potential candidates contributing to the adaptation of CAM functional modules.
This work provides evidence of adaptive evolution of CAM related pathways. We showed that the core metabolic components required for CAM are shared by non-vascular plants, but regulatory proteins involved in re-reprogramming of carbon fixation and metabolite transportation appeared more recently. We propose that the accelerated evolution of key proteins together with a diel re-programming of gene expression were required for CAM evolution from C3 ancestors in Agave.
Among the three modes of photosynthesis in higher plants, the C3 pathway is the most ancient and common, occurring in approximately 90% of higher plant species. C4 and CAM photosynthesis, which account for approximately 3 and 6% of higher plant species, respectively, are evolutionarily derived from C3 photosynthesis and are believed to have arisen in response to selective pressures imposed by global reductions in atmospheric CO2 concentration (C4) and water limitation (CAM) [1, 2]. Both C4 and CAM plants capture CO2 via an initial carboxylation reaction catalyzed outside the chloroplast by phosphoenolpyruvate carboxylase (PEPC), which then subsequently delivers the captured CO2 at increased concentration to ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) in the chloroplast. While C4 operates via a spatial separation of carboxylases in different cell types, CAM operates via a temporal day/night separation of RuBisCO and PEPC with net CO2 uptake shifted predominantly to the night. Nocturnal CO2 uptake is accompanied by an inverse (compared to C3 and C4) day/night pattern of stomatal closure/opening in CAM that results in improved water-use efficiency (i.e., CO2 fixed per molecule of H2O lost) that is six-fold higher than C3 plants and 3-fold higher than C4 plants under comparable conditions .
Circadian regulation of gene expression has been implicated as a core component in the diel re-programming of metabolism that distinguishes CAM from C3 and C4 photosynthesis [4, 5]. For instance, the nocturnal activation of PEPC via phosphorylation in CAM plants is catalysed by a dedicated PEPC kinase (PPCK), the transcript abundance of which is regulated by the circadian clock [5, 6]. A number of clock genes which were examined in the facultative CAM species Mesembryanthemum crystallinum indicated both conserved and divergent functions of genes encoding components of the core oscillator and the clock output pathways in this species . In Opuntia ficus-indica (a constitutive CAM species), a transcriptomics study identified several genes implicated in the CAM biochemical pathway and in the circadian clock that displayed a unique 12-h periodicity  different from C3 species like Arabidopsis. Other work has suggested that the rhythmic expression of genes required for CAM is a consequence of diel changes in CAM-defining metabolites, such as malic acid . Together, these studies suggest that modifications to the circadian clock, including both input and output pathways, might be critical to CAM evolution. Still, how the circadian clock was integrated within signalling and core biochemical components of CAM during the evolution of this pathway remains unestablished. Genome-wide gene expression profiling in model species has greatly facilitated our understanding of gene regulatory networks that are relevant to circadian regulation . In this work, we develop a genome-wide approach to investigate the evolution of the core metabolic and regulatory elements of CAM via cross-species comparisons.
CAM and C4 photosynthesis are thought to have evolved from C3 ancestors multiple times in response to limitations in CO2 and water. It has been proposed that a propensity for frequent mutation or selection of certain regulatory genes during the early stages of evolution underpinned adaptations to environmental stress conditions . Several studies in C4 have uncovered positive selection of key components of photosynthesis [10, 11], despite the conserved nature of many of these genes. For example, enzymatic kinetics and structure modelling of positively-selected residues of RuBisCO subunits in Flaveria indicated that these key amino acid substitutions were relevant to the functional diversification of C4 . Although little has been characterized in CAM plants, the systematic analysis of positive selection in genes implicated in the circadian clock, photosynthesis, and CAM biochemistry could provide insight to unlock the mechanism(s) underlying CAM evolution.
Agave species are constitutive CAM species and many are important economic crops for beverage and fiber production [12, 13]. The water conserving properties of CAM have also highlighted the value of Agave as potential dedicated bioenergy feedstocks in semi-arid regions [14, 15]. Hence, a fundamental understanding of regulatory pathways underlying CAM in Agave is critical for efforts directed at engineering this pathway into C3 crops to improve water-use efficiency [13, 16]. All of the enzymes required for C4 and CAM appear to be homologs of ancestral forms found in C3 species , yet the specific genomic origins and genetic regulation of diel reprogramming of metabolism that distinguishes CAM from C3 and C4 photosynthesis are largely unknown . In this study, we investigated the molecular evolution of CAM in Agave using an approach that integrated gene co-expression networks, comparative genomics, and protein structure analysis. Our findings demonstrate that 1) core metabolic components required for CAM have ancient genomic origins traceable to non-vascular plants, 2) regulatory proteins required for diel re-programming of metabolism have a more recent origin shared among C3, C4, and CAM species and 3) accelerated evolution of key proteins together with a diel re-programming of gene expression were required for the evolution of CAM in Agave.
CAM physiology and identification of co-expression modules relevant to CAM
Using the quantitative gene expression data obtained from the RNA-Seq analysis of 15 tissues, including mature leaf (sampled at 8 time-points over a diel cycle), young leaf (3 time-points), root, meristem, rhizome, and stem in A. americana cultivar ‘Marginata’ , we created a gene co-expression network that was partitioned into 16 co-expression modules, with each displaying distinctive diel patterns (Fig. 1). Gene ontology (GO) enrichment analysis identified biological processes over-represented (p < 0.05) in each of these co-expression modules (Additional file 1: Figure S1). We characterized the GO enrichment results together with expression profiles to identify potential modules for CAM. For example, module ‘M11’, containing 1509 transcripts, was significantly (p < 0.05) associated with CAM-defining nocturnal net CO2 uptake (9 pm – 6 am) (Fig. 1a), and was over-represented by biological processes relevant to stomatal movement, carboxylation, and signal transduction (Table 1). Module M11 contains some key genes involved in CAM metabolism, including PHOSPHOENOLPYRUVATE CARBOXYLASE KINASE 1 (PPCK1), which regulates the temporal activation of nocturnal CO2 uptake. Since genes within module M11 are relevant to nocturnal carboxylation and stomatal movement, we propose that this module provides a molecular signature for temporal reprogramming of metabolism underpinning CAM.
Orthologous gene groups among CAM, C3 and C4 species
To understand the evolutionary origins and possible shared trajectories of module M11 and other CAM components between different photosynthetic lineages, we performed comparative genomics analysis of 15 plant species, including CAM, C3, C4, and non-vascular plant (NVP) species (Fig. 2a). Specifically, we identified ortholog clades through OrthoMCL analysis, such as clade NVP:C3:CAM:C4, which contains ortholog groups shared by NVP, C3, CAM, and C4 species, and clade C3:CAM:C4, which contains ortholog groups shared only by C3, CAM, and C4 species (Fig. 2b). The genes in the Agave (CAM) species were distributed mainly in three ortholog clades: NVP:C3:CAM:C4, C3:CAM:C4, and CAM-only, whereas the genes found in C4 species were distributed mainly in four ortholog clades: NVP:C3:CAM:C4, C3:CAM:C4, C3:C4, and C4-only (Additional file 2: Table S1), indicating that C4 evolution has one additional major genomic event, as represented by clade C3:C4, relative to CAM evolution.
Gene ontology enrichment analyses revealed that A. americana genes in clade NVP:C3:CAM:C4 were over-represented by biological processes relevant to primary metabolic processes (Additional file 3: Table S2), while those in clade C3:CAM:C4 were over-represented by regulatory processes (Additional file 4: Table S3). Notably, the core enzymes in C4 and CAM pathways belong to clade NVP:C3:CAM:C4, whereas the majority of the regulatory proteins belong to clade C3:CAM:C4 (Additional file 5: Table S4). Furthermore, genes in the M11 module were over-represented in the ortholog clade C3:CAM:C4 (Additional file 6: Table S5). Transcription factors were also over-represented (p < 0.05) in clade C3:CAM:C4, but under-represented (p < 0.05) in both NVP:C3:CAM:C4, and CAM-only clades (Fig. 2c). These results indicate that CAM evolution in Agave required genes that are shared across C3, C4 and CAM lineages to act as regulatory agents, whereas the core metabolic CAM machinery predates the C3-CAM-C4 divergence and is shared by NVP, C3, CAM, and C4 lineages.
Since the three CAM species (A. americana, A. deserti and A. tequilana) in this research are closely-related, the CAM-only ortholog groups may contains two types of CAM-specific genes: 1) specific to the Agave lineage and 2) conserved CAM-specific genes shared between Agave and other CAM lineages. To identify the conserved CAM-specific genes shared between Agave and other CAM lineages, the Agave genes in the CAM-only ortholog groups were compared with the protein tribes from the same 15 plant species as those used for ortholog group analysis, which were constructed by using TribeMCL . In general, the protein tribes are equivalent to gene families, with each tribe containing multiple ortholog groups. The A. americana genes in both the CAM-only ortholog groups and CAM-only tribes were then compared with an extended list of CAM and non-CAM species using BLASTp, resulting in the identification of 13 A. americana genes that have homologs in other three independent CAM lineages (i.e., Kalanchoë fedtschenkoi, Ananas comosus, and Phalaenopsis equestris) but not in 21 non-CAM species (Additional file 7: Table S6). Some of these CAM-specific genes displayed variable diel expression patterns (Additional file 8: Figure S2).
Positive selection in CAM evolution
To study protein sequences evolution, we analyzed the non-synonymous to synonymous substitution ratio (Ka/Ks) of orthologous gene pairs between A. americana and three non-CAM species, including two C3 species (Arabidopsis thaliana and Oryza sativa) and one C4 species (Zea mays). A Ka/Ks ratio greater than 1 indicates positive selection or an acceleration of protein evolution [20, 21]. Ka/Ks analysis identified a set of 160 Agave genes that had protein sequence regions with Ka/Ks ratio greater than one, as compared with orthologous genes in the three non-CAM species (Additional file 9: Table S7), indicating that these genes experienced accelerated amino acid substitutions during the divergence between CAM and non-CAM species. The functionally annotated genes in this set included genes involved in circadian clock, starch and sugar metabolism, and decarboxylation (Additional file 10: Table S8). Importantly, we identified 94 Agave genes that had protein sequence regions with Ka/Ks ratio greater than 1, as compared with orthologous genes in the two C3 species, but not the orthologs in the C4 plant (Additional file 11: Table S9). The functionally annotated genes in this 94-gene set include PPCK1 (Fig. 3a, c, and e), a core regulator of nocturnal carboxylation, and PsI-D2 (Additional file 11: Table S9), which encodes a component of photosystem I essential for photosynthesis  and a chloroplast beta-amylase (CT-BMY).
To further consolidate the results of positive selection analysis based on Ka/Ks ratio, codon-based site analysis was performed to identify specific amino acid sites under positive selection. Out of the 94 genes that were revealed to experience positive selection by Ka/Ks ratio analysis (Additional file 11: Table S9), 64 genes were shown to carry at least one positively selected site, with posterior probabilities > 80% (Additional file 12: Table S10). We found 2 sites in PPCK1 (K42, N73) and 1 site in CT-BMY (E24) were supported by positive selection analysis based on both random effects likelihood (REL) and Fast, Unconstrained Bayesian AppRoximation (FUBAR) models (Table 2). Our 3-D protein structural modeling revealed that the positive-selection regions (sites) occur in important functional domains. The regions with Ka/Ks ratio > 1 of PPCK1 were located in the N-terminal domains responsible for ATP-binding (Fig. 3b and d), and K42 and N73 were likely involved in the activation of kinase (Fig. 3g). The positively selected regions of CT-BMY are responsible for substrate binding (Additional file 13: Figure S3). The clade C3:CAM:C4 was over-represented (p < 0.0001) by Agave genes with Ka/Ks ratio > 1 in the Agave-Arabidopsis gene pair comparison (Fig. 2d). Together, these results indicate that accelerated amino acid substitution has played a key role in the modification of proteins required for the light and carbon processing reactions of photosynthesis, as well as regulatory and signaling pathways in Agave.
Diel re-programming of gene expression between CAM and C3
To further examine the molecular basis of the diel re-programming of metabolism that underpins CAM, we performed a comparative analysis of time-course expression data between A. americana (CAM) and Arabidopsis thaliana (C3). We identified two clusters of Agave genes that exhibited shifts in day/night patterns of abundance relative to the corresponding orthologous genes in Arabidopsis. One cluster, containing 22 genes, showed a morning-to-night shift with alternative peak expression at night and morning between Agave and Arabidopsis, respectively (Fig. 4a). This gene set was over-represented (p < 0.05) by co-expression modules M01 and M11 (Table 3), with M11as the aforementioned molecular marker for CAM-associated nocturnal gene expression. Among the 22 genes showing the morning-to-night shift, 8 encode proteins with unknown function; the 14 annotated genes have functions related to circadian clock, photosynthetic electron transport, malate transport, stomatal movement, and redox homeostasis (Additional file 14: Table S11). The other cluster, containing 20 genes, showed an afternoon-to-night shift, with alternate peak expression during late night and afternoon in Agave and Arabidopsis, respectively (Fig. 4b). This gene set was over-represented by co-expression modules M06 (p < 0.05) and M13 (p < 0.01) (Table 3), both of which showed positive association with gene expression in mature leaves around midnight (Fig. 1b), suggesting that the genes in these two modules are involved in CAM related processes during the night. Among the 20 genes showing the afternoon-to-night shift, 7 encode proteins with unknown function; 13 have known functions related to signaling, sugar metabolism and light processing (Additional file 15: Table S12). In concern of multiple copies of Agave and Arabidopsis genes in an ortholog group, we assessed each ortholog group containing abovementioned genes in Fig. 4, and found 23 ortholog groups containing two genes, with a one-to-one relationship between Agave and Arabidopsis. We evaluated the phylogeny and diel expression pattern for ortholog groups with a total of more than two Agave and Arabidopsis genes, and found that the expression patterns of Agave genes in comparison with their orthologs in Arabidopsis displayed differential day-night patterns (Additional file 16: Figure S4 and Additional file 17: Figure S5), suggesting a functional diversification of gene family members between C3 and CAM plants.
Comparison of circadian clock pathway between CAM and C3
We compared the diel expression pattern of genes implicated in signal input to the clock (e.g., PHOT2, Phototropin-2), clock oscillation (e.g., CCA1, Circadian Clock Associated 1; TOC1, TIMING OF CAB EXPRESSION1), and regulatory output from the clock (e.g., RVE1, REVEILLE 1) between Agave (CAM) and Arabidopsis (C3). All known genes implicated in input to the clock and the central clock oscillator showed similar diel expression patterns between Agave and Arabidopsis. However, RVE1 exhibited patterns of peak transcript abundance that were substantially out of phase between CAM and C3, with the peak expression of RVE1 occurring at midnight in Agave and morning in Arabidopsis (Fig. 5 and Fig. 6a; Additional file 18: Table S13). These comparative analyses support the concept that the multiple independent evolutionary origins of CAM exploited an existing C3 multi-gene loop oscillator similar to that in Arabidopsis thaliana, and diel re-programming of metabolism was achieved via changes to genes like RVE1 that link metabolic output to the clock.
Regulatory genes are critical for C3-to-CAM evolution
The distribution of CAM plants in diverse phylogenetic lineages indicates CAM has evolved from C3 via convergent evolution . This wide-spread convergent evolution could lead to a hypothesis that C3-to-CAM evolution is relatively easy, not requiring whole-genome-scale changes. In support of this hypothesis, the results in this study indicate that CAM evolution required genes that are shared across C3, C4, and CAM lineages to act as regulatory agents, whereas the core metabolic CAM machinery predates the CAM-C4 divergence and is shared by NVP, C3, CAM, and C4 lineages. The data presented here have shown that genes encoding enzymes required for C3 and C4 carboxylation, decarboxylation, and carbohydrate processing, as well as membrane transporters required for intercellular trafficking of metabolites required for CAM, were present across all the plant lineages examined. Thus, comparative analysis of protein sequences revealed that the core metabolic CAM machinery predates the CAM-C4 divergence and is shared by non-vascular plants (NVP), C3, CAM, and C4 lineages. Such data is also consistent with recent reports that PEPC had shared origins in C4 and CAM lineages before the divergence of these two pathways from a C3 progenitor .
As a means of focusing in on the evolution of the regulatory components required for the diel re-programming of metabolism that defines CAM, gene co-expression network analysis was used to reveal modules with distinctive diel patterns of abundance. One gene module in particular, designated M11, was identified as providing a molecular signature for the temporal re-programming of metabolism underpinning CAM. Gene module M11, showed abundant expression in mature leaves at night and contained PPCK1, which regulates the temporal activation of nocturnal CO2 uptake by PEPC [25, 26]. Module M11 was also over-represented by biological processes relevant to stomatal movement and signal transduction. In particular, gene ontology terms related to the ABA signaling pathway were significantly enriched in M11 (Table 1). Such genes are commonly known for their key roles in stomatal regulation and responses to stress . It is also noteworthy that sucrose and jasmonic acid signaling pathways were also enriched in M11, suggesting the involvement of hormones and/or metabolites in the regulation of nocturnal CO2 uptake in CAM. The genes in M11, along with multiple transcription factors, were over-represented in the ortholog clades shared across C3, C4 and the Agave/CAM lineages. Therefore, it can be hypothesized that a limited number of key regulators can drive the C3-to-CAM transition. To test this hypothesis, future studies should focus on detailed functional characterization of the transcription factors in module M11, which belong to the ortholog clade shared across C3, C4, and CAM lineages. Such approaches will be critical for accelerating efforts designed to engineer CAM into C3 crops [13, 16].
Regulation of stomatal movement is critical for CAM evolution
The altered night/day opening/closing of stomata is a core feature of CAM which requires coordination between mesophyll and guard cells . Genes related to stomatal movement were found to be enriched in module M11, alongside genes relevant for nocturnal carboxylation (Table 1). Guard cell inward-rectifying K+ channel AKT2 plays an important role in light-induced stomatal opening in Arabidopsis . Our analysis revealed that the peak expression of AKT2 was shifted to the night in Agave (CAM), as compared with day-time peak expression in Arabidopsis (C3) (Fig. 4a; Fig. 6b), suggesting that AKT2 is involved in nocturnal stomatal opening in Agave. In terms of determining the signals that might be responsible for shifting the timing of AKT2 expression in CAM, it has been suggested that photosynthetic metabolism in the mesophyll cells could contribute to the regulation of guard cell function . Interactions between mesophyll photosynthesis and guard cell regulation have been revealed in many non-CAM species . In particular, signals driven by sugar and malate content in the mesophyll appear to have central roles in controlling stomatal aperture . In C3 plants, malate metabolism in the mesophyll and malate transport from mesophyll to guard cells has been shown to play a central role in regulating stomatal responses over the day/night cycle . By analogy, it can be hypothesized that in CAM plants, the diel turnover and transport of malate across the vacuolar tonoplast membrane will play a critical role in stomatal regulation . In Arabidopsis, tonoplast dicarboxylate transporter (TDT) imports malate into the vacuoles . In Agave, the peak expression of TDT was shifted to the night, as compared to peak expression during the day in Arabidopsis (Fig. 4a; Fig. 6c), implying that the Agave TDT is responsible for the transport of malate into the vacuole during the dark period. Future molecular genetic experiments are needed to confirm the function of AKT2 and TDT, and consequently shed new light on identifying signals which integrate carboxylation processes with stomatal movement in CAM.
Diversification of circadian clock genes in CAM
Our results suggest that circadian shifts in gene transcription underpin the CAM-defining diel patterns of stomatal conductance, malate transport, carbohydrate processing and supply, and demand for ATP and reducing power in Agave. The circadian clock has been proposed to control the extensive re-synchronization of metabolism that distinguishes CAM from C3 and C4 photosynthesis . Circadian rhythms are ubiquitous in eukaryotes and many features of the circadian clock are conserved across plant lineages . Genes in the circadian system have been well-studied in Arabidopsis and can be divided into three functional groups: signal input (e.g., PHOT2), clock oscillation (e.g., CCA1, TOC1), and regulatory output (e.g., RVE1) . We identified homologs in Agave (CAM plant) for all the known clock genes in Arabidopsis (C3) and found that the diel expression patterns of all the clock genes are conserved between CAM and C3, except for RVE1 (a Myb-like transcription factor) in the output subset (Fig. 5), suggesting that C3 and CAM plants share the same core circadian oscillator, with diversification occurring in the regulatory output from the core clock, such as RVE1. It was previously reported that RVE1 integrates the circadian clock and auxin pathways to coordinate plant growth with changes in environmental time cues in Arabidopsis . Here we hypothesize that RVE1 is a key a node from which the C3 and CAM clock diverge and rewiring the diel expression of RVE1 is one of the necessary steps to switch from C3 to CAM.
Besides expression analysis, changes in the protein sequences of circadian clock components may also contribute to the diversification of clock functions. We performed 3D structural modeling to understand the function of protein domains containing sites of positive selection (reflected by Ka/Ks ratio > 1). Our structure-modeling revealed characteristics of positive-selection in the intrinsically disordered protein regions in some circadian clock proteins (Additional file 19: Figure S6). It has become clear that a certain protein may not have a well-defined and compactly folded three-dimensional (3D) structure under physiological conditions and such proteins are often termed natively unfolded protein  or an intrinsically disordered protein (IDP) . Many intrinsically disordered protein regions (IDPRs) have been known to carry out important biological functions . The clock proteins Late Elongated Hypocotyl (LHY), Early Flowering 3 (ELF3), and Lux Arrhythmo (LUX) play key roles in the plant circadian oscillation . The Ka/Ks profile of each of these circadian rhythm proteins showed multiple positive selection regions with significant fluctuations in Agave (data not shown). Interestingly, these proteins were found to exhibit a high ratio of IDPRs with high PONDR scores (Additional file 20: Table S14). The IDPRs of the clock proteins may lead to high flexibility in both structure and function, which in turn, could favor novel interactions with nucleic acids and/or other proteins. Further molecular genetics studies are needed to gain deeper understanding of the functions encoded in the apparent disordered state of these clock proteins and their role in the evolution of CAM.
CAM-specific genes and evolution
Multiple lineages of CAM photosynthesis plants have evolved independently from C3 photosynthesis ancestors [13, 40]. Recently, a comparative study using four genera in subfamily Agavoideae with CAM, weak CAM and C3, has suggested that gene family analysis together with expression profiling is informative in understanding the divergence of CAM . However, a broader sampling of diverse CAM lineages is still necessary. It can be hypothesized that there are two types of CAM-specific genes: 1) lineage-specific CAM genes shared by multiple closely related-species (e.g., Agave spp.) and 2) conserved CAM-specific genes shared by multiple independent lineages of CAM plants. To test this hypothesis, it would be useful in the future to expand this work by including several independently evolved CAM lineages in order to separate evolutionary phenomena unique to the CAM Agave from ones shared by different CAM lineages.
Evidence is presented that the genetic components of core CAM machinery in Agave have an ancient origin traceable to non-vascular plant lineages, and that regulatory proteins, which are shared between C3, CAM, and C4 species, were essential to the C3-to-CAM transition. The evolution of CAM in Agave from C3 photosynthesis also required positive selection in protein sequences of enzymes and transporters implicated in metabolism and signaling associated with CAM, as well as diel re-programming of gene expression related to key biological processes, such as circadian rhythms, redox homeostasis, and carbohydrate metabolism (Fig. 6). These results provide a set of new candidate genes for engineering increased water-use efficiency in crop plants experiencing water-limiting conditions via synthetic biology approaches.
Representative protein model per locus in Agave species
The transcript sequences of A. americana  (Additional file 21) were first filtered by CD-HIT-EST [42, 43] with a sequence identity threshold of 0.98 and the alignment coverage for the shorter sequence set as 0.5. This was an optimal setting based on the test with various combinations of sequence identity (0.90, 0.91, …, 1.0), and the alignment coverage for the shorter sequence (0.4, 0.5, …, 0.8), using the Arabidopsis genome annotation (TAIR10). To assign the representative protein model per locus with high-confidence, the protein sequences corresponding to the representative transcript sequences in A. americana, obtained from CD-HIT-EST clustering, as well as the non-redundant representative protein models of A. deserti and A. tequilana , were mapped onto the aforementioned draft genome assembly of A. tequilana using BLAT  with a minimum coverage (i.e., minimum fraction of query that must be aligned) of 60% and a minimum identity of 90%. Only the “best match” position was selected as the genomic location for each query protein sequence. If multiple proteins mapped to the overlapping genome locations and they shared significant sequence similarity, as determined by BLASTp [46, 47] with E-value cutoff of 1e-5, the longest protein sequence was selected as the representative protein model for the gene locus, resulting in 55,451, 31,761, and 31,799 representative protein sequences in A. americana, A. deserti and A. tequilana, respectively (Additional files 22, 23 and 24).
Comparative analysis of protein sequences
The protein sequences of 15 plant species, including the aforementioned representative protein sequences in three CAM species of Agave (Agave americana, A. deserti and A. tequilana), and 12 non-CAM plant species downloaded from public databases, which included three non-vascular plant species Chlamydomonas reinhardtii (www.Phytozome.net; Phytozome v9.0), Physcomitrella patens (Phytozome v9.0), Selaginella moellendorffii (Phytozome v9.0); three C4 plant species Sorghum bicolor (Phytozome v9.0), Setaria italica (Phytozome v9.0) and Zea mays (Phytozome v9.0); three C3 monocot plant species Brachypodium distachyon (Phytozome v9.0e), Oryza sativa (Phytozome v9.0), and Musa acuminata (version 1; http://banana-genome.cirad.fr); three C3 dicot species Arabidopsis thaliana (v10; www.Arabidopsis.org), Populus trichocarpa (Phytozome v9.0), and Solanum tuberosum (DM_v3.4; potatogenomics.plantbiology.msu.edu). The longest protein sequence was selected in case of multiple transcripts annotated for one gene locus. The ortholog groups (OGs) were constructed using OrthoMCL  with default parameters (a BLASTp E-value cutoff of 1e-5 and percent match cutoff of 50%). Also, the protein sequences used for ortholog analysis were clustered into tribes using TRIBE-MCL , with a BLASTp E-value cutoff of 1e-5 and an inflation value of 1.5. To identify conserved CAM-specific genes, the A. americana genes in both the CAM-only ortholog groups and CAM-only tribes were then compared with an extended list of three independent CAM lineages and 21 non-CAM species using BLASTp [46, 47] with an E-value cutoff of 1e-5. The three independent CAM lineages are Kalanchoë fedtschenkoi , Ananas comosus  and Phalaenopsis equestris . The 21 non-CAM species are Amborella trichopoda (PLAZA 3.0 ; available at http://bioinformatics.psb.ugent.be/plaza/), Arabidopsis thaliana (PLAZA 3.0), Beta vulgaris (PLAZA 3.0), Brachypodium distachyon (PLAZA 3.0), Carica papaya (PLAZA 3.0), Citrus sinensis (PLAZA 3.0), Eucalyptus grandis (PLAZA 3.0), Fragaria vesca (PLAZA 3.0), Medicago truncatula (PLAZA 3.0), Mimulus guttatus (PLAZA 3.0), Musa acuminata (PLAZA 3.0), Oryza sativa (PLAZA 3.0), Populus trichocarpa (PLAZA 3.0), Prunus persica (PLAZA 3.0), Setaria italica (PLAZA 3.0), Solanum lycopersicum (PLAZA 3.0), Solanum tuberosum (PLAZA 3.0), Sorghum bicolor (PLAZA 3.0), Theobroma cacao (PLAZA 3.0), Vitis vinifera (PLAZA 3.0), Zea may (PLAZA 3.0).
A total of 47,677 transcripts that were detected in at least 4 of the 15 samples, with an average expression level of 5 RPKM or higher (Additional file 25), were utilized to construct a weighted gene co-expression network using the R package WGCNA . The gene expression data were log2 transformed. The dynamic tree-cut algorithm was used to identify co-expression modules with a minimum module size of 30 and a height cut of 0.25.
Comparative analysis of gene expression patterns between CAM and C3 plants
The Arabidopsis–Agave orthologous gene pairs were identified through the combination of both OrthoMCL strategies and the reciprocal best hits (RBH) based on BLASTp with an E-value cutoff of 1e-5. The diurnal expression data for Arabidopsis thaliana were obtained from Mockler et al. (2007) . Both Arabidopsis and Agave plants were grown under a photoperiod of 12 h light:12 h dark cycle. The Arabidopsis expression data were collected at 0, 4, 8, 12, 16, 20, and 24 h, whereas the Agave data were collected at 0, 3, 6, 9, 12, 15, 18, and 21 h after the start of the light period . The cubic interpolation algorithm implemented in Matlab (Mathworks, Inc.) was used to simulate the gene expression levels at additional time points, so that both time-course data sets consisted of the same time points: 0, 3, 4, 6, 8, 9, 12, 15, 16, 18, 20, and 21 h after the start of the light period. The gene expression data were normalized by Z score transformation. The hierarchical clustering of gene expression was performed using the Bioinformatics Toolbox in Matlab (Mathworks, Inc.).
Gene ontology analysis
Whole-genome GO term annotation was performed using Blast2GO with a BLASTp E-value hit filter of 1 × 10–6, an annotation cutoff value of 55, and GO weight of 5. GO enrichment analysis for the ortholog clades was performed using BiNGO . In addition, GO enrichment analysis was performed on each of the 16 co-expression modules using ClueGO  to interpret functionally grouped gene ontology annotation networks. The right-sided hypergeometric enrichment test was performed at a medium network specificity selection, and p-value correction was performed using the Benjamini-Hochberg method. The selected GO tree levels were a minimum of 3 and a maximum of 8, while each cluster was set to a minimum of between 3 and 4% genes. The GO term grouping setting was selected to minimize GO term redundancy, and the highest significance term enriched was used as the representative term for each functional cluster. The GO terms with p-values less than or equal to 0.05 were considered significantly enriched.
Annotation of pathway and transcription factors
Pathway annotation for the protein sequences was performed on the KEGG Automatic Annotation Server KAAS , using the BBH (bi-directional best hit) method to assign orthologs. Transcription factors were identified from the protein sequences using the online tool PlantTFcat .
Nonsynonymous (Ka) to synonymous (Ks) substitution ratio and positively selected sites
The orthologous gene pairs between two species were identified through the combination of both Best Reciprocal Hits (BRH) and OrthoMCL strategies. The coding sequences were aligned using PAL2NAL , guided by protein sequence alignment generated by MAFFT (linsi; version 7.045b) , and gaps in the alignment were removed. The gapless coding sequence alignments were used for Ka/Ks ratio calculation using the Bioinformatics Toolbox in Matlab (Mathworks, Inc.) with a 50-codon sliding window. For identifying positively selected sites, coding sequences from Arabidopsis, maize, rice and Agave were aligned by Translatorx  using the standalone script. The HyPhy package were used to identify positively selected sites as described , and the tests of FUBAR and REL models as implemented in Datamonkey webserver were used with default settings . Since we used a sliding window to study the regions of protein with positive selection, we calculated the probabilities of Ka/Ks positive regions to a null hypothesis that Ka/Ks equals to one by one-sided t-test, as described by Schmid and Yang (2008) .
Protein structure modeling
Protein structure models were built using the iterative threading assembly refinement (I-TASSER, version 3.0) methods . The structure-based annotation tool COFACTOR  was adopted to predict the potential function and the cofactor binding site of the models. Disordered region(s) were analyzed using the PONDR VL-XT program [66,67,68].
Phylogenetic tree construction
The multiple sequence alignment of protein sequences was created using MAFFT . The phylogenetic tree was constructed from the protein sequence alignment using the Neighbor-Joining method  implemented in MEGA7 , with the percentage of replicate trees calculated by the bootstrap test (100 replicates). For trees with three sequences, no bootstrap value was given due to the lack of phylogeny tests of branch. All ambiguous positions were removed for each sequence pair.
Crassulacean acid metabolism
Circadian clock associated 1
Early flowering 3
Fast unconstrained bayesian approximation
Iterative threading assembly refinement
Late elongated hypocotyl
phosphoenolpyruvate carboxylase kinase 1
Reciprocal best hits
Random effects likelihood
Timing of cab expression1
Ehleringer J, Monson R. Evolutionary and ecological aspects of photosynthetic pathway variation. Annu Rev Ecol Syst. 1993;24:411–39.
Yamori W, Hikosaka K, Way DA. Temperature response of photosynthesis in C3, C4, and CAM plants: temperature acclimation and temperature adaptation. Photosynth Res. 2014;119:101–17.
Borland AM, Griffiths H, Hartwell J, Smith JA. Exploiting the potential of plants with crassulacean acid metabolism for bioenergy production on marginal lands. J Exp Bot. 2009;60:2879–96.
Hartwell J. The co-ordination of central plant metabolism by the circadian clock. Biochem Soc Trans. 2005;33:945–8.
Nimmo HG. How to tell the time: the regulation of phosphoenolpyruvate carboxylase in crassulacean acid metabolism (CAM) plants. Biochem Soc Trans. 2003;31:728–30.
Nimmo HG. The regulation of phosphoenolpyruvate carboxylase in CAM plants. Trends Plant Sci. 2000;5:75–80.
Boxall SF, Foster JM, Bohnert HJ, Cushman JC, Nimmo HG, Hartwell J. Conservation and divergence of circadian clock operation in a stress-inducible crassulacean acid metabolism species reveals clock compensation against stress. Plant Physiol. 2005;137:969–82.
Mallona I, Egea-Cortines M, Weiss J. Conserved and divergent rhythms of crassulacean acid metabolism-related and core clock gene expression in the cactus Opuntia ficus-indica. Plant Physiol. 2011;156:1978–89.
Mockler TC, Michael TP, Priest HD, Shen R, Sullivan CM, Givan SA, et al. The DIURNAL project: DIURNAL and circadian expression profiling, model-based pattern matching, and promoter analysis. Cold Spring Harb Symp Quant Biol. 2007;72:353–63.
Kapralov MV, Filatov DA. Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol Biol. 2007;7:73.
Kapralov MV, Kubien DS, Andersson I, Filatov DA. Changes in Rubisco kinetics during the evolution of C4 photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011;28:1491–503.
Escamilla-Trevino L. Potential of plants from the genus Agave as bioenergy crops. Bioenergy Research. 2012;5:1–9.
Yang X, Cushman JC, Borland AM, Edwards EJ, Wullschleger SD, Tuskan GA, et al. A roadmap for research on crassulacean acid metabolism (CAM) to enhance sustainable food and bioenergy production in a hotter, drier world. New Phytol. 2015;207:491–504.
Abraham PE, Yin H, Borland AM, Weighill D, Lim SD, De Paoli HC, et al. Transcript, protein and metabolite temporal dynamics in the CAM plant Agave. Nature Plants. 2016;2:16178.
Davis S, Dohleman F, Long S. The global potential for Agave as a biofuel feedstock. Global Change Biology Bioenergy. 2011;3:68–78.
Borland AM, Hartwell J, Weston DJ, Schlauch KA, Tschaplinski TJ, Tuskan GA, et al. Engineering crassulacean acid metabolism to improve water-use efficiency. Trends Plant Sci. 2014;19:327–38.
West-Eberhard MJ, Smith JA, Winter K. Plant science. Photosynthesis, reorganized. Science. 2011;332:311–2.
Hartwell J, Dever LV, Boxall SF. Emerging model systems for functional genomics analysis of crassulacean acid metabolism. Curr Opin Plant Biol. 2016;31:100–8.
Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–84.
Chen Q, Han Z, Jiang H, Tian D, Yang S. Strong positive selection drives rapid diversification of R-genes in Arabidopsis relatives. J Mol Evol. 2010;70:137–48.
Looman C, Abrink M, Mark C, Hellman L. KRAB zinc finger proteins: an analysis of the molecular mechanisms governing their increase in numbers and complexity during evolution. Mol Biol Evol. 2002;19:2118–30.
Ihnatowicz A, Pesaresi P, Varotto C, Richly E, Schneider A, Jahns P, et al. Mutants for photosystem I subunit D of Arabidopsis thaliana: effects on photosynthesis, photosystem I stability and expression of nuclear genes for chloroplast functions. Plant J. 2004;37:839–52.
Edwards E, Ogburn R. Angiosperm responses to a low-CO2 world: CAM and C4 photosynthesis as parallel evolutionary trajectories. Int J Plant Sci. 2012;173:724–33.
Christin PA, Arakaki M, Osborne CP, Brautigam A, Sage RF, Hibberd JM, et al. Shared origins of a key enzyme during the evolution of C4 and CAM metabolism. J Exp Bot. 2014;65:3609–21.
Hartwell J, Gill A, Nimmo GA, Wilkins MB, Jenkins GI, Nimmo HG. Phosphoenolpyruvate carboxylase kinase is a novel protein kinase regulated at the level of expression. Plant J. 1999;20:333–42.
Owen NA, Griffiths H. A system dynamics model integrating physiology and biochemical regulation predicts extent of crassulacean acid metabolism (CAM) phases. New Phytol. 2013;200:1116–31.
Cutler SR, Rodriguez PL, Finkelstein RR, Abrams SR. Abscisic acid: emergence of a core signaling network. Annu Rev Plant Biol. 2010;61:651–79.
Lee JS. Stomatal opening mechanism of CAM plants. Journal of Plant Biology. 2010;53:19–23.
Ivashikina N, Deeken R, Fischer S, Ache P, Hedrich R. AKT2/3 subunits render guard cell K+ channels Ca2+ sensitive. J Gen Physiol. 2005;125:483–92.
Luttge U. Ecophysiology of crassulacean acid metabolism (CAM). Ann Bot. 2004;93:629–52.
Lawson T, Blatt MR. Stomatal size, speed, and responsiveness impact on photosynthesis and water use efficiency. Plant Physiol. 2014;164:1556–70.
Emmerlich V, Linka N, Reinhold T, Hurth MA, Traub M, Martinoia E, et al. The plant homolog to the human sodium/dicarboxylic cotransporter is the vacuolar malate carrier. Proc Natl Acad Sci U S A. 2003;100:11122–6.
O'Neill JS, van Ooijen G, Dixon LE, Troein C, Corellou F, Bouget FY, et al. Circadian rhythms persist without transcription in a eukaryote. Nature. 2011;469:554–8.
McWatters HG, Devlin PF. Timing in plants--a rhythmic arrangement. FEBS Lett. 2011;585:1474–84.
Rawat R, Schwartz J, Jones MA, Sairanen I, Cheng Y, Andersson CR, et al. REVEILLE1, a Myb-like transcription factor, integrates the circadian clock and auxin pathways. Proc Natl Acad Sci U S A. 2009;106:16883–8.
Uversky VN, Gillespie JR, Fink AL. Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins. 2000;41:415–27.
Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, et al. Intrinsically disordered protein. J Mol Graph Model. 2001;19:26–59.
Uversky VN, Dunker AK. Understanding protein non-folding. Biochim Biophys Acta. 1804;2010:1231–64.
Staiger D, Shin J, Johansson M, Davis SJ. The circadian clock goes genomic. Genome Biol. 2013;14:208.
Silvera K, Neubig KM, Whitten WM, Williams NH, Winter K, Cushman JC. Evolution along the crassulacean acid metabolism continuum. Funct Plant Biol. 2010;37:995–1010.
Heyduk K, Ray JN, Ayyampalayam S, Leebens-Mack J. Shifts in gene expression profiles are associated with weak and strong crassulacean acid metabolism. Am J Bot. 2018;105:587–601.
Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–2.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–9.
Gross SM, Martin JA, Simpson J, Abraham-Juarez MJ, Wang Z, Visel A. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana. BMC Genomics. 2013;14:563.
Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–89.
Yang X, Hu R, Yin H, Jenkins J, Shu S, Tang H, et al. The Kalanchoë genome provides insights into convergent evolution and building blocks of crassulacean acid metabolism. Nat Commun. 2017;8:1899.
Ming R, VanBuren R, Wai CM, Tang H, Schatz MC, Bowers JE, et al. The pineapple genome and the evolution of CAM photosynthesis. Nat Genet. 2015;47:1435–42.
Cai J, Liu X, Vanneste K, Proost S, Tsai W-C, Liu K-W, et al. The genome sequence of the orchid Phalaenopsis equestris. Nat Genet. 2015;47:65–72.
Proost S, Van Bel M, Vaneechoutte D, Van de Peer Y, Inze D, Mueller-Roeber B, et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 2015;43:D974–81.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005;21:3448–9.
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25:1091–3.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.
Dai X, Sinharoy S, Udvardi M, Zhao PX. PlantTFcat: an online plant transcription factor and transcriptional regulator categorization and analysis tool. BMC Bioinformatics. 2013;14:321.
Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–12.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38:W7–13.
Pond SL, Frost SD, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005;21:676–9.
Delport W, Poon AF, Frost SD, Kosakovsky Pond SL. Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics. 2010;26:2455–7.
Schmid K, Yang Z. The trouble with sliding windows and the selective pressure in BRCA1. PLoS One. 2008;3:e3746.
Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010;5:725–38.
Roy A, Yang J, Zhang Y. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 2012;40:W471–7.
Li X, Romero P, Rani M, Dunker AK, Obradovic Z. Predicting protein disorder for N-, C-, and internal regions. Genome Informatics. 1999;10:30–40.
Romero P, Obradovic Z, Dunker AK. Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Informatics. 1997;8:110–24.
Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK. Sequence complexity of disordered protein. Proteins. 2001;42:38–48.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–25.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
The authors would like to thank J.L. Labbe for critical review and clarifying comments on the manuscript. We are grateful to anonymous reviewers for comments to improve the manuscript.
This work was supported by the U.S. Department of Energy (DOE), Office of Science, Genomic Science Program (under award number DE-SC0008834), the DOE Center for Bioenergy Innovation (CBI), and the Laboratory Directed Research and Development (LDRD) Program (Project ID: 5801) of Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the US DOE under Contract Number DE–AC05–00OR22725. Work conducted at the US Department of Energy Joint Genome Institute (JGI) was supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.
Availability of data and materials
All data of this work are available from links of the manuscript. The supporting files are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.37g7v3v.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figure S1. Overview charts of the gene ontology (GO) biological processes over-represented in individual co-expression modules. (PDF 464 kb)
Table S1. Percentage of the gene set in each individual species distributed into different ortholog clades. (PDF 39 kb)
Table S2. Biological processes over-represented (p < 1E-20) in ortholog clade NVP:C3:CAM:C4 in Agave americana. NVP:C3:CAM:C4 represents orthologs shared by C3, CAM, C4 and NVP (i.e. non-vascular plants). (PDF 39 kb)
Table S3. Biological processes over-represented in ortholog clade C3:CAM:C4 in Agave americana. C3:CAM:C4 represents orthologs shared only by C3, CAM and C4 species. (PDF 17 kb)
Table S4. Ortholog clades of C4 cycle genes & related transporters. (PDF 118 kb)
Table S5. Distribution of co-expression modules in each individual ortholog clade in Agave americana. (PDF 90 kb)
Table S6. List of Agave americana genes that have homologs in other three independent CAM lineages but not in 21 non-CAM plant species. (PDF 31 kb)
Figure S2. The diel expression pattern of conserved CAM-specific genes in Agave. (PDF 111 kb)
Table S7. List of Agave americana genes with Ka/Ks ratio greater than one in Agave-Arabidopsis, Agave-Oryza and Agave-Zea pairs. (PDF 353 kb)
Table S8. Genes implicated in the CAM pathway undergoing positive selection as revealed from Ka/Ks ratio calculated from Agave-Arabidopsis, Agave-Oryza and Agave-Zea orthologous gene pairs. (PDF 91 kb)
Table S9. List of Agave americana genes with Ka/Ks ratio greater than one in Agave-Arabidopsis and Agave-Oryza but not Agave-Zea pairs. (PDF 47 kb)
Table S10. The positively selected sites of candidates with Ka/Ks > 1 from Table S9. (PDF 304 kb)
Figure S3. Positive selection region in chloroplast β-amylase (CT-BMY). (PDF 171 kb)
Table S11. List of Agave americana genes with morning-to-night shift in expression pattern as compared with the orthologous genes in Arabidopsis. (PDF 70 kb)
Table S12. List of Agave americana genes with afternoon-to-night shift in expression pattern as compared with the orthologous genes in Arabidopsis. (PDF 69 kb)
Figure S4. Phylogenetic trees and diel gene expression patterns of multi-gene ortholog groups (i.e., with a total of more than two Agave and Arabidopsis genes) listed Fig. 4a. (PDF 164 kb)
Figure S5. Phylogenetic trees and diel gene expression patterns of multi-gene ortholog groups (i.e., with a total of more than two Agave and Arabidopsis genes) listed Fig. 4b. (PDF 156 kb)
Table S13. List of circadian clock genes in Agave americana and Arabidopsis thaliana. (PDF 18 kb)
Figure S6. Intrinsically disordered protein regions (IDPRs) and structural models of the circadian clock proteins in Agave americana. (PDF 153 kb)
Table S14. Intrinsically disordered protein region (IDPR) in protein sequences under positive selection. (PDF 33 kb)
Agave americana transcript sequences of less than 200 bp. (FASTA 5010 kb)
Representative protein sequences in Agave americana. (FASTA 9640 kb)
Representative protein sequences in Agave deserti. (FASTA 11600 kb)
Representative protein sequences in Agave tequilana. (FASTA 11400 kb)
Expression data of Agave americana transcripts detected in at least 4 of the 15 samples, with an average expression level of 5 RPKM or higher. (XLSX 14100 kb)
About this article
Cite this article
Yin, H., Guo, HB., Weston, D.J. et al. Diel rewiring and positive selection of ancient plant proteins enabled evolution of CAM photosynthesis in Agave. BMC Genomics 19, 588 (2018). https://doi.org/10.1186/s12864-018-4964-7
- Crassulacean acid metabolism
- Comparative genomics
- Positive selection
- Circadian rhythm