Differential gene expression associated with a floral scent polymorphism in the evening primrose Oenothera harringtonii (Onagraceae)

Background Plant volatiles play an important role in both plant-pollinator and plant-herbivore interactions. Intraspecific polymorphisms in volatile production are ubiquitous, but studies that explore underlying differential gene expression are rare. Oenothera harringtonii populations are polymorphic in floral emission of the monoterpene (R)-(−)-linalool; some plants emit (R)-(−)-linalool (linalool+ plants) while others do not (linalool- plants). However, the genes associated with differential production of this floral volatile in Oenothera are unknown. We used RNA-Seq to broadly characterize differential gene expression involved in (R)-(−)-linalool biosynthesis. To identify genes that may be associated with the polymorphism for this trait, we used RNA-Seq to compare gene expression in six different Oenothera harringtonii tissues from each of three linalool+ and linalool- plants. Results Three clusters of differentially expressed genes were enriched for terpene synthase activity: two were characterized by tissue-specific upregulation and one by upregulation only in plants with flowers that produce (R)-(−)-linalool. A molecular phylogeny of all terpene synthases identified two putative (R)-(−)-linalool synthase transcripts in Oenothera harringtonii, a single allele of which is found exclusively in linalool+ plants. Conclusions By using a naturally occurring polymorphism and comparing different tissues, we were able to identify candidate genes putatively involved in the biosynthesis of (R)-(−)-linalool. Expression of these genes in linalool- plants, while low, suggests a regulatory polymorphism, rather than a population-specific loss-of-function allele. Additional terpene biosynthesis-related genes that are up-regulated in plants that emit (R)-(−)-linalool may be associated with herbivore defense, suggesting a potential economy of scale between plant reproduction and defense. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08370-6.


Background
Floral fragrance is a complex trait that is typically composed of tens to hundreds of volatile organic compounds (VOCs) whose ecological roles include the attraction of pollinators, manipulation of pollinator behavior, and defense against herbivores [1][2][3][4][5][6][7]. Considerable qualitative and quantitative variation in floral fragrance has been documented, with over 1700 floral volatiles having been described from more than 900 angiosperm species [8]. In addition to substantial interspecific variation, floral fragrance has also been shown to differ intra-specifically [2,9]. The evolutionary and ecological mechanisms that maintain intraspecific variation in floral fragrance constitute an active area of research and have been attributed to differences in community composition, pollinator Bechen et al. BMC Genomics (2022) 23:124 preferences, phenotypic plasticity, and genetic drift [10][11][12][13].
The biosynthetic pathways that produce floral volatiles are increasingly well-characterized [14], and recent studies have used transcriptomics-based approaches to identify homologous genes from these pathways in nonmodel plant species [15][16][17][18]. Despite these advances [19,20], relatively few studies have identified the genetic mechanisms that impact how these pathways modify floral scent or how selection acts on the resulting intraspecific variation. Volatile terpenes comprise a major source of biosynthetic diversity in scents, primarily 10-carbon monoterpenes and 15-carbon sesquiterpenes [21]. Terpene synthase (TPS) enzymes, which are involved in the biosynthesis of these secondary metabolites, are encoded by a moderately large gene family in angiosperms [22]. The TPS genes exist in seven to eight subfamilies that show either lineage-specific (e.g., exclusively gymnosperm TPSs) or molecule/lineage-specific (e.g., monocot sesquiterpene TPSs) affinities. Despite the widespread characterization of the gene family and their products across plant lineages [22,23], the mechanisms maintaining intraspecific variation in the production of volatile terpenes remain poorly understood. The functional implications of intraspecific terpenoid variation are epitomized by the culinary herb Thymus vulgaris (Lamiaceae), whose distinct "chemotypes" differ in herbivore defense, frost sensitivity, and allelopathic interactions with neighboring plants [24,25]. Genetic crosses helped to document an epistatic network of mendelian loci responsible for volatile chemotypes in T. vulgaris [26,27]. However, beyond a few such model systems, the genetic controls of chemotype variation remain largely unknown, and this knowledge is needed to understand how conflicting selective forces shape chemical polymorphism. To date, there have been few opportunities to study the genetic underpinnings of volatile terpenoid polymorphism in floral scent.
The focal species of this study, Oenothera harringtonii (Onagraceae), is a night-blooming, self-incompatible annual herb [28] [29]. Other night-blooming species across the genus Oenothera have been found to emit (R)-(−)-linalool, including O. biennis (sect. Oenothera [30];), O. acutissima (sect. Lavauxia [31];), O. californica (sect. Anogra), O. cespitosa (sect. Pachylophus), O. howardii (sect. Megapterium), O. lavandulifolia (sect. Calylophus) and O. xylocarpa (sect. Contortae); Jogesh, Skogen, and Raguso unpublished data). Linalool in floral tissues can function as a pollinator attractant or as a defense compound [32][33][34][35], whereas linalool emitted by vegetative tissues has been implicated in both direct and indirect plant defenses [34]. Flowers of O. harringtonii are pollinated by the widespread hawkmoth species Hyles lineata and Manduca quinquemaculata [36]. Parallel studies suggest that these hawkmoths are likely attracted to the strong floral scent and high visual contrast of O. harringtonii flowers [37,38]. In addition to its role in plant reproduction, the primary pollinator H. lineata is also an herbivore of the plant, with its caterpillars feeding on leaves and floral tissues. Furthermore, the developing seeds of O. harringtonii are consumed by caterpillars of the moth genus Mompha [39], whose adults oviposit on flowers at dusk [40]. Considering these selective forces, it is likely that both floral scent and volatiles emitted by vegetative tissues play important roles in the complex fitness landscape of this species. However, understanding how these selective forces mold volatile terpene variation requires the identification of genes, including (R)-(−)linalool synthase and related terpene synthases, that are associated with scent polymorphism in O. harringtonii. In this way, identification and subsequent manipulation of the (S)-(+) linalool synthase gene in a wild tobacco, Nicotiana attenuata, led to novel insights about geographic variation in selective pressures and the importance of terpenoid volatile background in a tri-trophic/ plant defense context [41].
Within the family Onagraceae, the gene encoding (S)-(+)-linalool synthase was identified from Clarkia breweri and Clarkia concinna in the first study of biosynthetic genes responsible for floral scent [42]. However, (R)-(−)linalool synthase has not been identified in any evening primrose species to date, and only a single, partial TPS sequence has been identified thus far in Oenothera [43], one of the most species-rich lineages within the family [44]. In this study, we use RNA-Seq to identify differentially expressed genes (DEGs) correlated with linalool production in O. harringtonii by sequencing transcriptomes of six different tissues from three biological replicates of each of two chemotypes: (1) flowers that emit (R)-(−)-linalool (linalool+) and (2) flowers that do not emit (R)-(−)-linalool (linalool-). We use this information, in combination with a phylogenetic analysis of terpene synthases, to identify a candidate gene that may encode (R)-(−)-linalool synthase in O. harringtonii and identify an allele that is specific to the linalool+ individuals sampled here. This study includes the first transcriptomes assembled for O. harringtonii, providing critical genomic information to inform ongoing studies of intraspecific floral scent variation within this species. In addition, the preliminary identification of (R)-(−)-linalool synthase and other differentially expressed transcripts is a first step towards a broader, comparative analysis of the molecular evolution of floral scent among species of Oenothera that differ in mating system, pollination mode, life history strategy and, thus, the direction and intensity of natural selection (see [45]). As such, results of the present study provide a key foundational step in understanding the phylogenetic ubiquity of genes involved in the biosynthesis of (R)-(−)-linalool throughout flowering plants.

Floral fragrance analysis
Floral scent profiles were collected from replicated lin-alool+ and linalool-plants from which expressed transcripts were subsequently harvested; volatile chemical data are summarized in Table 1. Across all individuals, the floral scent was dominated by the monoterpene olefin (Ε)-β-ocimene, with small amounts of other monoterpenes ((Ζ)-β-ocimene and β-myrcene) and sesquiterpenes (β-caryophyllene and (Ε,Ε)-farnesol). The flowers of three individuals (A-C) emitted large amounts of (R)-(−)linalool (linalool+), whereas those from another three individuals did not emit any (Table 1; Fig. 1A). This pattern is consistent with phenotypic data from the source populations for these plants (Additional file 4), confirming that linalool chemotypes breed true in greenhousegrown plants (see Methods). Other volatiles (ocimene epoxide, methyl geranate, isophytol, α-humulene, and methyl benzoate) were emitted in small amounts by some individuals but not others, but the presence of these compounds varied independently of linalool chemotype (as they did in field-collected samples; Additional file 4) and are therefore likely due to unrelated factors.
Additional analyses were performed to confirm which of the two enantiomers of linalool were present in the floral scent. Chiral GC-MS analyses resulted in the separation of linalool enantiomers to baseline, with (R)-(−)linalool (14.07 min) eluting before (S)-(+)-linalool (14.14 min) under the conditions described (Fig. 1B). When the linalool peaks from linalool+ populations (Florence, Maverick, Baculite Mesa) were found to align with the retention time of the (R)-(−) enantiomer, but not with that of the (S)-(+) enantiomer or the linalool peak from Clarkia breweri, we re-injected an O. harringtonii headspace sample from the Florence population, to which a small amount of the (R)-(−) enantiomer was added. These analyses confirmed that flowers of O. harringtonii exclusively emit (R)-(−)-linalool (Fig. 1B).

Transcriptome sequencing, assembly, and annotation
RNA was successfully sequenced from six tissues (stamens, stigma/style, petals, hypanthium, bud, and leaf ) for all individuals, except for individuals D and F, for which bud and leaf samples were not obtained (see Methods). Leaf tissues were included to ensure that we could detect differentially expressed genes involved in the biosynthesis of floral volatiles that were not constitutively expressed in vegetative tissues. Across all 32 samples, 1.57 billion paired reads were sequenced (Additional file 1). On Table 1 Volatile floral terpenoids and their emission rates for greenhouse-grown plants (A-F) used for transcriptomics. Emission rates are given in μg per flower per hour. Volatile terpenoids are ordered by sub class (see Table S1) and retention time (min). The identity of all compounds except isophytol (*) and ocimene epoxide (*) was confirmed through co-chromatography and mass spectral comparison with synthetic standards

Differential expression and GO enrichment analysis
Principal components analysis was conducted to test whether it is appropriate to group biological replicates together during subsequent differential expression analysis. In general, samples of the same linalool chemotype grouped together, suggesting differences in gene expression between the chemotypes (Additional file 6). Additionally, samples clustered according to individual and tissue type. In total, edgeR identified 6577 differentially expressed genes (DEGs) among the six tissues and between linalool+ and linalool-individuals, with an overall pattern suggesting that tissue identity explains most of the observed differential expression (Fig. 2). All DEGs were grouped into 12 clusters by cutting the DEG dendrogram at 65% of its height (Fig. 3). Gene ontology (GO) enrichment analysis was performed for each cluster to identify categories of genes that are overrepresented (see Additional file 2 for a complete list of enriched GO terms for each cluster). Of the twelve clusters, two (clusters 10 and 11) were characterized by differential expression of genes between linalool+ and linalool-individuals, suggesting that genes in these clusters may be involved in linalool biosynthesis. Genes in cluster 10 (n = 43) were upregulated in all tissues from linalool+ individuals, whereas genes within cluster 11 (n = 46) were downregulated in all tissues from linalool+ individuals. In general, both of these clusters were characterized by several genes that lack annotations in the Trinotate report (26/43 and 38/46 genes for each cluster, respectively), either because they are unique or significantly differ from a homolog in the reference database.
To obtain an overview of the functions of genes within each cluster, we conducted GO enrichment. While these enrichment analyses may predict a possible function for these genes, we note that automatic annotation based on sequence homology is not proof of biological activity. Cluster 10 is enriched for terpene synthase activity (Table 2), and transcripts in this cluster with this higherlevel GO term (TPS activity) include those annotated as terpene synthase 10 (TRINITY_DN74292_c2_g1), γ-terpinene synthase/(+)-α-terpineol synthase (TRIN-ITY_DN62538_c3_g2) and santalene synthase (TRIN-ITY_DN76019_c5_g5). Terpene synthase 10 is known to produce (R)-(−)-linalool as a primary product in Arabidopsis thaliana, with β-myrcene and (E)-β-ocimene as minor products [48]. γ-terpinene and (+)-α-terpineol are monoterpenes, and although santalene is a sesquiterpene, terpene synthases responsible for its biosynthesis are classified with the TPS-b monoterpene synthases [49]. γ-terpinene and santalene are not present in the harringtonii, whereas α-terpineol is only present in small amounts, inconsistently, in some populations (e.g. FLO; Additional file 4). Thus, either the appropriate biosynthetic enzymes are not translated from these transcripts or, if they are, their eventual products are either metabolized into other secondary compounds, or they simply are not volatilized. However, we emphasize that this study was designed to predict genes involved only in (R)-(−)-linalool biosynthesis and emission, and that functional characterization of other TPS enzymes is required to validate their precise function and activity in O. harringtonii. Together, these results suggest that lin-alool+ individuals have the capacity to synthesize terpenes in all six tissues included here; therefore, differences between the linalool+ and linalool-chemotypes include terpene synthase expression in both floral and vegetative tissues. In addition to being common components of floral fragrance, terpenes are known to be involved in constitutive and induced defenses in both floral and non-floral tissues [4,50]. Terpenes are also involved in indirect defense, as they can attract members of the third trophic level (e.g., predators of herbivores) [51][52][53][54][55].
Overexpression of TPS genes across all tissues including leaves suggests that O. harringtonii populations emitting R-(−)-linalool from flowers may also be better-defended. Cluster 11 did not have any significant GO term enrichments, possibly because only a small number of the transcripts in the cluster were able to be annotated. The few genes in cluster 11 that do have annotations are not associated with terpene biosynthesis (Additional file 2). Two additional clusters contained significantly enriched GO terms related to terpene biosynthesis: clusters 1 (n = 232) and 6 (n = 542). Cluster 1 comprised genes upregulated in buds and petals but downregulated in leaves. Significant (FDR > 0.05) GO term enrichments for this cluster include isoprenoid biosynthetic process, terpenoid   biosynthetic process, and isopentenyl diphosphate biosynthetic process, among others (Table 2 and Additional file 2). Cluster 6 comprised genes upregulated in petals and downregulated in leaves. Significant GO term enrichments for this cluster include terpene synthase activity, sesquiterpene synthase activity, isoprenoid biosynthetic process, and terpenoid biosynthetic process (Table 2 and Additional file 2). Neither cluster showed substantial differentiation between the two linalool chemotypes, which, in combination with the enriched GO terms for each cluster, is an indication that the genes contained within those clusters may be involved in the biosynthesis of floral scent volatiles that are shared between the two chemotypes. Thus, the expression patterns of these two clusters show that certain terpenoid scent compounds are likely to be produced primarily in the petals of O. harringtonii, and that buds may be synthesizing floral scent or floral scent precursors 24-48 h prior to anthesis.

Phylogenomic analysis of Terpene synthases and (R)-(−)-linalool polymorphisms
To further characterize and identify terpene synthases in the transcriptome, regardless of their expression pattern, a profile Hidden Markov Model (pHMM) for terpene synthase (TPS) proteins was created based on a multiple sequence alignment of TPS sequences from van Schie et al. [23]. The pHMM successfully identified 40 TPS inferred protein sequences in the O. harringtonii reference assembly (Fig. 4). While it is likely that this is an overestimate due to the co-assembly of multiple individuals for the reference transcriptome, the 40 sequences were successfully placed into five of the seven TPS subfamilies with high confidence. According to the TPS subfamily circumscription in van Schie et al. [23], we found: (1) two distinct sequences of diterpene synthases (TPSc); (2) a single sequence of (S)-(+)-linalool synthase (TPSf ); (3) 11 sequences of a second group of diterpene synthases (TPSe) with seven distinct groups of genes (components arising from the Trinity assembly); (4) eight sequences of sesquiterpene synthases (TPSa) with six distinct components, and (5) 18 sequences of monoterpene synthases (TPSb). No sequences were reconstructed within the TPSd or TPSg subfamilies (Fig. 5); the former (TPSd) comprises exclusively gymnosperm terpene synthases and the latter (TPSg) comprises a group of acyclic monoterpene synthases related to ocimene synthases in Antirrhinum [23]. Genes belonging to TPSc, TPSe, and TPSf largely showed no differential expression between either tissue or chemotype (Fig. 4). Two genes within the TPSe clade belong to DEG cluster 3 (Figs. 3 and 4) and were upregulated in leaves, suggesting that they may play a role in defense against herbivory rather than floral fragrance. Some sesquiterpene synthase genes (the TPSa clade) were upregulated in floral tissues of linalool+ individuals (see cluster 6 in Figs. 3 and 4); however, these genes were highly expressed in the petals of both linalool+ and linalool-individuals. This matches fragrance data collected from source populations; both linalool+ and linaloolpopulations of O. harringtonii emit a range of sesquiterpenes that include β-caryophyllene, α-humulene, α-and β-farnesenes (Additional file 4); however, whether these transcripts represent enzymes that are directly involved in sesquiterpene biosynthesis requires functional characterization in vitro and in vivo, rather than simply phylogenomic-based annotation.
To-date, the most intensive genetic and physiological study of floral fragrance emission in Onagraceae has focused on a pair of sibling species in the genus Clarkia [42,43,56,57]. Most Clarkia species are bee-pollinated or autogamous [58], and range from unscented to mildly scented, with the exception of C. breweri, which produces a strong floral fragrance rich in (S)-(+)-linalool as a derived trait associated with a shift to hawkmoth pollination [57]. Pichersky et al. [56] found that (S)-(+)-linalool is primarily emitted from the petals and, to a lesser extent, the pistil of C. breweri flowers, whereas comparative analyses of the closest relative, C. concinna, revealed linalool production only in stigmatic tissues. The authors inferred that increased levels of linalool synthase (LIS) gene expression, expansion of tissues in which it is expressed, and enzymatic upregulation all contributed to the evolutionary gain-of-function of strong (S)-(+)-linalool emission from the flowers of C. breweri [29]. Our phylogenomic analysis placed a single sequence of (S)-(+)-linalool synthase (TPSf ) from O. harringtonii as sister to (S)-(+)-linalool synthase from Clarkia (Fig. 4). While we cannot confirm that this putative (S)-(+) -linalool synthase in O. harringtonii is non-functional, it is clear that no detectable (S)-(+)-linalool is emitted from the flowers (Fig. 1B; Table 1), the transcript is not differentially expressed, and it is expressed at a low level.
The O. harringtonii genes circumscribed as belonging to the TPSb (monoterpene synthases) clade can be divided into two subclades, annotated here as TPSb 1 and TPSb 2 (Fig. 4). While both clades are strongly supported, their relationships to TPSb genes from other species are not well resolved. Genes from O. harringtonii within the TPSb 1 clade show some evidence of upregulation in floral tissues and bud (clusters 1 and 6) with no evidence of chemotype-related differential expression. As the two chemotypes differ primarily in the presence of a single monoterpene, (R)-(−)-linalool, we would expect that only one of the TPSb subclades would include a gene that is upregulated in linalool+ floral tissues. Two contigs from clade TPSb 2 belong to cluster 10, which showed strong differential expression related to chemotype (see rows annotated as cluster 10 in Fig. 4). Although cluster 10 includes genes that are differentially expressed (upregulated in linalool+ individuals) for all tissues including leaves, the signal for upregulation of these two transcripts in leaves is relatively weak. In fact, we detect no (R)-(−)-linalool emitted from the leaves of individuals with linalool+ flowers (Additional file 3). The strongest signal of upregulation is detected from petal tissue; floral organ-specific emission data confirms that the wholeflower emission of (R)-(−)-linalool is driven largely by petal tissue (Additional file 9). Furthermore, the TPSb 2 subclade is sister (with moderate bootstrap support of 68) to a clade that includes other linalool synthases, including the (R)-(−)-linalool synthase gene from Artemesia annua [59]. When the phylogeny, differential expression, and floral fragrance data are taken together, they strongly suggest that the two transcripts belonging to  Accounting for possible transcriptome assembly artifacts, we examined the three transcripts from the clade containing the candidate (R)-(−)-linalool synthases (TRINITY_DN62538_c3_g1_i1, TRINITY_DN62538_ c3_g2_i2, and TRINITY_DN74292_c2_g1_i11) and determined that they each represent a partial coding sequence; the three transcripts were subsequently merged to form a putative full-length transcript. A BLAST search of the merged transcript revealed sequence similarity to (R)-(−)-linalool synthase in lemon myrtle, Backhousia citriodora (Myrtaceae; GenBank accession AB438045.1). However, the O. harringtonii transcript could only be aligned over 52% of the lemon myrtle transcript indicating that the merged O. harringtonii transcript may either not represent a full-length transcript, is shorter than the coding sequence in B. citriodora, or is significantly diverged in sequence (sequence similarity to the B. citriodora CDS is 64%). A full-length protein sequence annotated from a 13 kb genomic contig (Additional file 7), when aligned with the B. citriodora sequence (Additional file 8), shows that there is significant divergence at the 5′ end that likely accounts for this discrepancy. As RNA-Seq has been shown to reliably predict genes that are significantly associated with a treatment [60][61][62][63] and given the overwhelming difference between the abundance of reads mapped to the candidate (R)-(−)-linalool synthase transcripts in linalool+ and linalool-individuals, we suggest that RT-qPCR validation is unnecessary in this case. Future studies that focus exclusively on the function of this particular gene in Oenothera in more detail may benefit from the sequences described here, as a template for primer design or as a candidate locus in association studies.
To investigate if there are differences in the coding region of (R)-(−)-linalool synthase associated with either chemotype, we compared SNPs between individuals. Calling SNPs for each individual revealed that all lin-alool+ individuals are homozygous and share the same haplotype which matched the reference (merged consensus) sequence, with the exception of two SNPs at the 5′ and 3′ ends of the coding region (Fig. 5), which may be due to sequencing or assembly error (regions where read depth decreases). Conversely, all linalool-individuals are heterozygous, and do not share the same haplotype as each other or the linalool+ individuals. Due to the high level of (R)-(−)-linalool synthase expression in linalool+ plants, the sequencing depth for linalool+ individuals appears to be equal in Fig. 5, likely an artifact of using deduplicated reads. As expression is low in the linalool-plants, the deduplication process likely removed few, if any, reads, reflected in the varying depths of coverage for these individuals (Fig. 5). Sequences reconstructed from the linalool-individuals are characterized by between eight and fifteen SNPs compared to the reference sequence. The presence of a single allele in all linalool+ individuals included here is suggestive of a selective sweep in populations dominated by linalool+ plants, but much more extensive sampling of individuals and genomic regions is needed to better characterize the history of selection. Phasing of the alleles for linaloolindividuals, and a reconstruction of the terpene synthase gene tree with additional sequences added (Additional file 5) confirmed that the sequences were indeed alleles rather than paralogs.
Terpenes are known to be regulated both transcriptionally and post-transcriptionally. In Arabidopsis, for example, differences in the amount of (E)-β-ocimene and (E,E)-α-farnesene emitted between two ecotypes is controlled by allelic variation of two terpene synthase genes, whereby one of the two genes in each ecotype produces a non-functional transcript [64]. Transcription factors regulating a variety of terpenes have also been identified; in cotton WRKY1 positively regulates (+)-ẟ-cadinene synthase (CAD1) [65], and in kiwifruit, NAC transcription factors were found to regulate the expression of TPS1 in ripe fruit [66]. Post-transcriptional regulation includes the action of miRNAs, which have been identified in plants such as Ocimum basilicum [67] and Ferula gummosa [68]. The existence of SNPs between transcripts from each chemotype may suggest differences in functionality of each linalool synthase; however, few of the SNPs are shared among linalool-plants, such that the regulation of (R)-(−)-linalool synthase in O. harringtonii may be associated only with upregulation and the allele found in linalool+ individuals. Further sequence analysis and more extensive sampling is required to determine whether this apparent allelic diversity shows strong evidence of population structure. Additionally, a highquality draft genome sequence for O. harringtonii would allow us to better detect variation that may be associated with regulatory regions or copy number variation.

Conclusions
This study presents the first transcriptome assembly and analysis of differential expression in Oenothera harringtonii, a species with intraspecific variation for the emission of (R)-(−)-linalool in its floral fragrance. In this study, RNA-Seq was used to compare transcriptomes of six different Oenothera harringtonii tissues from six individuals (three linalool+, three linalool-). Six thousand five hundred seventy-seven differentially expressed genes were identified, and these DEGs were sorted into 12 clusters. Three clusters were significantly enriched for GO terms related to floral fragrance biosynthesis. Two of these clusters showed upregulation in petals, suggesting that floral scent biosynthesis is mostly localized to petals, a result that was confirmed by measuring tissuespecific emission patterns (Additional file 9). The third cluster showed upregulation in all linalool+ tissues and contained a significantly enriched GO term for terpene synthase. A phylogenetic analysis of all terpene synthases identified using a profile Hidden Markov Model classified 40 O. harringtonii transcripts into five terpene synthase subfamilies. Synthesizing the phylogenetic, differential expression, and floral fragrance data, we identified a transcript that likely encodes a partial (R)-(−)-linalool synthase, representing the first in silico identification of a candidate (R)-(−)-linalool synthase gene in the flowering plant family Onagraceae.

Plant material
Parental individuals were grown from seed collected in natural populations of Oenothera harringtonii from southeastern Colorado for which floral volatile data (Additional file 4) indicated that floral scent was polymorphic among the sampled populations: Burnt Mill Road (Pueblo County; linalool+), Florence (Fremont County; linalool+), Bloom (Las Animas County; linalool-), and David Canyon (Otero County; linalool-) (see [69] for locality data). As part of ongoing work on this species, the following controlled crosses within chemotype were conducted in the greenhouse at the Chicago Botanic Garden (Glencoe, Illinois, USA) in 2008 and 2009: both parents from Burnt Mill Road (individuals A and B), both parents from Florence (individual C), one parent from David Canyon and the other from Bloom (individual D), and both parents from David Canyon (individuals E and F). The resultant offspring seeds were germinated and grown in the greenhouse until buds were initiated, after which they were moved to a growth chamber (Conviron model GR48, Winnipeg, MB, Canada) where they typically flowered within 1 week. While anthesis in natural populations occurs at dusk [69], to facilitate daytime floral fragrance sampling, the growth chamber settings were configured such that dusk began at 9:30, night was from 11:00 to 20:30, dawn began at 20:30, and daytime was from 21:30 to 9:30.

Floral fragrance collection
Based on hundreds of previous analyses of floral scent from across the range of O. harringtonii (Additional file 4), the population-level chemotypes are stable and consistent; floral scent analyses of the plants grown for this transcriptomic study were carried out to confirm the prediction of chemotypes for the plant material selected for RNA-Seq. Floral fragrance was collected prior to RNA extraction to verify the linalool chemotype (lin-alool+ or linalool-) of each of the six individuals. Each flower was enclosed in a nylon resin oven bag (Reynolds, Lake Forest, IL) that was cut to 20.3 cm × 14.8 cm and resealed. Filters, constructed from a glass Pasteur pipette containing 10 mg of adsorbent material (Porapak Q 80/100 mesh size, Sigma-Aldrich, St. Louis, MO) tamped between quartz wool, were used to trap floral volatiles during the first hour after anthesis at dusk. The bag was secured around the flower's hypanthium and the filter with twist-ties, ensuring that there were no vegetative structures enclosed within the bag. The filter tip was positioned as close as possible to the floral center without damaging it and attached via Teflon tubing to a vacuum pump (PAS-500, Spectrex, Redwood City, CA). Pumps were run at a flow rate of 200 ml air per minute for 1 h; filters were then eluted with 200 μl of high purity hexane (GC2, Burdick & Jackson, Muskegon, MI). Two types of controls were used for each floral fragrance collection: an ambient control filter attached to a bag without a flower, and a clean storage control filter that was eluted without having been used for fragrance collection. Eluted samples were stored at − 20 °C prior to being analyzed using gas chromatography-mass spectrometry (GC-MS) at Cornell University.

Floral scent analysis by GC-MS
Prior to analysis, hexane-eluted floral headspace samples were concentrated to a uniform volume of 50 μl using gaseous N 2 , and 5 μl of 0.03% toluene in hexane (23 ng) was added as an internal standard. One μl aliquots of each sample were injected into a Shimadzu GC-17A gas chromatograph equipped with a Shimadzu QP5000 quadrupole electron ionization (EI) mass spectrometer (Shimadzu Scientific Instruments, Columbia, MD) as a detector. All analyses were made using splitless injections on a polar GC column (0.25 mm diameter, 30 m length, 0.25 μm film thickness; EC WAX; W.R. Grace & Co., Columbia, MD) using ultra high purity (99.999%) helium as a mobile phase (split ratio 12:1, constant flow of 1 ml/ min). The gas chromatograph temperature and pressure parameters were optimized to resolve linalool and other floral volatiles to baseline with a total run time of 19 min per sample (240 °C injection port temperature, 260 °C detector temperature, 40 °C initial temperature, two-min hold time increased at 15 °C per min to 260 °C, hold time 2.38 min). EI mass spectra (70 eV) were collected from m/z 40-350 (daltons) at a detector voltage of 70 eV, with scan speed of 1000 and a scan interval of 0.29 s.
Volatile compounds were tentatively identified using computerized mass spectral libraries (Wiley Registry of Mass Spectral Data, National Institute of Standards and Technology, and [70] (> 120,000 mass spectra)). Peak areas were integrated using Shimadzu's GCMSsolutions software and normalized for slight differences in final sample volume using the internal standard. Peak areas were quantified by comparison with five-point (log scale) dose-response standard curves generated using serial dilutions from 1 to 0.0001 mg/ml of authentic standards for (E)-β-ocimene, racemic linalool, β-caryophyllene, geraniol, and (E,E)-farnesol. Emission rates were expressed as ng per flower, per hour (Table 1).
Additional chromatographic analyses were performed to determine which of the two known enantiomers of linalool is present in the floral headspace of Oenothera harringtonii. The third carbon of linalool (3,7-dimethylocta-1,6-dien-3-ol; C 10 H 18 O) is a chiral center, and both (S)-(+) and (R)-(−) enantiomers may be present in floral volatile blends [29,71]. A chiral GC column (Cyclosil-B, Agilent Technologies, Inc.; 30 m long, 0.25 mm inner diameter, 0.25um film thickness) was used with the following temperature program: splitless injection at 240 °C, column temperature increase from 40 to 240 °C at 10 °C per min. Under these conditions, the (R)-(−) and (S)-(+) enantiomers of linalool were distinguished by co-injection of two authentic standards, racemic linalool (Aldrich L260-2) and (R)-(−)-linalool (Aldrich 62,139), along with a one-hour headspace sample taken from a single flower of Clarkia breweri, previously shown to exclusively emit (S)-(+)-linalool [72]. Floral headspace samples from lin-alool+ populations of O. harringtonii (Florence) were chromatographed using the same protocol, comparing retention times for the leading edge of the linalool peak with those of the authentic standards.

RNA extraction
RNA was extracted from the same flowers used in fragrance analysis. While still attached to the plant, each flower was dissected into four parts: (1) stamens, (2) petals, (3) stigma and style, and (4) hypanthium and sepals. Additionally, RNA was extracted from flower buds 24-48 h before anthesis and from young leaf tissues from the center of the rosette (excluding individuals D and F, two linalool-individuals for which bud and leaf samples could not be obtained due to insufficient growth post initial sampling). After tissues were excised from the plant, they were immediately placed into either 15 ml conical tubes (petals, hypanthium, bud) or 2 ml microcentrifuge tubes (stigma and style, stamens, leaf ) and submerged in liquid nitrogen.
Frozen tissue was ground into a fine powder using either a mortar and pestle (petals, hypanthium, bud) or tissue homogenizer with a 3 mm stainless steel ball bearing inside the microcentrifuge tube (stigma and style, stamens, leaf; Talboys High Throughput Homogenizer  930,145

GO enrichment analysis and Terpene biosynthesis gene identification
Gene ontology (GO) enrichment analysis, performed with scripts provided in the Trinity software package, was used to identify gene categories that are overrepresented in each cluster relative to the assembly (Trinotate/util/extract_GO_assignments_from_Trinotate_xls. pl --Trinotate_xls annotation_report.xls -G --include_ ancestral_terms > go_annotations.txt \ Trinity/Analysis/ DifferentialExpression/run_GOseq.pl --factor_labeling factor_labeling.txt --GO assignments go_annotations.txt --lengths gene.lengths.txt). Phylogenetic inference was used to identify and classify terpene synthases (TPS) in the reference transcriptome assembly. To classify putative terpene synthases into the seven described subfamilies a-f [23,42,93,94], all of the sequences included in van Schie et al. [23] were downloaded and verified by reproducing a tree that circumscribed the seven TPS subfamilies. The verified alignment was then used to create a profile Hidden Markov Model using HMMER v3.1b (hmmer. org). The O. harringtonii reference assembly was searched for sequences explained by the pHMM using hmmscan (e-value 1e-10). All putative terpene synthesis protein sequences were aligned using MAFFT v7.310 [95] (−-maxiterate 1000 --localpair), and a phylogeny was inferred using RAxML v8.2.11 [96] with the PROT-GAMMAWAG model of sequence evolution and 100 fast bootstrap replicates.

SNP analysis
To determine the distribution of SNPs in the putative (R)-(−)-linalool synthase coding sequence, a putative full length transcript was created by merging partial transcripts identified by the phylogenetic and DEG analyses. Three transcripts were merged that formed a clade (TRINITY_DN62538_c3_g1_i1, TRINITY_DN62538_ c3_g2_i2, and TRINITY_DN74292_c2_g1_i11) and could be aligned with only a single mismatch. A web BLAST search against the GenBank database of nonredundant proteins resulted in alignment over approximately 52% of the length of (R)-(−)-linalool synthase from the lemon myrtle Backhousia citriodora (Myrtaceae), suggesting that the consensus sequence produced by merging the transcripts may not represent a full length transcript of (R)-(−)-linalool synthase. However, the merged O. harringtonii (R)-(−)-linalool synthase transcript and the lemon myrtle transcript (GenBank accession AB438045) were only 64% identical. SNPs were then identified by mapping reads to the merged transcript from each RNA-Seq library. All reads identified in the DEG analysis as mapping to the three unmerged transcripts were extracted and variants were discovered using the GATK best practices workflow for germline SNPs (https:// softw are. broad insti tute. org/ gatk/ best-pract ices/ workfl ow? id= 11145). Briefly, reads were re-mapped to the merged transcript and duplicate mappings were marked and removed before variants were called using HaplotypeCaller in GATK. Phasing of alleles within individuals was done with WhatsHap [97], after which two alternate phased haplotypes were extracted