De novo transcriptome analysis and comparative expression profiling of genes associated with the taste-modifying protein neoculin in Curculigo latifolia and Curculigo capitulata fruits

Abstract

Background

Curculigo latifolia is a perennial plant native to Southeast Asia whose fruits contain the taste-modifying protein neoculin, which binds to sweet receptors and makes sour fruits taste sweet. Although similar to snowdrop (Galanthus nivalis) agglutinin (GNA), which contains mannose-binding sites in its sequence and 3D structure, neoculin lacks such sites and has no lectin activity. Whether the fruits of C. latifolia and other Curculigo plants contain neoculin and/or GNA family members was unclear.

Results

We performed de novo RNA-seq assembly of the fruits of C. latifolia and the related C. capitulata and analyzed in detail the expression patterns of neoculin and neoculin-like genes in both species. We assembled 85,697 transcripts from C. latifolia and 76,775 from C. capitulata using Trinity and annotated them using public databases. We identified 70,371 unigenes in C. latifolia and 63,704 in C. capitulata. In total, 38.6% of unigenes from C. latifolia and 42.6% from C. capitulata shared high similarity between the two species. We identified ten neoculin-related transcripts in C. latifolia and 15 in C. capitulata, encoding both the basic and acidic subunits of neoculin in both plants. We aligned these 25 transcripts and generated a phylogenetic tree. Many orthologs in the two species shared high similarity, despite the low number of common genes, suggesting that these genes likely existed before the two species diverged. The relative expression levels of these genes differed considerably between the two species: the transcripts per million (TPM) values of neoculin genes were 60 times higher in C. latifolia than in C. capitulata, whereas those of GNA family members were 15,000 times lower in C. latifolia than in C. capitulata.

Conclusions

The genetic diversity of neoculin-related genes strongly suggests that neoculin genes underwent duplication during evolution. The marked differences in their expression profiles between C. latifolia and C. capitulata may be due to mutations in regions involved in transcriptional regulation. Comprehensive analysis of the genes expressed in the fruits of these two Curculigo species helped elucidate the origin of neoculin at the molecular level.

Background

Curculigo latifolia (Hypoxidaceae family, formerly classified in the Liliaceae family) is a perennial plant found in Southeast Asia, especially the Malay peninsula [1, 2]. According to the Royal Botanic Gardens, Kew, there are 27 species of Curculigo [3]. The genetic diversity and morphology of Curculigo have long been of interest [4,5,6,7]. C. latifolia and C. capitulata were previously reclassified as members of the Molineria genus, but recent discussions have suggested that they should be returned to the Curculigo genus. Here, we use the traditional name, Curculigo.

C. latifolia and C. capitulata have a similar vegetative appearance (Fig. 1), but differ in their flower and fruit morphology. In addition, C. capitulata is more widely distributed than C. latifolia. Both species are diploids (2n = 18; x = 9) [8]. C. latifolia is self-incompatible [9], but C. capitulata plants from various botanical gardens in Japan have not been successfully crossed, so it is unknown whether C. capitulata is self-compatible or self-incompatible. The flowers, roots, stems, and leaves of Curculigo plants have traditionally been used as medicines [10,11,12,13,14,15]. Notably, C. latifolia fruits, but not those of C. capitulata, produce a taste-modifying protein, neoculin, that makes sour-tasting foods or water taste sweet [1, 16,17,18].

Fig. 1

Photographs of Curculigo latifolia and Curculigo capitulata. Curculigo latifolia (a–c) and C. capitulata (d–f) in the greenhouse at the Yamashina Botanical Research Institute. b and e Inflorescences; c and f fruits. All photographs are our own, taken by Satoshi Okubo

Neoculin itself has a sweet taste and is 550 times sweeter than sucrose on the percentage sucrose equivalent scale [19, 20]. Furthermore, neoculin has a taste-modifying activity that converts sourness to sweetness: for example, the sour taste of lemons is changed to a sweet orange taste. Moreover, the presence of neoculin induces sweetness in drinking water, and some organic acids taste sweet when consumed after neoculin [21]. Neoculin is perceived by the human sweet taste receptor T1R2-T1R3, a member of the G-protein-coupled receptor family [22]. Neoculin consists of two subunits that form a heterodimer: the neoculin basic subunit (NBS), also called curculin [16], and the neoculin acidic subunit (NAS) [18, 23]. NBS is an 11-kDa peptide consisting of 114 amino acid residues [16, 24], while NAS has a molecular mass of 13 kDa and 113 residues. The two subunits share 77% identity at the protein level [18]. Several essential amino acids that are responsible for the taste-modifying properties of neoculin have been identified: His-11 in NBS is responsible for the pH-dependent taste-modifying activity of neoculin [25], and Arg-48, Tyr-65, Val-72, and Phe-94 function in the binding and activation of human sweet taste receptors [26]. Changes in the tertiary structure of the subunits at these residues are thought to contribute to the taste-modifying properties of neoculin [27, 28].

Lectins are proteins that recognize and bind to specific carbohydrate structures [29, 30]. Plant lectins are classified into 12 families. Neoculin NBS and NAS are similar in protein sequence and 3-dimensional (3D) structure to the GNA (Galanthus nivalis agglutinin) family of lectins, which are present in bulbs such as snowdrop (Galanthus nivalis) and daffodil (Narcissus pseudonarcissus) and are thought to function as defense or storage proteins [31,32,33]. However, NBS and NAS lack a mannose-binding site (MBS) and do not have lectin activity [34,35,36]. Furthermore, whereas GNA family members in plants such as snowdrop contain one disulfide bond, which functions in intra-subunit bonding, neoculin forms two intra-subunit bonds and two inter-subunit bonds between NBS and NAS [32].

The fruit of C. latifolia contains 1.3 mg neoculin per fruit [37] or 1.3 mg per one gram of fresh pulp [38]. This is thought to be considerably higher than the levels of total proteins in typical edible fruits [39]. Although the taste-modifying activity of neoculin is well-known, its biological role in C. latifolia is unknown. In addition, as neoculin is not a lectin, it was not clear which lectins are expressed in C. latifolia fruits, especially lectins of the GNA family. Finally, whether other Curculigo species also accumulate neoculin or neoculin-like proteins is unknown.

Here, we compared the gene expression profiles in the fruits of C. latifolia and C. capitulata by transcriptome deep sequencing (RNA-seq). The aim of this study was to comprehensively analyze the two species from the viewpoint of amino acid sequences and gene expression levels to shed light on the origins of neoculin.

Results

De novo RNA-seq assembly from C. latifolia and C. capitulata fruits

We sequenced cDNA libraries from C. latifolia and C. capitulata using the Illumina HiSeq 2500 platform. To analyze the data, we filtered out raw reads with average quality values < 20, reads with < 50 nucleotides, and reads with ambiguous ‘N’ bases. After trimming reads for adapter sequences and filtering, we obtained 44,396,896 reads from C. latifolia and 43,863,400 from C. capitulata. We then assembled high-quality reads from C. latifolia and C. capitulata into 85,697 and 76,775 contigs with a mean length of 775 bp and 744 bp, respectively, using Trinity 2.11. The distribution of transcript lengths and transcripts per million (TPM) values are shown in Additional files 1 and 2. The N50 values for C. latifolia and C. capitulata transcripts were 1324 and 1205, respectively (Table 1). Unigene clustering using CD-Hit revealed 70,371 unigenes in C. latifolia and 63,704 in C. capitulata (Table 1).
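The N50 values reported here summarize contig length: N50 is the length such that contigs of that length or longer together contain at least half of all assembled bases. The following is a minimal sketch of that calculation, not the code used in this study; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not data from this study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

Because N50 weights long contigs more heavily than the mean does, it is normally larger than the mean contig length, as is the case for both assemblies described above.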

Table 1 Overview of de novo RNA-seq assembly from C. latifolia and C. capitulata fruits

The gene repertoires of the two Curculigo species are consistent with those of monocots

Low annotation rate of the transcripts: To gather functional information about the transcripts identified from de novo assembly, we aligned all transcripts against nucleotide sequences from various protein databases, including the nonredundant protein (NR) database at the National Center for Biotechnology Information (NCBI), RefSeq, UniProt/Swiss-Prot, Clusters of Orthologous Groups of proteins (COG), the rice (Oryza sativa) genome (Os-Nipponbare-Reference-IRGSP-1.0, Assembly: GCF_001433935.1), and the Arabidopsis (Arabidopsis thaliana) genome (Assembly: GCF_000001735.4) and selected the top hits from these queries. We obtained annotations for 38,433 out of 85,697 transcripts (44.8%) in C. latifolia and 40,554 out of 76,775 transcripts (52.8%) in C. capitulata with a threshold of 1e−10 by performing a Basic Local Alignment Search Tool search with our in silico-translated transcripts against protein databases (BLASTx) using the NR, RefSeq, UniProt, and COG databases and the proteomes of rice and Arabidopsis. All annotations are listed in Additional file 3. The number of annotated transcripts for each database is listed in Table 2. The low annotation rate suggests that the two Curculigo species are significantly different from classical model plant systems that drive much of the information stored in public databases.

Table 2 Number of functional annotations of transcripts from C. latifolia and C. capitulata fruits

Conservation across monocots: After BLASTx searches with the C. latifolia and C. capitulata transcripts against the NR database, we determined the extent of gene conservation across plant species by running Blast2GO [40]. We estimated the similarity of the two Curculigo species to various plant species by counting the number of hits from each species obtained by BLAST searches (Fig. 2). The top six species displaying the highest homology with C. latifolia and C. capitulata transcripts were monocots, like Curculigo, supporting the view that the assembled Curculigo genes are highly similar to known genes from other monocots. The top six species sharing the highest similarity with C. latifolia and C. capitulata were identical in terms of both species and rank order.

Fig. 2

The de novo assembled C. latifolia and C. capitulata transcriptomes reveal high similarity to known monocot genes. The percentage of genes with matches in C. latifolia (outer circle) and C. capitulata (inner circle) was obtained from the results of BLAST searches against the NR database. The top six most highly homologous species were monocots, like Curculigo

Expression of functionally similar genes between the two species: Using the COG database, we classified 11,875 transcripts from C. latifolia and 12,448 from C. capitulata into functional categories (Fig. 3). We observed no significant differences between the two species, which supports the notion that these two species have functionally similar genes.

Fig. 3

C. latifolia and C. capitulata have functionally similar genes. Functional classification of transcripts was performed using the COG database. In total, 11,875 (C. latifolia) and 12,448 (C. capitulata) transcripts were grouped into 26 COG categories (A to Z). No significant differences were observed between the two species

We also analyzed the functions of the assembled transcripts via Gene Ontology (GO) analysis using the rice genome annotation (Additional file 4). Again, no significant differences were observed between the two species. The results also suggested that the repertoires of genes from the two species are similar to those of better-known species.

Genes with high similarity between C. latifolia and C. capitulata fruits account for less than half of all genes

Using the unigene sequences, we analyzed the similarity between C. latifolia and C. capitulata genes. We performed BLAST searches using each transcript from one species as the query sequence against all transcripts from the other species with a threshold E-value of 1e−5 or less and selected the reciprocal best hits. We defined unigenes with high similarity between the two species as common genes and unigenes with low similarity between the species, or present in only one species, as unique genes. In total, we deemed 38.6% (27,155 out of 70,371) of genes in C. latifolia and 42.6% (27,155 out of 63,704) of genes in C. capitulata to be common genes (Fig. 4). The relatively small number of common genes suggests that a long time has passed since the divergence of these species, which is consistent with results of lineage analysis based on plastid DNA from Hypoxidaceae family members. Indeed, although the Curculigo genus constitutes a single clade, C. latifolia and C. capitulata are not the most closely related species within this clade [5].
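The reciprocal-best-hit selection described in this paragraph can be illustrated with a short sketch. It is not the code used in this study: the function names, the tuple layout (query, subject, E-value, bit score), and the choice of bit score to rank hits are assumptions made for illustration, and the transcript IDs in the toy data are hypothetical.

```python
def best_hits(hits, max_evalue=1e-5):
    """Map each query to its best subject among hits passing the E-value cutoff.

    `hits` is an iterable of (query, subject, evalue, bitscore) tuples.
    Assumption: the "best" hit is the one with the highest bit score.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data with hypothetical transcript IDs (not from this study).
latifolia_vs_capitulata = [
    ("L_1", "C_1", 1e-50, 200.0),
    ("L_1", "C_2", 1e-20, 90.0),
    ("L_2", "C_2", 1e-30, 120.0),
    ("L_3", "C_3", 1e-3, 30.0),   # fails the E-value cutoff
]
capitulata_vs_latifolia = [
    ("C_1", "L_1", 1e-48, 195.0),
    ("C_2", "L_1", 1e-20, 90.0),
    ("C_2", "L_2", 1e-10, 60.0),
    ("C_3", "L_3", 1e-3, 30.0),   # fails the E-value cutoff
]
common_pairs = reciprocal_best_hits(latifolia_vs_capitulata, capitulata_vs_latifolia)
```

In the toy data, only the L_1/C_1 pair is reciprocal: L_2 has C_2 as its best hit, but C_2 has L_1 as its best hit, so L_2 would be counted as a unique gene. Because each reciprocal pair contributes one gene to each species, the number of common genes is identical in both species, as in the 27,155 reported above.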

Fig. 4

The majority of unigenes from C. latifolia and C. capitulata correspond to unique genes with low similarity. Number of unigenes based on sequence similarity between C. latifolia and C. capitulata fruits. The number of highly similar unigenes that are common (L-common: common genes of C. latifolia; C-common: common genes of C. capitulata) and unigenes with low similarity, which are thus unique genes (L-unique: unique genes of C. latifolia; C-unique: unique genes of C. capitulata)

Next, we investigated the proportion of annotated genes in these species using the COG, RefSeq, UniProt, and NR databases and the genomes of rice and Arabidopsis (shown in Table 2). Among the common genes, 17,337 and 17,199 genes were annotated (63.8 and 63.3% of common genes) in C. latifolia and C. capitulata, respectively. By contrast, there were 11,718 annotated unique genes (27.1% of unique genes) among genes found only in C. latifolia and 14,848 (40.6% of unique genes) among those found only in C. capitulata. Thus, the annotation rate was higher for common genes than for unique genes, despite the smaller number of common genes. One possible explanation for this observation is that many of the genes common to both species may also be common genes in other model plant species that are highly represented in the databases employed.

We then compared the expression profiles of 27,155 common genes between C. latifolia and C. capitulata. Although the sequences of the corresponding genes in C. latifolia and C. capitulata were similar, their expression profiles were not necessarily equivalent. Nonetheless, only 111 out of the 27,155 common genes had TPM ratios ≥50 (Table 3). Of these 111 genes, five were neoculin-related genes, indicating that the expression profiles of at least some neoculin-related genes differ significantly between the two species.

Table 3 Comparison of the expression profiles of C. latifolia and C. capitulata

Lectin genes expressed in C. latifolia and C. capitulata fruits

We previously demonstrated that C. latifolia fruits contain a taste-modifying protein consisting of an NBS-NAS heterodimer that is similar to lectins in the GNA family. We therefore investigated the number of lectin genes expressed in the fruits of C. latifolia and C. capitulata that were categorized into each of the 12 lectin families to better understand the general outline of the GNA gene family in these species. To determine the number of lectin genes, we performed tBLASTN searches against all transcripts in each species using the sequences of 12 representative lectins as queries [41] (Table 4). In both species, the largest lectin family was the GNA family, which includes the neoculin (NBS and NAS) genes. Ten of the 45 lectin genes in C. latifolia and 13 of the 49 lectin genes in C. capitulata belonged to the GNA family. Thus, we analyzed the many GNA family genes in these species, including the neoculin genes, in more detail.

Table 4 Number of predicted lectin genes using tBLASTN in C. latifolia and C. capitulata fruits

Analysis of GNA family and neoculin-related transcripts

We constructed a phylogenetic tree using the deduced protein sequences from 17 transcripts of well-known GNA family members and 25 full-length neoculin-related transcripts from Curculigo (10 from C. latifolia and 15 from C. capitulata; Fig. 5); the method used for sequence selection is shown in Additional file 5. The TPM values (calculated by RSEM) are listed after the transcript IDs. An alignment of all sequences is shown in Additional file 6. The C. latifolia transcript L_16562_c0_g1_i1 was a good match for NBS, while L_16562_c0_g1_i2 was a good match for NAS, except for one amino acid substitution (Additional file 7); these transcripts will be referred to as NBS and NAS hereafter. The predicted proteins derived from neoculin-related transcripts formed a distinct group separate from known GNA family members. Neoculin-like sequences formed one group that included NBS and NAS (named the ‘neoculin group’), as well as two other large groups (group 1 and group 2) (Fig. 5). In addition to NBS and NAS, the neoculin group also included proteins whose transcripts were highly expressed (C_9931_c0_g1_i1) and that presented the conserved amino acid residues critical for binding mannose (and thus have the potential for lectin activity). In addition, each transcript had an ortholog in both Curculigo species.

Fig. 5

Phylogenetic analysis of neoculin-related transcripts uncovers the contrasting expression levels in the orthologs. Neoculin-related and GNA family members were aligned using ClustalX. The phylogenetic tree was constructed using the neighbor-joining method (bootstrap = 1000). De novo transcriptome transcript IDs for C. latifolia and C. capitulata are shown in purple and orange, respectively. L_16562_c0_g1_i1 and L_16562_c0_g1_i2 of C. latifolia correspond to NBS and NAS, respectively (see Additional file 7). Transcripts per million (TPM) values are listed to the right of the transcript IDs. Transcripts from the two species encoding highly similar protein sequences are shown in pairs. Transcripts sharing high similarity with those of NBS and NAS are referred to as the neoculin group (indicated by the red frame). Groups of other highly similar predicted proteins are shown in groups 1 and 2. The vertical lines to the right of the TPM value indicate orthologous pairs in C. latifolia and C. capitulata. The sequences and species of origin of the selected GNA family members are as follows, with the structure name from the Protein Data Bank given in parentheses: ASA, Allium sativum (1BWU); GNA, Galanthus nivalis (1MSA); and NPL, Narcissus pseudonarcissus (1NPL). Other sequences were obtained from GenBank: PRA, Polygonatum roseum (AY899824); PMA, Polygonatum multiflorum (U44775); CMA, Clivia miniata (L16512); ZCA, Zephyranthes candida (AF527385); AAA, Allium ascalonicum (L12172); ACA, Allium cepa (AY376826); AUA, Allium ursinum (U68531); THC, Tulipa hybrid cultivar (U23043); ZOA, Zingiber officinale (AY657021); ACO, Ananas comosus (AY098512); AKA, Amorphophallus konjac (AY191004); DPA, Dioscorea polystachya (AB178475); CHC, Cymbidium hybrid cultivar (U02516); and EHA, Epipactis helleborine (U02515)

Many highly expressed transcripts belonged to group 1 (L_22219_c0_g1_i1 [TPM: 7600]; C_18595_c_g1_i1 [TPM: 2300]; C_9454_c0_g1_i1 [TPM: 2000]). Although these highly expressed transcripts encode proteins that are very similar to mannose-binding lectins, they are not mannose-binding lectins, as they lack the conserved and essential amino acid residues that form the mannose-binding sites. At this time, we do not know their physiological functions or the reason for their high expression. Predicted proteins encoded by group 2 transcripts were also relatively close to the lectins Polygonatum multiflorum agglutinin (PMA) and Polygonatum roseum agglutinin (PRA) from the Polygonatum genus. Unlike in group 1, there were no highly expressed transcripts in this group.

In each group, we detected neoculin-related orthologous transcripts with high similarity between C. latifolia and C. capitulata. The existence of many orthologs in each species, combined with the presence of relatively few common genes (comprising only approximately 40% of all transcripts in both species; Fig. 4), is noteworthy. We infer that these orthologs probably existed before the divergence of these two species, whereas their amino acid differences probably arose afterwards. Because plants, including Curculigo, are immobile, genetic diversity is beneficial: it increases the chances that a population survives multiple stresses. It would be interesting to determine whether Curculigo plants other than C. latifolia and C. capitulata contain neoculin-related genes, especially genes in the neoculin group.

Within the neoculin group, we identified transcripts encoding proteins with high similarity to NBS and NAS in both C. latifolia and C. capitulata. Notably, although the corresponding NBS and NAS genes were highly expressed in C. latifolia, their C. capitulata orthologs were only weakly expressed (C_16324_c0_g1_i1 and C_16324_c0_g1_i2). The TPM values for NBS and NAS genes in C. latifolia were approximately the same, at 650 and 620, respectively. This result is in agreement with the finding that their encoded proteins form a heterodimer [18]. Although C_9931_c0_g1_i1 was highly expressed in C. capitulata, with a TPM value of 15,000 (the fifth highest expression level among all C. capitulata transcripts), its C. latifolia orthologs (L_307_c0_g1_i1 and L_307_c0_g2_i1) were expressed at a very low level. To verify the RNA-seq results, we performed qRT-PCR analyses of the neoculin group genes in the two species (Additional files 8 and 9) and compared expression levels using a ubiquitin gene of each species as the reference gene. In C. latifolia, the expression levels of NBS and NAS were almost the same, and those of L_307_c0_g1_i1 and L_307_c0_g2_i1 were considerably lower. In C. capitulata, the expression levels of C_16324_i1 and C_16324_c0_g1_i2 were very low, and that of C_9931_c0_g1_i1 was very high. These results support the TPM values estimated from RNA-seq analysis. In addition, the high-low relationship of expression levels between the two species obtained by RNA-seq analysis was also supported by the qRT-PCR analyses. Curiously, in all three groups (neoculin group, groups 1 and 2) for which there were orthologs in both species, if a gene was highly expressed in one species, its ortholog was weakly expressed in the other species; we did not identify a single case where orthologs were highly expressed in both species. The data shown in Table 3 also support this pattern. These results strongly suggest changes in the gene expression regulatory system due to divergence of the two species.

Next, we aligned the deduced amino acid sequences for the proteins belonging to the neoculin group (Fig. 6a). We divided the sequences into nine regions, including the regions removed by cleavage of the secretion signal peptide and three mannose binding site (MBS)-like regions: N pro-sequence (N-Pro), N-terminal (N-term), MBS1, inter1, MBS2, inter2, MBS3, C-terminal (C-term), and C pro-sequence (C-Pro). The His-11 residue was present in the N-term region of NBS and in the predicted proteins encoded by transcripts L_16562_c0_g1_i1 in C. latifolia and C_16324_c0_g1_i1 in C. capitulata. This site is essential for the pH-dependent taste-modifying activity of neoculin. By contrast, transcripts C_9931_c0_g1_i1 in C. capitulata and L_307_c0_g1_i1 and L_307_c0_g2_i1 in C. latifolia (abbreviated ‘C_9931 series’) did not code for His-11, which was replaced by Tyr-11, as in NAS. In addition, Cys-77 and Cys-109, which form an intermolecular disulfide bond between NBS and NAS, were present within the inter2 and C-term regions in both species, but were absent in the C_9931 series. Thus, it is likely that proteins corresponding to the C_9931 series do not form dimers.

Fig. 6

The essential amino acid residues in neoculin group members have been conserved. a Amino acid sequence alignment of neoculin group members from C. latifolia and C. capitulata fruits. In each alignment, the residues that are shared with only NBS or only NAS are shown in blue and red, respectively. The residues that are not consistent with NBS or NAS are shown in pink, and those that are consistent with only C_9931_c0_g1_i1 (Ser17) are shown in light green. His-11 and Cys residues are highlighted in dark red and dark green, respectively. Arg-48, Tyr-65, Val-72, and Phe-94 are highlighted in pale green. Mannose-binding sites (MBS, QxDxNxVxY) are indicated by a dagger (†), and conserved residues are highlighted in yellow. MBS residues that are conserved in all sequences are indicated by a double dagger (‡). MBS residues in L_307_c0_g2_i1, L_307_c0_g1_i1, and C_9931_c0_g1_i1 are shown in boxes. The predicted proteins were divided into nine regions (N-Pro, N-term, MBS1, inter1, MBS2, inter2, MBS3, C-term, and C-Pro) based on the regions removed after signal-peptide cleavage, the N- or C-terminal regions, the regions of MBS 1 to 3, and the regions between the MBSs. b Amino acid residue substitutions in proteins from the neoculin group. The region from inter2 to C-term is the primary region of sequence diversity in the neoculin group. The values shown in the heatmap are amino acid substitution rates (%) of the neoculin group. The NBS sequence was used as the reference

Four residues are responsible for the binding and activation of the human sweet receptor: Arg-48, Tyr-65, Val-72, and Phe-94 [26]. Although Tyr-65 and Val-72 were identified in the C_9931 series, Arg-48 and Phe-94 were missing, with Leu-48 and Val-94 present instead. The lack of His-11 and of two of these four indispensable residues, as well as the lack of dimerization, indicates that the C_9931 series proteins may not possess the sweet taste or taste-modifying properties of classic neoculin. Indeed, a preliminary test indicated that C. capitulata fruits did not have a sweet taste or taste-modifying properties despite the high expression level of C_9931_c0_g1_i1 (data not shown). Three sites similar to the MBS were present in the MBS1, MBS2, and MBS3 regions of this protein. Moreover, whereas NBS and NAS lack the essential residues of the MBS, all of these residues were conserved in C_9931_c0_g1_i1, making C_9931_c0_g1_i1 a likely lectin candidate.

Based on this protein alignment, we investigated all amino acid substitutions in each region in comparison to the two reference sequences, NBS and NAS (Additional file 10). The amino acid substitution rate with reference to NBS is shown in the heatmap in Fig. 6b. Between the NBS series and the NAS series, 18 to 27% of substitutions occurred in the overall regions from the N-term region to C-term region (23%, 26 of 114 residues in NBS). The highest substitution rate was 27% in the MBS2 region, followed by 24% in the inter2 and C-term regions. In the C_9931 series, the highest substitution rate was 53% in the C-term region, followed by the MBS3 region (44%) and inter2 region (43%). These results suggest that the region from inter2 to C-term is the main source of sequence diversity among neoculin group members.
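The per-region substitution rates discussed above are percentages of aligned positions at which a sequence differs from the reference. The following is a minimal sketch of such a calculation, not the procedure used in this study: the function name is hypothetical, the toy sequences are invented, and skipping alignment columns that are gaps in the reference is an assumption, since the handling of gaps is not described here.

```python
def substitution_rate(reference, query):
    """Percentage of reference positions whose residue differs in the query.

    Both arguments are rows of the same alignment (equal length, '-' for gaps).
    Assumption: columns that are gaps in the reference are not counted;
    a gap in the query opposite a reference residue counts as a difference.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    substituted = 0
    for ref_residue, query_residue in zip(reference, query):
        if ref_residue == "-":
            continue
        compared += 1
        if ref_residue != query_residue:
            substituted += 1
    if compared == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * substituted / compared


# Toy aligned region (invented sequences, not data from this study).
toy_rate = substitution_rate("ACDEFGHIKL", "ACDEFGHIKV")

# The overall figure quoted in the text: 26 substitutions over 114 NBS residues.
overall_percent = round(100 * 26 / 114)
```

Applied region by region (N-term, MBS1, inter1, and so on), a function of this kind would yield the values of a heatmap such as Fig. 6b.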

Biochemical analysis

We extracted proteins from C. latifolia and C. capitulata fruits and subjected them to SDS-PAGE, followed by Coomassie brilliant blue (CBB) staining and immunoblotting using a mixture of polyclonal anti-NAS and anti-NBS specific antibodies (Fig. 7 and Additional file 11). The CBB-stained gel is shown in Fig. 7a and the corresponding immunoblot in Fig. 7b. By CBB staining, we detected an 11-kDa band representing NBS and a 13-kDa band representing NAS in C. latifolia fruit samples (Fig. 7a). In C. capitulata fruits, some bands around 11 kDa may be the protein encoded by C_9931_c0_g1_i1, which had a high TPM value. Immunoblotting confirmed the identity of the bands corresponding to NBS and NAS in C. latifolia fruits. However, we detected no such bands in C. capitulata fruits (Fig. 7b), perhaps because NBS and NAS accumulate at very low levels in this species, as reflected by the low TPM values of their encoding transcripts (as described above). The amino acid sequence of the C-term region, which is recognized by the antibody, was also very different in C_9931_c0_g1_i1 compared to both NBS and NAS, which is consistent with the finding that the proteins detected by CBB staining were not detected by immunoblotting.

Fig. 7

Biochemical analysis of C. latifolia and C. capitulata fruits suggests only C. latifolia possesses neoculin. Extracts from one fruit each of C. latifolia and C. capitulata were subjected to SDS-PAGE. 20 μg protein of each fruit extract was applied to each well. a CBB staining. b Immunoblotting

Discussion

The C. latifolia and C. capitulata transcriptomes contain many neoculin-related genes that are similar within and between species. This diversity is thought to result from gene duplication, which is known to contribute to plant evolution [41,42,43,44,45,46,47]. Such gene duplication might place some genes under the same transcriptional regulation. The neoculin genes NBS and NAS are likely paralogs that arose due to tandem duplication before the divergence of C. latifolia and C. capitulata. The characteristics of NBS and NAS genes in C. latifolia and C. capitulata are summarized in Table 5. Both C. latifolia and C. capitulata produce NBS and NAS transcripts, and the sequences of the C_9931 series transcripts matched those of active GNA family members. However, their expression levels in the two species were very different.

Table 5 Summary of neoculin group transcripts in fruits of two Curculigo species

C. latifolia fruits have been reported to accumulate 1.3 mg neoculin g−1 fresh pulp. Because neoculin is 550 times as sweet as sucrose [19, 20], one gram of C. latifolia fruit pulp is thus estimated to be equivalent to 715 mg of sucrose in sweetness, explaining the sweet taste of these fruits. Given that the TPM values of the neoculin genes in C. capitulata were only 1/60 those detected in C. latifolia, C. capitulata fruits would be expected to contain only approximately 22 μg neoculin g−1 fresh pulp and have the same sweetness as 12 mg of sucrose. Based on these values, it seemed likely that C. capitulata fruits would not taste sweet, which we confirmed in a preliminary test. Thus, neoculin levels, and therefore taste, differ greatly between these fruits, paralleling the difference in the expression of NBS and NAS genes in the two species. The taste of C. latifolia fruits may strongly influence its survival strategies. For example, the sweet taste conferred by neoculin may facilitate seed spread by animals.
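The arithmetic behind these estimates can be reproduced in a few lines. This is only a worked check of the figures quoted above; the variable and function names are hypothetical, and, as in the text, it assumes that neoculin protein content scales linearly with the TPM values of the neoculin genes.

```python
NEOCULIN_MG_PER_G_PULP = 1.3   # C. latifolia, mg neoculin per g fresh pulp
SWEETNESS_VS_SUCROSE = 550     # neoculin is 550 times as sweet as sucrose
TPM_FOLD_LOWER = 60            # neoculin TPM in C. capitulata is 1/60 of C. latifolia


def sucrose_equivalent_mg(neoculin_mg, potency=SWEETNESS_VS_SUCROSE):
    """Mass of sucrose (mg) with the same sweetness as the given neoculin mass."""
    return neoculin_mg * potency


# C. latifolia: sweetness of one gram of fresh pulp.
latifolia_equiv_mg = sucrose_equivalent_mg(NEOCULIN_MG_PER_G_PULP)

# C. capitulata: assume protein content scales with TPM (1/60 of C. latifolia).
capitulata_neoculin_mg = NEOCULIN_MG_PER_G_PULP / TPM_FOLD_LOWER
capitulata_neoculin_ug = capitulata_neoculin_mg * 1000
capitulata_equiv_mg = sucrose_equivalent_mg(capitulata_neoculin_mg)

print(round(latifolia_equiv_mg), round(capitulata_neoculin_ug), round(capitulata_equiv_mg))
```

Rounded to whole numbers, the three quantities come to 715 mg, 22 μg, and 12 mg, matching the values given in the paragraph above.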

The structure of the taste-modifying protein miraculin is similar to those of the soybean (Glycine max) Kunitz trypsin inhibitor and thaumatin, a sweet protein with an α-amylase or trypsin-inhibitor-like structure. Similarly, neoculin has a structure similar to that of lectin, a common molecular structure in plants [48,49,50,51,52,53,54]. Trypsin inhibitors, amylase inhibitors, and lectins commonly accumulate in fruits and seeds. The diversity of these proteins arose from gene duplications and mutations during evolution. It appears that over the course of evolution, neoculin, miraculin, and thaumatin all acquired sweetness or taste-modifying activity in regard to human senses.

Lectins are thought to play important protective and storage roles in plants in general. Thus, the high expression levels of lectin genes in C. capitulata fruits are likely to reflect important roles of lectins in this plant. In contrast, the low expression levels of neoculin genes in C. capitulata suggest that the encoded protein may be less beneficial in this species. Similarly, and in contrast to C. capitulata, active GNA family members were barely expressed in C. latifolia fruits. Neoculin genes were highly expressed in C. latifolia but weakly expressed in C. capitulata despite the similar vegetative appearance of the two plants (Fig. 1). These physiological differences might be due to mutation(s) of the cis-regulatory elements in these genes. Cis-elements, including promoters, enhancers, and silencers, are very important for the regulation of gene expression [41, 55,56,57,58]. Likewise, the different expression levels of related genes in C. latifolia vs. C. capitulata might be caused by mutations in their cis-elements. For example, the cis-elements of the NBS and NAS genes may have mutated after the divergence of the two species, or the genes may have acquired mutations or lost cis-elements during the gene duplication events that led to their divergence, leading to different expression patterns. Deciphering the genomic information of these two species further might help verify this notion and distinguish among these possible mechanisms.

Conclusions

RNA-seq analysis and de novo transcriptome assembly of C. latifolia and C. capitulata fruits revealed the presence of numerous neoculin-like genes. Among the various neoculin-related genes that arose from gene duplication, several mutations accumulated, resulting in the genes encoding NBS and NAS. These proteins form the heterodimeric protein neoculin, which exhibits taste-modifying activity in humans. Our comprehensive investigation of the genes expressed in the fruits of these two Curculigo species will help uncover the origin of neoculin at the molecular level.

Methods

Plant materials

C. latifolia (voucher ID 26092) was obtained from the Research Center for Medicinal Plant Resources, National Institutes of Biomedical Innovation, Health, and Nutrition, Tsukuba, Japan (originated in Indonesia). C. capitulata (voucher ID 31481) was obtained from The Naito Museum of Pharmaceutical Science and Industry, Kakamigahara, Japan. The plants were cultivated in a greenhouse at the Yamashina Botanical Research Institute. Photographs of the fruits of these plants are shown in Fig. 1.

Fruit setting

C. latifolia flowers were pollinated by hand in the morning on the first day of flowering. C. capitulata flowers were placed in 50 ppm of 1-naphthylacetic acid (NAA) in the morning of both the first and second days of flowering. This is the first report of a method to induce C. capitulata fruit set through plant hormone application. About 60 days after flowering, mature fruits were harvested and immediately soaked in RNAlater™ solution (Thermo Fisher Scientific, MA, USA). The fruits were stored at −80 °C until use. The samples were ground into a powder in liquid nitrogen prior to RNA extraction. Total RNA was extracted from the frozen samples using the phenol-SDS method, and poly(A)+ mRNA was purified using an mRNA Purification Kit (Amersham Biosciences, Buckinghamshire, UK).

Sequencing

mRNA sequencing was performed by Hokkaido System Science Co., Ltd. (Hokkaido, Japan). A cDNA library was generated using the TruSeq RNA Sample Prep Kit v2 (Illumina, Inc., CA, USA) and sequenced on an Illumina HiSeq 2500 platform (101 bp read length, paired-end, unstranded). The raw reads were cleaned using cutadapt 1.1 [59] and Trimmomatic 0.32 [60]. We removed adapter sequences, low-quality sequences (reads with ambiguous ‘N’ bases), and reads with Q-value < 20 bases. Sequences shorter than 50 bases were eliminated. The remaining high-quality reads were assembled into contigs using Trinity 2.11 [61] with default options. We quantified transcript levels as transcripts per million (TPM) values using Bowtie 1.12 [62] and RSEM (RNA-Seq by Expectation-Maximization) [63] in the Trinity package.
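The TPM unit used above can be illustrated with a minimal Python sketch. It assumes plain read counts and transcript lengths; RSEM itself works with expectation-maximization estimates of counts and with effective lengths, so this is only an illustration of the normalization, not the pipeline used in this study. The function name and the toy numbers are hypothetical.

```python
def tpm(counts, lengths):
    """Return TPM values for parallel lists of read counts and transcript lengths (bp)."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same number of entries")
    # Step 1: reads per base, i.e. each count divided by its transcript length.
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Step 2: rescale so that all values in the sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical toy values, not data from this study.
example_counts = [100, 200, 300]
example_lengths = [1000, 2000, 1000]
example_tpm = tpm(example_counts, example_lengths)
```

Because the values within a sample always sum to one million, TPM values describe relative abundance within that sample, which is worth keeping in mind when comparing the two species.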

Sequence clustering

The assembled sequences were compared against the NCBI NR, prot-plant from RefSeq, UniProt, the rice genome (Os-Nipponbare-Reference-IRGSP-1.0, Assembly: GCF_001433935.1), and the Arabidopsis genome (Arabidopsis thaliana, Assembly: GCF_000001735.4) with an E-value < 1e−10. BLAST analysis was performed using BLAST version 2.2.31. CD-Hit (cd-hit-est) [64, 65] was used for clustering with the sequence identity threshold option (−c) set to 0.9 to obtain unigenes.

Comparison of gene expression in C. latifolia vs. C. capitulata fruits

To compare the transcripts in C. latifolia vs. C. capitulata fruits, a BLASTN search was performed with E-value < 1e−5 using each transcript from one species as the query against all transcripts from the other species, and then the best hits were selected. For qRT-PCR, cDNA was synthesized from 1 μg of total RNA using SuperScript IV Reverse Transcriptase (Thermo Fisher Scientific, MA, USA) according to the manufacturer’s instructions. PowerUp SYBR Green Master Mix (Thermo Fisher Scientific, MA, USA) was used with an ABI 7500 real-time PCR system (Thermo Fisher Scientific, MA, USA). The thermal cycling program was as follows: denaturation at 95 °C for 2 min, followed by 40 amplification cycles (95 °C for 15 s, 60 °C for 1 min). Melting curves were constructed after 40 cycles to confirm the specificity of the reactions. The 2^−ΔΔCT method was used to calculate the relative expression of six genes following normalization to L_19431_c0_g1_i2 for C. latifolia and C_20039_c0_g6_i1 for C. capitulata, which are putative ubiquitin genes in the respective species. The primer sequences are shown in Additional file 8.
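The 2^−ΔΔCT calculation can be sketched in Python as follows. This is a minimal illustration that assumes roughly 100% amplification efficiency for both the target and the reference gene; the text does not state which sample served as the calibrator, so the calibrator arguments, the function name, and the Ct values below are all hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a target gene by the 2^-ddCt method."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample of interest with the calibrator sample.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle of difference corresponds to a two-fold change.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, not measurements from this study.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_calibrator=25.0, ct_ref_calibrator=18.0,
)
```

In this toy case the target gene crosses the threshold three cycles earlier, relative to the reference gene, in the sample than in the calibrator, which corresponds to an eight-fold higher relative expression.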

Identification of lectin gene transcripts in C. latifolia and C. capitulata fruits

A tBLASTN search (E-value < 1e−4; other options set to the default) was performed against all transcripts in C. latifolia and C. capitulata fruits with the following protein sequences as the queries, which represent each plant lectin family [41] (the family name follows each accession): Agaricus bisporus (white mushroom) agglutinin (UniProtKB/Swiss-Prot: Q00022.3; ABA), Amaranthus caudatus (foxtail amaranth) agglutinin (GenBank: AAL05954.1; amaranthin), Robinia pseudoacacia (black locust) chitinase-related agglutinin (GenBank: ABL98074.1; CRA), Nostoc ellipsosporum (cyanobacterium) agglutinin (UniProtKB/Swiss-Prot: P81180.2; cyanovirin), Euonymus europaeus (European spindle) agglutinin (GenBank: ABW73993.1; EUL), Galanthus nivalis (snowdrop) agglutinin (UniProtKB/Swiss-Prot: P30617.1; GNA), Hevea brasiliensis (rubber tree) agglutinin (GenBank: ABW34946.1; hevein), Artocarpus integer (chempedak) agglutinin (GenBank: AAA32680.1; JRL), Glycine max (soybean) agglutinin (UniProtKB/Swiss-Prot: P05046.1; legume lectin), Brassica juncea (brown mustard) LysM domain (GenBank: BAN83772.1; LysM), Nicotiana tabacum (tobacco) agglutinin (GenBank: AAK84134.1; Nictaba), and the lectin chain of Ricinus communis (castor bean) agglutinin (GenBank: PDB: 2AAI_B; ricin B). The top hits were selected.
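Selecting the top hit for each query can be sketched in Python as below. The sketch assumes the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score). Whether the authors used this output format is not stated, and the function name and the example identifiers are hypothetical.

```python
def best_hits(tabular_lines, max_evalue):
    """Return {query: (subject, evalue, bitscore)} keeping the highest-scoring hit per query."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Keep only hits strictly below the E-value cutoff.
        if evalue >= max_evalue:
            continue
        # Retain the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular lines, not results from this study.
example_lines = [
    "query_A\tcontig_1\t85.0\t100\t15\t0\t1\t100\t1\t300\t1e-30\t120.0",
    "query_A\tcontig_2\t60.0\t90\t36\t0\t1\t90\t1\t270\t1e-8\t55.0",
    "query_B\tcontig_3\t40.0\t50\t30\t0\t1\t50\t1\t150\t0.5\t20.0",
]
hits = best_hits(example_lines, max_evalue=1e-4)
```

Here `query_A` keeps `contig_1`, its highest-scoring hit, and `query_B` is dropped because its only hit does not pass the cutoff.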

Phylogenetic analysis of the GNA protein family

The sequences of 17 well-known GNA proteins were selected according to Shimizu-Ibuka et al. [36]. The protein sequences for ASA, Allium sativum (garlic) (1BWU); GNA, Galanthus nivalis (snowdrop) (1MSA); and NPL, Narcissus pseudonarcissus (wild daffodil) (1NPL) were obtained from the Protein Data Bank. Other sequences were selected from GenBank as follows: PRA, Polygonatum roseum (AY899824); PMA, Polygonatum multiflorum (Solomon’s seal) (U44775); CMA, Clivia miniata (kaffir lily) (L16512); ZCA, Zephyranthes candida (autumn zephyr lily) (AF527385); AAA, Allium ascalonicum (shallot) (L12172); ACA, Allium cepa (onion) (AY376826); AUA, Allium ursinum (wild garlic) (U68531); THC, Tulipa hybrid cultivar (tulip) (U23043); ZOA, Zingiber officinale (ginger) (AY657021); ACO, Ananas comosus (pineapple) (AY098512); AKA, Amorphophallus konjac (konjac) (AY191004); DPA, Dioscorea polystachya (yam tuber) (AB178475); CHC, Cymbidium hybrid cultivar (cymbidium) (U02516); and EHA, Epipactis helleborine (broad-leaved helleborine) (U02515). These 17 sequences and 25 neoculin-related proteins predicted from full-length transcripts (10 transcripts from C. latifolia and 15 from C. capitulata; Fig. 5) were aligned using ClustalX [66], and a neighbor-joining tree was generated and evaluated by bootstrap testing with 1000 replicates. A complete list of the sequences used is given in Additional file 12.
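The resampling step behind bootstrap testing can be sketched in Python as follows. ClustalX performs this internally, so the sketch only illustrates the idea: each replicate redraws alignment columns with replacement, and a tree would then be rebuilt from every replicate (tree construction is omitted here). The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the columns of an alignment (dict of name -> aligned sequence) with replacement."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    if any(len(alignment[name]) != n_cols for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns, with replacement.
    sampled = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same sampled columns to every sequence so that columns stay intact.
    return {name: "".join(alignment[name][i] for i in sampled) for name in names}


# Hypothetical toy alignment, not sequences from this study.
toy_alignment = {"seq1": "MKV-LA", "seq2": "MKI-LA", "seq3": "MRVALA"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(1000)]
```

The bootstrap support for a branch is then the fraction of replicate trees in which that branch appears.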

Biochemical analysis

SDS-PAGE was carried out using fruit extracts from C. latifolia and C. capitulata. The proteins were visualized by Coomassie brilliant blue (CBB) staining. Immunoblot analysis was carried out using anti-NBS and anti-NAS specific polyclonal antibodies [38, 67], which were raised against the C terminus of NBS or NAS, respectively. Preparation and purification of fruit extracts were performed as described previously [18, 38]. Each 0.1 g pulp sample was treated with 0.5 mL of 0.5 M NaCl to obtain an extract, which was combined with the appropriate volume of buffer containing 2-mercaptoethanol for SDS-PAGE. After SDS-PAGE, the proteins were transferred to a PVDF membrane with a pore size of 0.45 μm (Merck Millipore, MA, USA). The membrane was soaked in Tris-buffered saline/Tween-20 (TBST) containing 5% skim milk to block non-specific protein binding. After blocking, the membrane was incubated with a mixture of anti-NBS and anti-NAS specific polyclonal antibodies diluted 1:500 in TBST for 1 h at room temperature. The membrane was then washed three times with TBST for 5 min each. Next, the membrane was incubated with Rabbit IgG HRP Linked Whole Ab (Sigma-Aldrich, MO, USA) diluted 1:4000 in TBST for 1 h at room temperature and again washed three times with TBST for 5 min each. Signals were visualized with the Clarity Western ECL Substrate kit (Bio-Rad, CA, USA) according to the manufacturer’s protocol. The signals were detected at 428 nm with a 20 s exposure using a Luminescent Image Analyzer (ImageQuant LAS 4000 mini, GE Healthcare, IL, USA).

Availability of data and materials

The raw data and processed data from this study have been uploaded to the NCBI Gene Expression Omnibus (GSE151377) and are available in the NCBI database under accession number PRJNA635640, https://www.ncbi.nlm.nih.gov/bioproject/635640. NCBI NR (https://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz), prot-plant from RefSeq (https://ftp.ncbi.nlm.nih.gov/refseq/release/plant/), and UniProt (https://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/) databases were used in this study. Accession numbers of sequences are given in Additional file 12.

Abbreviations

AAA: Allium ascalonicum agglutinin

ACA: Allium cepa agglutinin

ACO: Ananas comosus lectin

AKA: Amorphophallus konjac agglutinin

ASA: Allium sativum agglutinin

AUA: Allium ursinum agglutinin

CHC: Cymbidium hybrid cultivar agglutinin

CMA: Clivia miniata agglutinin

COG: Cluster of Orthologous Groups

DPA: Dioscorea polystachya agglutinin

ECL: Enhanced chemiluminescence

EHA: Epipactis helleborine agglutinin

GNA: Galanthus nivalis agglutinin

GO: Gene Ontology

HRP: Horseradish peroxidase

NAA: 1-naphthylacetic acid

NAS: Neoculin acidic subunit

NBS: Neoculin basic subunit

NCBI: The National Center for Biotechnology Information

NGS: Next generation sequencing

NPL: Narcissus pseudonarcissus lectin

PMA: Polygonatum multiflorum agglutinin

PRA: Polygonatum roseum agglutinin

PVDF: Polyvinylidene difluoride

TBST: Tris-buffered saline/Tween-20

THC: Tulipa hybrid cultivar lectin

TPM: Transcripts per million

ZCA: Zephyranthes candida agglutinin

ZOA: Zingiber officinale agglutinin

References

  1. Burkill IH. A dictionary of the economic products of the Malay peninsula. London: Crown Agents for the Colonies; 1966. p. 713–4.

  2. Perry LM. Medicinal plants of east and Southeast Asia. Cambridge: MIT Press; 1980. p. 12.

  3. Plants of the World Online. http://www.plantsoftheworldonline.org/. Accessed 17 May 2020.

  4. Kocyan A. The discovery of polyandry in Curculigo (Hypoxidaceae): implications for androecium evolution of asparagoid monocotyledons. Ann Bot. 2007;100(2):241–8. https://doi.org/10.1093/aob/mcm091.

  5. Kocyan A, Snijman DA, Forest F, Devey DS, Freudenstein JV, Wiland-Szymańska J, et al. Molecular phylogenetics of Hypoxidaceae-evidence from plastid DNA data and inferences on morphology and biogeography. Mol Phylogenet Evol. 2011;60(1):122–36. https://doi.org/10.1016/j.ympev.2011.02.021.

  6. Liu KW, Xie GC, Chen LJ, Xiao XJ, Zheng YY, Cai J, et al. Sinocurculigo, a new genus of Hypoxidaceae from China based on molecular and morphological evidence. PLoS One. 2012;7(6):e38880. https://doi.org/10.1371/journal.pone.0038880.

  7. Ranjbarfard A, Saleh G, Abdullah NAP, Kashiani P. Genetic diversity of lemba (Curculigo latifolia) populations in peninsular Malaysia using ISSR molecular markers. Aust J Crop Sci. 2014;8(1):9–17.

  8. Eksomtramage L, Kwandarm M, Purintavaragul C. Karyotype of some Thai Hypoxidaceae species. Songklanakarin J Sci Technol. 2013;35(4):379–82.

  9. Okubo S, Yamada M, Yamaura T, Akita T. Effects of the pistil size and self-incompatibility on fruit production in Curculigo latifolia (Liliaceae). J Jpn Soc Hort Sci. 2010;79(4):354–9. https://doi.org/10.2503/jjshs1.79.354.

  10. Asif M. A review on phytochemical and ethnopharmacological activities of Curculigo orchioides. Mahidol Univ J Pharm Sci. 2012;39(3–4):1–10.

  11. Babaei N, Abdullah NAP, Saleh G, Abdullah TL. An efficient in vitro plantlet regeneration from shoot tip cultures of Curculigo latifolia, a medicinal plant. ScientificWorldJournal. 2014;2014:275028.

  12. Ishak NA, Ismail M, Hamid M, Ahmad Z, Abd Ghafar SA. Antidiabetic and hypolipidemic activities of Curculigo latifolia fruit: root extract in high fat fed diet and low dose STZ induced diabetic rats. Evid Based Complement Alternat Med. 2013;2013:601838.

  13. Li S, Yu JH, Fan YY, Liu QF, Li ZC, Xie ZX, et al. Structural elucidation and total synthesis of three 9-norlignans from Curculigo capitulata. J Org Chem. 2019;84(9):5195–202. https://doi.org/10.1021/acs.joc.9b00170.

  14. Nie Y, Dong X, He Y, Yuan T, Han T, Rahman K, et al. Medicinal plants of genus Curculigo: traditional uses and a phytochemical and ethnopharmacological review. J Ethnopharmacol. 2013;147(3):547–63. https://doi.org/10.1016/j.jep.2013.03.066.

  15. Wang KJ, Zhu CC, Di L, Li N, Zhao YX. New norlignan derivatives from Curculigo capitulata. Fitoterapia. 2010;81(7):869–72. https://doi.org/10.1016/j.fitote.2010.05.012.

  16. Yamashita H, Theerasilp S, Aiuchi T, Nakaya K, Nakamura Y, Kurihara Y. Purification and complete amino acid sequence of a new type of sweet protein with taste-modifying activity, curculin. J Biol Chem. 1990;265(26):15770–5. https://doi.org/10.1016/S0021-9258(18)55464-8.

  17. Nakajima K, Asakura T, Oike H, Morita Y, Shimizu-Ibuka A, Misaka T, et al. Neoculin, a taste-modifying protein, is recognized by human sweet taste receptor. Neuroreport. 2006;17(12):1241–4. https://doi.org/10.1097/01.wnr.0000230513.01339.3b.

  18. Shirasuka Y, Nakajima K, Asakura T, Yamashita H, Yamamoto A, Hata S, et al. Neoculin as a new taste-modifying protein occurring in the fruit of Curculigo latifolia. Biosci Biotechnol Biochem. 2004;68(6):1403–7. https://doi.org/10.1271/bbb.68.1403.

  19. Kant R. Sweet proteins--potential replacement for artificial low calorie sweeteners. Nutr J. 2005;4(1):5. https://doi.org/10.1186/1475-2891-4-5.

  20. Yamashita H, Akabane T, Kurihara Y. Activity and stability of a new sweet protein with taste-modifying action, curculin. Chem Senses. 1995;20(2):239–43. https://doi.org/10.1093/chemse/20.2.239.

  21. Nakajima K, Koizumi A, Iizuka K, Ito K, Morita Y, Koizumi T, et al. Non-acidic compounds induce the intense sweet taste of neoculin, a taste-modifying protein. Biosci Biotechnol Biochem. 2011;75(8):1600–2. https://doi.org/10.1271/bbb.110081.

  22. Koizumi A, Nakajima K, Asakura T, Morita Y, Ito K, Shimizu-Ibuka A, et al. Taste-modifying sweet protein, neoculin, is received at human T1R3 amino terminal domain. Biochem Biophys Res Commun. 2007;358(2):585–9. https://doi.org/10.1016/j.bbrc.2007.04.171.

  23. Suzuki M, Kurimoto E, Nirasawa S, Masuda Y, Hori K, Kurihara Y, et al. Recombinant curculin heterodimer exhibits taste-modifying and sweet-tasting activities. FEBS Lett. 2004;573(1–3):135–8. https://doi.org/10.1016/j.febslet.2004.07.073.

  24. Abe K, Yamashita H, Arai S, Kurihara Y. Molecular cloning of curculin, a novel taste-modifying protein with a sweet taste. Biochim Biophys Acta. 1992;1130(2):232–4. https://doi.org/10.1016/0167-4781(92)90537-A.

  25. Nakajima K, Yokoyama K, Koizumi T, Koizumi A, Asakura T, Terada T, et al. Identification and modulation of the key amino acid residue responsible for the pH sensitivity of neoculin, a taste-modifying protein. PLoS One. 2011;6(4):e19448. https://doi.org/10.1371/journal.pone.0019448.

  26. Koizumi T, Terada T, Nakajima K, Kojima M, Koshiba S, Matsumura Y, et al. Identification of key neoculin residues responsible for the binding and activation of the sweet taste receptor. Sci Rep. 2015;5(1):12947. https://doi.org/10.1038/srep12947.

  27. Morita Y, Nakajima K, Iizuka K, Terada T, Shimizu-Ibuka A, Ito K, et al. pH-dependent structural change in neoculin with special reference to its taste-modifying activity. Biosci Biotechnol Biochem. 2009;73(11):2552–5. https://doi.org/10.1271/bbb.90524.

  28. Ohkubo T, Tamiya M, Abe K, Ishiguro M. Structural basis of pH dependence of neoculin, a sweet taste-modifying protein. PLoS One. 2015;10(5):e0126921. https://doi.org/10.1371/journal.pone.0126921.

  29. Van Damme EJM, Peumans WJ, Barre A, Rougé P. Plant lectins: a composite of several distinct families of structurally and evolutionary related proteins with diverse biological roles. Crit Rev Plant Sci. 1998;17(6):575–692. https://doi.org/10.1080/07352689891304276.

  30. Van Damme EJM, Lannoo N, Peumans WJ. Plant lectins. Adv Bot Res. 2008:107–209. https://doi.org/10.1016/S0065-2296(08)00403-5.

  31. Barre A, Van Damme EJM, Peumans WJ, Rougé P. Structure-function relationship of monocot mannose-binding lectins. Plant Physiol. 1996;112(4):1531–40. https://doi.org/10.1104/pp.112.4.1531.

  32. Shimizu-Ibuka A, Morita Y, Terada T, Asakura T, Nakajima K, Iwata S, et al. Crystal structure of neoculin: insights into its sweetness and taste-modifying activity. J Mol Biol. 2006;359(1):148–58. https://doi.org/10.1016/j.jmb.2006.03.030.

  33. Kurimoto E, Suzuki M, Amemiya E, Yamaguchi Y, Nirasawa S, Shimba N, et al. Curculin exhibits sweet-tasting and taste-modifying activities through its distinct molecular surfaces. J Biol Chem. 2007;282(46):33252–6. https://doi.org/10.1074/jbc.C700174200.

  34. Barre A, Van Damme EJM, Peumans WJ, Rougé P. Curculin, a sweet-tasting and taste-modifying protein, is a non-functional mannose-binding lectin. Plant Mol Biol. 1997;33(4):691–8. https://doi.org/10.1023/A:1005704616565.

  35. Harada S, Otani H, Maeda S, Kai Y, Kasai N, Kurihara Y. Crystallization and preliminary X-ray diffraction studies of curculin. A new type of sweet protein having taste-modifying action. J Mol Biol. 1994;238(2):286–7. https://doi.org/10.1006/jmbi.1994.1289.

  36. Shimizu-Ibuka A, Nakai Y, Nakamori K, Morita Y, Nakajima K, Kadota K, et al. Biochemical and genomic analysis of neoculin compared to monocot mannose-binding lectins. J Agric Food Chem. 2008;56(13):5338–44. https://doi.org/10.1021/jf800214b.

  37. Nakajo S, Akabane T, Nakaya K, Nakamura Y, Kurihara Y. An enzyme immunoassay and immunoblot analysis for curculin, a new type of taste-modifying protein: cross-reactivity of curculin and miraculin to both antibodies. Biochim Biophys Acta. 1992;1118(3):293–7. https://doi.org/10.1016/0167-4838(92)90287-N.

  38. Okubo S, Asakura T, Okubo K, Abe K, Misaka T, Akita T. Neoculin, a taste-modifying sweet protein, accumulates in ripening fruits of cultivated Curculigo latifolia. J Plant Physiol. 2008;165(18):1964–9. https://doi.org/10.1016/j.jplph.2008.04.019.

  39. Standard tables of food composition in Japan - 2015 - (Seventh revised version) 2015. MEXT. https://www.mext.go.jp/en/policy/science_technology/policy/title01/detail01/1374030.htm. Accessed 17 May 2020.

  40. Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35. https://doi.org/10.1093/nar/gkn176.

  41. De Schutter K, Tsaneva M, Kulkarni SR, Rougé P, Vandepoele K, Van Damme EJM. Evolutionary relationships and expression analysis of EUL domain proteins in rice (Oryza sativa). Rice (N Y). 2017;10(1):26.

  42. Cannon SB, Mitra A, Baumgarten A, Young ND, May G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004;4(1):10. https://doi.org/10.1186/1471-2229-4-10.

  43. Copley SD. Evolution of new enzymes by gene duplication and divergence. FEBS J. 2020;287(7):1262–83. https://doi.org/10.1111/febs.15299.

  44. Dang L, Van Damme EJM. Genome-wide identification and domain organization of lectin domains in cucumber. Plant Physiol Biochem. 2016;108:165–76. https://doi.org/10.1016/j.plaphy.2016.07.009.

  45. Fukushima K, Fang X, Alvarez-Ponce D, Cai H, Carretero-Paulet L, Chen C, et al. Genome of the pitcher plant Cephalotus reveals genetic changes associated with carnivory. Nat Ecol Evol. 2017;1(3):59. https://doi.org/10.1038/s41559-016-0059.

  46. Panchy N, Lehti-Shiu M, Shiu SH. Evolution of gene duplication in plants. Plant Physiol. 2016;171(4):2294–316. https://doi.org/10.1104/pp.16.00523.

  47. Yan J, Li G, Guo X, Li Y, Cao X. Genome-wide classification, evolutionary analysis and gene expression patterns of the kinome in Gossypium. PLoS One. 2018;13(5):e0197392. https://doi.org/10.1371/journal.pone.0197392.

  48. de Vos AM, Hatada M, van der Wel H, Krabbendam H, Peerdeman AF, Kim SH. Three-dimensional structure of thaumatin I, an intensely sweet protein. Proc Natl Acad Sci U S A. 1985;82(5):1406–9. https://doi.org/10.1073/pnas.82.5.1406.

  49. Kurihara Y. Characteristics of antisweet substances, sweet proteins, and sweetness-inducing proteins. Crit Rev Food Sci Nutr. 1992;32(3):231–52. https://doi.org/10.1080/10408399209527598.

  50. Liu JJ, Sturrock R, Ekramoddoullah AK. The superfamily of thaumatin-like proteins: its origin, evolution, and expression towards biological function. Plant Cell Rep. 2010;29(5):419–36. https://doi.org/10.1007/s00299-010-0826-8.

  51. Petre B, Major I, Rouhier N, Duplessis S. Genome-wide analysis of eukaryote thaumatin-like proteins (TLPs) with an emphasis on poplar. BMC Plant Biol. 2011;11(1):33. https://doi.org/10.1186/1471-2229-11-33.

  52. Selvakumar P, Gahloth D, Tomar PP, Sharma N, Sharma AK. Molecular evolution of miraculin-like proteins in soybean Kunitz super-family. J Mol Evol. 2011;73(5–6):369–79. https://doi.org/10.1007/s00239-012-9484-5.

  53. Theerasilp S, Hitotsuya H, Nakajo S, Nakaya K, Nakamura Y, Kurihara Y. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin. J Biol Chem. 1989;264(12):6655–9. https://doi.org/10.1016/S0021-9258(18)83477-9.

  54. Witty M, Higginbotham JD. Thaumatin. Florida: CRC Press, Inc.; 1994. p. 20–35.

  55. Jiang SY, Ma Z, Ramachandran S. Evolutionary history and stress regulation of the lectin superfamily in higher plants. BMC Evol Biol. 2010;10(1):79. https://doi.org/10.1186/1471-2148-10-79.

  56. Lambin J, Asci SD, Dubiel M, Tsaneva M, Verbeke I, Wytynck P, et al. OsEUL lectin gene expression in rice: stress regulation, subcellular localization and tissue specificity. Front Plant Sci. 2020;11:185. https://doi.org/10.3389/fpls.2020.00185.

  57. Li XQ. Developmental and environmental variation in genomes. Heredity (Edinb). 2009;102(4):323–9. https://doi.org/10.1038/hdy.2008.132.

  58. Wittkopp PJ, Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 2011;13(1):59–69. https://doi.org/10.1038/nrg3095.

  59. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17(1):10–2.

  60. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. https://doi.org/10.1093/bioinformatics/btu170.

  61. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-seq data. Nat Biotechnol. 2011;29(7):644–52.

  62. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. https://doi.org/10.1186/gb-2009-10-3-r25.

  63. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics. 2011;12(1):323. https://doi.org/10.1186/1471-2105-12-323.

  64. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–2. https://doi.org/10.1093/bioinformatics/bts565.

  65. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9. https://doi.org/10.1093/bioinformatics/btl158.

  66. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947–8. https://doi.org/10.1093/bioinformatics/btm404.

  67. Nakajima K, Asakura T, Maruyama J, Morita Y, Oike H, Shimizu-Ibuka A, et al. Extracellular production of neoculin, a sweet-tasting heterodimeric protein with taste-modifying activity, by Aspergillus oryzae. Appl Environ Microbiol. 2006;72(5):3716–23. https://doi.org/10.1128/AEM.72.5.3716-3723.2006.

Acknowledgements

We are grateful to the late Dr. Kosaburo Nishi of Research Center for Medicinal Plant Resources (Tsukuba Division), National Institutes of Biomedical Innovation, Health and Nutrition, for kindly providing Curculigo latifolia plants. We also thank Hiroshi Morita, the Director of The Naito Museum of Pharmaceutical Science and Industry, for kindly providing Curculigo capitulata plants. Computations were partially performed on the NIG supercomputer.

Funding

This study was supported by the Cross-ministerial Strategic Innovation Promotion Program (Grant No. 14532924; K.A.), a Grant-in-Aid for Scientific Research B (Grant No. 19300248; T.A.) from the Society for the Promotion of Science in Japan and Adaptable and Seamless Technology transfer Program through Target-driven R&D (A-STEP) from Japan Science and Technology Agency to T.A. (Grant No. JPMJTR194F).

Author information

Contributions

TA conceived the study and participated in the design of all experiments. SO1, KT, SO2, TY, TM, KN and KA analyzed and interpreted data. SO1 and TY cultivated plants and performed sample preparation. SO1, SO2 and YS performed biological experiments. SO1 and KT wrote the manuscript. KA discussed the experiments and manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tomiko Asakura.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplemental Figure 1. Length distribution of the assembled transcripts (> 1 TPM) from C. latifolia (purple) and C. capitulata (orange) fruits.

Additional file 2: Supplemental Figure 2. Distribution of transcripts per million (TPM) values of the assembled transcripts from C. latifolia (purple) and C. capitulata (orange) fruits. The average, median, and mode for C. latifolia were 11.7, 1.6, and 2, respectively, and those for C. capitulata were 13.0, 1.9, and 2, respectively.

Additional file 3: Supplemental annotation. Transcripts from C. latifolia and C. capitulata fruits were annotated by BLASTX (E-value < 1e−10) against the NCBI NR, RefSeq, UniProt, and COG databases and genomes from rice (Os-Nipponbare-Reference-IRGSP-1.0, Assembly: GCF_001433935.1) and Arabidopsis (Arabidopsis thaliana, Assembly: GCF_000001735.4). Percentage identity (Pident) and E-value are BLASTN results performed using C. latifolia as the query against C. capitulata. Expression values (TPM), unigenes clustered by CD-Hit, and the unique or common genes from C. capitulata or C. latifolia are also included.

Additional file 4: Supplemental Figure 3. Gene Ontology (GO) annotation of transcripts from C. latifolia (purple) and C. capitulata (orange) fruits. In total, 28,100 (C. latifolia) and 29,614 (C. capitulata) transcripts were classified based on GO terms. No significant differences were observed between the two species.

Additional file 5: Supplemental Table 1. Selection of sequences for phylogenetic analysis using BLAST search of transcripts from C. latifolia and C. capitulata fruits. The query sequences were the amino acid sequences (AA) of GNA (UniProtKB/Swiss-Prot: P30617.1) and the nucleotide sequences (nucl) and AA number of NBS (GenBank: X64110.1, GenBank: CAA45476.1) and NAS (GenBank: AB167079.1, GenBank: BAD29946.1). Each top hit was selected. The contigs that were selected for each query and used as neoculin-related sequences in phylogenetic analysis (Fig. 5) are indicated by checkmarks.

Additional file 6: Supplemental Figure 4. Amino acid sequence alignment of 10 C. latifolia transcripts, 15 C. capitulata transcripts, and 17 well-known GNA family members used in the phylogenetic analysis shown in Fig. 5.

Additional file 7: Supplemental Figure 5. Comparison of the protein sequences of neoculin (NBS and NAS) from public databases and the Curculigo latifolia proteins identified in the present study. Amino acid residues that differ between NBS and NAS are shown in blue for NBS residues and red for NAS residues. Asn-2 of L_16562_c0_g1_i2 was the only residue that differed from NAS, which has Ser at this position (GenBank: BAD29946.1). The de novo assemblies closely matched the known sequences (NBS and NAS), which supports the accuracy of the assembly.

Additional file 8: Supplemental Table 2. Primer information used for qRT-PCR. qRT-PCR analyses were performed on the neoculin-related genes from C. latifolia and C. capitulata fruits.

Additional file 9: Supplemental Table 3. Relative quantification of the neoculin-related genes from C. latifolia and C. capitulata fruits by qRT-PCR. The mean qRT-PCR values from three independent biological replicates were normalized to the ubiquitin gene of each species.
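
The normalization to a reference gene described above can be sketched as follows. This example assumes the common 2^-ΔCt method with 100% amplification efficiency, which the caption does not confirm; it also normalizes each replicate before averaging, whereas the authors may have averaged first. The function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Mean expression of a target gene relative to a reference gene.

    Assumes the 2^-dCt method: for each biological replicate,
    dCt = Ct(target) - Ct(reference), and relative expression = 2^-dCt.
    The per-replicate values are then averaged.
    """
    if len(ct_target) != len(ct_reference):
        raise ValueError("each replicate needs a target and a reference Ct")
    return mean(2.0 ** -(t - r) for t, r in zip(ct_target, ct_reference))


if __name__ == "__main__":
    # Hypothetical Ct values for three biological replicates.
    target_ct = [22.0, 22.0, 22.0]
    ubiquitin_ct = [20.0, 20.0, 20.0]
    print(relative_expression(target_ct, ubiquitin_ct))
```

Because each species is normalized to its own ubiquitin gene, comparing values between species implicitly assumes that ubiquitin is expressed at similar levels in both.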

Additional file 10: Supplemental Table 4. Numbers of amino acid substitutions in neoculin group proteins. The values show the number of substituted amino acids in each region. The predicted proteins were divided into nine regions (N-Pro, N-term, MBS1, inter1, MBS2, inter2, MBS3, C-term and C-Pro) based on the regions removed by processing, the N- or C-terminal regions, the mannose-binding sites MBS 1 to 3, and the regions between the MBSs. “A” indicates residues in NAS that are different from those of NBS. “B” indicates residues in NBS that are different from those of NAS. “C” indicates residues different from both NBS and NAS. “D” indicates the residues only present in C_9931_c0_g1_i1.
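
The per-region counting described in this table can be illustrated with a small sketch. It assumes pre-aligned sequences of equal length and reads the categories as follows: "A" when the query matches NAS but not NBS, "B" when it matches NBS but not NAS, and "C" when it matches neither. That reading, the function name, the toy sequences, and the region coordinates are assumptions for illustration only; category "D" is not handled.

```python
def classify_substitutions(query, nbs, nas, regions):
    """Count substitution categories per region in aligned sequences.

    query, nbs, nas: aligned amino acid strings of equal length.
    regions: dict mapping region name -> (start, end), 0-based, end-exclusive.
    Columns containing a gap ('-') in any sequence are skipped.
    """
    counts = {name: {"A": 0, "B": 0, "C": 0} for name in regions}
    for name, (start, end) in regions.items():
        columns = zip(query[start:end], nbs[start:end], nas[start:end])
        for q, b, a in columns:
            if "-" in (q, b, a):
                continue
            if q == a and q != b:
                counts[name]["A"] += 1  # NAS-type residue
            elif q == b and q != a:
                counts[name]["B"] += 1  # NBS-type residue
            elif q != a and q != b:
                counts[name]["C"] += 1  # differs from both
    return counts


if __name__ == "__main__":
    # Toy four-residue alignment split into two hypothetical regions.
    regions = {"region1": (0, 2), "region2": (2, 4)}
    print(classify_substitutions("MKAD", "MRAE", "MKAQ", regions))
```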

Additional file 11: Supplemental Figure 6. Original images for Fig. 7. (a) CBB-stained gel. (b) PVDF membrane after the reaction, imaged under bright field. (c) Immunoblotting membrane reacted with ECL. The signals were detected at 428 nm with an exposure time of 20 s. (d) Overlay image of (b) and (c).

Additional file 12: Supplemental Table 5. Accession numbers of sequences obtained from web-based sources.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Okubo, S., Terauchi, K., Okada, S. et al. De novo transcriptome analysis and comparative expression profiling of genes associated with the taste-modifying protein neoculin in Curculigo latifolia and Curculigo capitulata fruits. BMC Genomics 22, 347 (2021). https://doi.org/10.1186/s12864-021-07674-3

Keywords

  • NGS
  • RNA-seq
  • Neoculin
  • NBS
  • NAS
  • Curculigo capitulata
  • Curculigo latifolia
  • Expression profile
  • Gene duplication