
Genomic insights into lineage-specific evolution of the oleosin family in Euphorbiaceae

Abstract

Background

Lipid droplets (LDs) present in land plants serve as an essential energy and carbon reserve for seed germination and seedling development. Oleosins, the most abundant structural proteins of LDs, comprise a small family involved in LD formation, stabilization and degradation. Despite their importance, our knowledge on oleosins is still poor in Euphorbiaceae, a large plant family that contains several important oil-bearing species.

Results

To uncover lineage-specific evolution of oleosin genes in Euphorbiaceae, we performed a genome-wide identification and comprehensive comparison of the oleosin family in Euphorbiaceae species with available genome sequences, i.e. castor bean (Ricinus communis), physic nut (Jatropha curcas), tung tree (Vernicia fordii), Mercurialis annua, cassava (Manihot esculenta) and rubber tree (Hevea brasiliensis), in which five, five, five, five, eight and eight members were found, respectively. Synteny analysis revealed a one-to-one collinear relationship of oleosin genes among the former four species (i.e. castor bean, physic nut, tung tree and M. annua) as well as between the latter two (i.e. cassava and rubber tree), whereas both one-to-one and one-to-two collinear relationships were observed between physic nut and cassava, reflecting the occurrence of one recent whole-genome duplication (WGD) in the last common ancestor of cassava and rubber tree. The presence of five ortholog groups representing three previously defined clades (i.e. U, SL and SH) dates back at least to the Malpighiales ancestor, because they are also conserved in poplar (Populus trichocarpa), a tree that experienced one Salicaceae-specific recent WGD. As observed in poplar, WGD was shown to be the main driver of family expansion in both cassava and rubber tree. Nevertheless, the retention patterns of WGD-derived duplicates, which are the same in cassava and rubber tree, differ somewhat from those in poplar, though certain homologous fragments are still present in rubber tree. Further transcriptional profiling revealed an apparent seed-predominant expression pattern of oleosin genes in physic nut, castor bean and rubber tree. Moreover, structural and expression divergence of paralogous pairs was also observed in both cassava and rubber tree.

Conclusion

Comparative genomics analysis of oleosin genes reported in this study improved our knowledge on lineage-specific family evolution in Euphorbiaceae, which also provides valuable information for further functional analysis and utilization of key members and their promoters.


Background

Euphorbiaceae (spurge), which belongs to the order Malpighiales, is a very large family composed of more than 7000 species in around 300 genera. They appear as herbs, shrubs, and trees that are widely distributed in tropical, subtropical, and temperate regions [1]. The economic importance has prompted active attempts on genome characterization of several Euphorbiaceae species, i.e., castor bean (Ricinus communis), physic nut (Jatropha curcas), rubber tree (Hevea brasiliensis), cassava (Manihot esculenta), tung tree (Vernicia fordii), and Mercurialis annua [2,3,4,5,6,7,8,9,10]. Among them, M. annua, a wind-pollinated annual herb originated in Europe, North Africa, and Middle East, represents an ideal model plant for studying sexual systems [10]. Castor bean, physic nut, and tung tree, which are native to Africa, Central America, and China, respectively, are three important non-food oilseed shrubs or small trees accumulating a high level of oil (>40%) in their seeds. The physic nut oil with fossil fuel-like fatty acid composition is a potential material for biodiesel production; the castor oil dominant in ricinoleic acid is widely used for industrial, medicinal, and cosmetic purposes; and, the tung oil rich in α-eleostearic acid (α-ESA) is widely used in the production of inks, dyes, resins, and biodiesel [2, 5, 9]. Cassava and rubber tree, both of which originated in the Southern Amazon basin, also accumulate more than 25% of oil in their seeds, though they have not been well explored [11]. Instead, the starchy-enriched storage roots of cassava are not only staple food for millions of people but also ideal for bio-ethanol production, whereas natural rubber or cis-1,4-polyisoprene, which is specifically produced by the rubber tree laticifer, is an indispensable industrial raw material for various uses [6, 12]. 
Despite the diversity in morphology and traits of cassava and rubber tree, they were proven to share one so-called ρ whole-genome duplication (WGD) event that occurred after the split from other Euphorbiaceae plants, within a window of 39–47 million years ago (Mya) [6, 13,14,15]. In evolutionary terms, it is of particular interest to study species-specific evolution of genes associated with certain economic traits in Euphorbiaceae.

In plants, lipids in the form of triacylglycerols (TAGs) are the most abundant energy-dense storage compounds in seeds as well as several vegetative tissues [16]. TAGs are stored within lipid droplets (LDs) or oil bodies (OBs) that are characterized by a layer of phospholipids and several types of structural proteins such as oleosins, caleosins, and steroleosins [17]. Oleosins, the small (14–30 kDa) but most abundant LD proteins, feature a conserved central hydrophobic portion that is known as the proline knot motif (−PX5SPX3P-) of approximately 72 residues, whereas N- and C-terminal peptides are amphipathic and usually variable [18, 19]. Oleosin genes are widely distributed from single-celled algae to land plants. In contrast to a single or few members found in green algae, the oleosin family is highly abundant and diverse in land plants [19,20,21]. For example, there are six, six, 13, 17 or 48 members present in safflower (Carthamus tinctorius), rice (Oryza sativa), flax (Linum usitatissimum), arabidopsis (Arabidopsis thaliana), and rapeseed (Brassica napus), respectively [20, 22,23,24,25]. Based on sequence similarity, oleosins could be divided into five clades: the P clade, which represents the primitive one, is only found in green algae, mosses, and ferns; the U clade is universally present in all land plants; and, another three clades, i.e., SL, SH, and T, are organ-specific [19]. The SL clade represents low-molecular-weight peptides that are present in seeds of gymnosperms and angiosperms; the SH clade are high-molecular-weight peptides present in seeds of angiosperms; and, the T clade is tapetum-specific of the Brassicaceae lineage [19, 26]. Whereas LDs serve as an essential energy and carbon reserve for seed germination and seedling development, oleosins function in LD formation, stabilization, and degradation [27, 28]. 
An exciting fact is that oleosins are directly involved in regulating LD size and overexpression of oleosin genes could increase the seed oil content in arabidopsis [24, 25, 29, 30]. Moreover, a recent study revealed that strong artificial selection of GmOLEO1, which resulted in its high expression and increased seed oil accumulation in cultivated relatives, had occurred during soybean (Glycine max) domestication [31]. In castor bean, four oleosin genes have previously been described [23], and two of them, which represent major seed LD proteins of 14 and 16 kDa, respectively, have also been characterized via MALDI-MS and CID tandem MS [32]. In tung tree, mining transcriptome data resulted in five oleosin genes, which were shown to preferentially express in developing seeds relative to leaves and flowers [33]. Nevertheless, oleosin genes in other Euphorbiaceae plants and lineage-specific evolution of this special family have not been investigated. To address this issue, in the present study, we took advantage of available genome sequences and transcriptome datasets to identify the complete set of oleosin family genes in these Euphorbiaceae plants. Ortholog groups (OGs) and gene expansion patterns were inferred from phylogenetic, best-reciprocal-hit (BRH) BLAST as well as synteny analyses, whereas the evolutionary patterns were investigated based on the analysis of their gene structures, sequence characteristics, conserved motifs, and expression profiles.
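As a minimal sketch of how the PX5SPX3P proline knot pattern described above could be located in a protein sequence, the pattern can be translated into a regular expression. The function name and the toy peptide below are hypothetical and are not taken from this study; the expression only captures the 12-residue core pattern, not the full hydrophobic region of approximately 72 residues.

```python
import re

# PX5SPX3P: Pro, any five residues, Ser, Pro, any three residues, Pro
PROLINE_KNOT = re.compile(r"P.{5}SP.{3}P")


def find_proline_knot(protein):
    """Return (0-based start, matched text) of the first match, or None."""
    match = PROLINE_KNOT.search(protein.upper())
    if match is None:
        return None
    return match.start(), match.group()


# Hypothetical toy peptide, not a real oleosin sequence
toy_peptide = "MAAAPAAAAASPAAAPGG"
print(find_proline_knot(toy_peptide))
```

A sequence lacking the pattern simply yields `None`, so the same helper can be used as a quick filter before a proper domain search.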

Results

Identification, chromosome location, and synteny analysis of oleosin genes in six Euphorbiaceae plants

According to comparative genomics analyses, physic nut, tung tree, castor bean, and M. annua are typical diploid species that did not experience recent WGDs after the ancient so-called γ whole-genome triplication shared by core eudicots [2, 6, 9, 10]. In contrast to the fragmented status of genome assemblies in castor bean (25,763 scaffolds) [2], tung tree (20,614 scaffolds) [9], and M. annua (74,927 scaffolds) [10], the physic nut genome used in this study mainly comprises 6023 scaffolds, and 81.7% of this assembly could be anchored onto 11 chromosomes (Chrs) based on genetic markers [6]. As shown in Table 1, a total of five oleosin family genes were identified from the physic nut genome, which were named JcOLE1–5 according to phylogenetic analysis (see below). The expression of all these genes was supported by RNA-seq reads as well as ESTs, which also allowed the extension of their transcription regions. Although JcOLE genes are distributed across five scaffolds, they were further anchored onto four pseudochromosomes with the help of the available genetic map, i.e., Chr3, Chr5, Chr8, and Chr11 (Fig. 1).

Table 1 Oleosin family genes identified in six Euphorbiaceae plants
Fig. 1

Chromosomal locations and duplication events of Jc/MeOLE genes and their collinear genes in castor bean/tung tree/M. annua and rubber tree, respectively. Chromosome serial numbers are indicated at the top of each chromosome, and lines connect duplicate pairs located within syntenic blocks. Collinear genes in castor bean, M. annua, tung tree, and rubber tree are shown just behind that of physic nut and cassava, respectively

Genome mining of tung tree, castor bean, and M. annua also resulted in five oleosin genes each. Among them, the gene model of RcOLE3, which was not previously reported [23] and was computationally predicted to encode 232 residues (29794.m003372) [2], was manually optimized on the basis of RNA-seq reads (see Additional file S1). Like JcOLE2 and JcOLE5, which are closely located on Chr5, RcOLE2 and RcOLE5 are located on the same scaffold, implying conservative evolution between physic nut and castor bean. This hypothesis was further supported by synteny analysis, which revealed a one-to-one collinear relationship between physic nut and castor bean/tung tree/M. annua (Fig. 1).

Although more than one genome assembly is available for both cassava and rubber tree, results presented in this study are based on the most complete ones: the rubber tree genome of Reyan7–33-97 consists of 7453 scaffolds spanning about 1.37 Gb [8], whereas the cassava genome of AM560–2 consists of 40,044 scaffolds spanning about 582 Mb [6]. In contrast to rubber tree, which lacks a high-density genetic map, 89.0% of the AM560–2 assembly could be further anchored onto 18 chromosomes on the basis of 22,403 available markers [6]. The search of the cassava genome resulted in eight oleosin-coding loci from seven chromosomes, i.e., Chr1, Chr5, Chr6, and Chr14–17 (Table 1 and Fig. 1). For convenience, they were named MeOLE1a, MeOLE1b, MeOLE2a, MeOLE2b, MeOLE3, MeOLE4a, MeOLE4b, and MeOLE5, respectively. Although only a few ESTs are available for MeOLE1a, the expression of all other genes was supported by RNA-seq reads, which also allowed optimization of the gene model of MeOLE1a, in which an intron had been mis-annotated (Table 1 and Additional file S2). The CDS sequences of three paralogous pairs, i.e. MeOLE1a/b, MeOLE2a/b, and MeOLE4a/b, exhibit a relatively high identity of 65.3–78.1% (Table 2). Since these gene pairs are located within syntenic blocks of duplicated chromosomes, they were defined as duplicates derived from the ρ WGD. Synteny analysis further supported one-to-one and one-to-two collinear relationships between physic nut and cassava (Fig. 1), corresponding to the occurrence of the recent ρ WGD and different evolutionary fates of WGD-derived duplicate pairs. Interestingly, MeOLE2a and MeOLE5 were also found to be closely located on the same chromosome. Since they are located within syntenic blocks shared by cassava and physic nut but not by cassava and poplar, a Euphorbiaceae-specific chromosome rearrangement after the divergence from Salicaceae can be inferred.

Table 2 Oleosin duplicate pairs derived from the ρ WGD in cassava and rubber tree

In rubber tree, by contrast, the oleosin family was shown to be relatively complex, including eight expressed genes as well as three pseudogenes that are incomplete and lack evidence of expression (Table 1). These eight expressed HbOLE genes, which are distributed across seven scaffolds, exhibit a one-to-one collinear relationship with those of cassava and were therefore named after their orthologs, i.e., HbOLE1a, HbOLE1b, HbOLE2a, HbOLE2b, HbOLE3, HbOLE4a, HbOLE4b, and HbOLE5 (Fig. 1). Among them, the CDS sequences of three paralogous pairs (i.e. HbOLE1a/b, HbOLE2a/b, and HbOLE4a/b) exhibit 84.9–88.5% identity, which is higher than that of their counterparts in cassava. Correspondingly, the Ks value of the three rubber tree duplicate pairs varies from 0.2764 to 0.4095, which is smaller than that in cassava (i.e. 0.4431–0.7135) (Table 2), implying a higher rate of gene evolution in the latter. In fact, relatively low Ks values of OLE duplicate pairs were also observed in another tree species, poplar, varying from 0.1470 to 0.3486 (see Additional file 3). Except for PtOLE2a/PtOLE2b, duplicate pairs in cassava, rubber tree, and poplar possess a Ka/Ks ratio of less than 1 (from 0.1233 to 0.6223) (Table 2 and Additional file 3), suggesting that their divergence was mainly driven by purifying selection. Notably, HbOLE2a and HbOLE2b are located on the same scaffold, implying a possible species-specific chromosome rearrangement after the rubber tree-cassava divergence. However, due to the lack of a high-density genetic map, it remains unknown whether HbOLE2a/HbOLE2b and HbOLE5 are located on the same chromosome, as observed in physic nut, castor bean, and cassava.
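The interpretation of Ka/Ks used above can be illustrated with a short sketch. It assumes the conventional reading of the ratio (below 1 for purifying selection, about 1 for neutral evolution, above 1 for positive selection); the function names and the tolerance parameter are hypothetical, and the Ka value in the example is invented for illustration because per-pair Ka values are not listed in this section.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    return ka / ks


def selection_regime(ratio, tolerance=0.0):
    """Classify a Ka/Ks ratio using the conventional threshold of 1.

    `tolerance` is a hypothetical band around 1 treated as neutral.
    """
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


# Ks = 0.2764 is the lowest rubber tree value quoted in the text;
# Ka = 0.05 is an invented placeholder, not a result of the study.
example_ratio = ka_ks_ratio(0.05, 0.2764)
print(round(example_ratio, 4), selection_regime(example_ratio))
```

Both bounds of the reported range (0.1233 and 0.6223) fall into the purifying class under this convention.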

Phylogenetic analysis and definition of ortholog groups

Although the overall sequence similarity is low (see Additional file 4), the 36 oleosins identified in Euphorbiaceae plants all harbor a single oleosin domain (110–113 AA), which includes the highly conserved proline knot motif (Fig. 2). To ensure the reliability of phylogenetic analysis, oleosin domain sequences instead of the complete amino acid sequences were used for unrooted tree construction, including five JcOLEs, five RcOLEs, five VfOLEs, five MaOLEs, eight MeOLEs, eight HbOLEs, nine PtOLEs, 17 AtOLEs, and six OsOLEs. As shown in Fig. 3, the tree assigned the 68 oleosins to four main clades, i.e. U, SL, SH, and T, as described before [19]. Except for T, which is arabidopsis-specific, each species was shown to contain at least one member in each of the other clades. Moreover, both SL and SH have evolved to form two distinct groups in the eudicots examined (see more in Fig. 4A). To confirm the result, the BRH method was also employed, which resulted in five ortholog groups, i.e. OG1, OG2a/2b, and OG3a/3b, corresponding to U, SL, and SH, respectively (Table 3). In physic nut, tung tree, castor bean, and M. annua, each of which harbors a single member in OG2a, OG2b, OG3a, and OG3b, OG2a/2b and OG3a/3b exhibit 60.1–66.2% and 61.7–69.4% sequence similarity, respectively, implying their recent origin. As for the other species tested, species- or even lineage-specific gene expansion and/or loss was found: in arabidopsis, OG3a is absent, whereas gene expansion was observed in OG1, OG2a, and OG3b; cassava and rubber tree exhibit the same retention pattern, i.e. OG1, OG2a, and OG3a, which is somewhat different from that of poplar, a species that experienced one Salicaceae-specific WGD at 60–65 Mya [34] and shows expansion of OG2a, OG2b, OG3a, and OG3b (Table 3).
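The best-reciprocal-hit idea mentioned above can be sketched as follows. This is only an illustration of the general technique, not the pipeline used in this study: hits are assumed to be available as (query, subject, bit score) tuples, for instance parsed from tabular BLAST output, and the gene identifiers and scores are placeholders.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Placeholder identifiers and scores, for illustration only
species_a_vs_b = [("a1", "b1", 250.0), ("a1", "b3", 90.0), ("a2", "b2", 210.0)]
species_b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 205.0), ("b3", "a3", 190.0)]
print(reciprocal_best_hits(species_a_vs_b, species_b_vs_a))
```

A gene whose best hit points to a partner that prefers a different gene is dropped, which is why such pairs are usually cross-checked against phylogeny and synteny, as done in the text.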

Fig. 2

Multiple sequence alignment of oleosin proteins. Identical and similar amino acids are highlighted in black or dark grey, respectively. The SeqLogo of the 72-residue proline knot motif is shown above the alignment, and the PX5SPX3P pattern is underlined. The C-terminal AAPGA of Clade U and the putative C-terminal insertion of Clade SH are boxed

Fig. 3

Phylogenetic analysis of oleosins in physic nut, tung tree, castor bean, M. annua, cassava, rubber tree, poplar, arabidopsis, and rice. Sequence alignment was performed using MUSCLE and the phylogenetic tree was constructed using bootstrap maximum likelihood tree (1000 replicates) method of MEGA6. Shown are bootstrap values at nodes supported by a posterior probability of ≥30%. The distance scale denotes the number of amino acid substitutions per site. The name of each clade is indicated next to the corresponding group

Fig. 4

Structural and phylogenetic analysis of oleosin genes in physic nut, tung tree, castor bean, M. annua, cassava, and rubber tree. A Shown is an unrooted phylogenetic tree resulting from full-length oleosins with MEGA6. B Shown is the graphic representation of exon-intron structures displayed using GSDS. C Shown is the distribution of ten conserved motifs among oleosins, where different motifs are represented by different color blocks as indicated at the bottom of the figure and the same color block in different proteins indicates a certain motif

Table 3 Five OGs of the oleosin family based on analyzing nine representative species

Exon-intron structures, sequence features, and conserved motifs

As shown in Fig. 4B, the majority of the 36 oleosin genes identified in this study do not have introns in the coding region, whereas members of OG2a all harbor a phase 2 intron within the codon of the conserved R just after the proline knot motif and possess a classical GT-AG splice junction. The same exon-intron structure was also observed in poplar (except for an R → G variation in PtOLE2a). By contrast, all OsOLE genes are intronless, and most AtOLE genes contain one to two introns, except for the intronless genes At-Sm1/2 and At-Sm3 in OG1 and OG2b, respectively (see Additional file 5). In contrast to the similar length of coding sequences, species-specific insertions and deletions were frequently observed in the intron of OG2a members, which resulted in a variable intron length from 70 bp (MeOLE2a) to 272 bp (VfOLE2) (Fig. 4B).
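For readers less familiar with intron phase, the following sketch shows the usual convention: the phase equals the number of coding bases upstream of the intron modulo 3, so a phase 2 intron splits a codon after its second base. The function names, the coordinate, and the toy intron sequence are hypothetical and do not correspond to any gene in this study.

```python
def intron_phase(coding_bases_upstream):
    """Intron phase from the count of coding bases preceding the intron.

    0: intron lies between two codons
    1: intron follows the first base of a codon
    2: intron follows the second base of a codon
    """
    if coding_bases_upstream < 0:
        raise ValueError("base count must be non-negative")
    return coding_bases_upstream % 3


def has_gt_ag_boundaries(intron):
    """Check for the canonical GT donor and AG acceptor dinucleotides."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical example: 302 coding bases precede the intron
print(intron_phase(302), has_gt_ag_boundaries("gtaagtttcag"))
```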

Oleosins in the examined Euphorbiaceae plants consist of 134–200 AA, and the average of 152 AA is comparable to 150 AA in poplar, 165 AA in rice, and 171 AA in arabidopsis (members of the T clade were excluded because of their high variation from 106 to 543 AA; the same applies to other physical and chemical parameters); the theoretical MW varies from 14.00 to 21.72 kDa, and the average of 16.15 kDa is similar to 15.43 kDa in poplar, 16.71 kDa in rice, and 18.43 kDa in arabidopsis (see Table 1 and Additional file 5). Without exception, all these proteins have a pI value greater than 7, varying from 7.97 to 10.96, as well as a high AI value (88.44–119.57) and a GRAVY value greater than 0 (0.131–0.515), indicating their amphipathic property (Table 1). Notably, MeOLE1a has an extended N terminus relative to MeOLE1b and orthologs in other Euphorbiaceae plants, and RcOLE1 harbors a PX5GPX3P pattern instead of the PX5SPX3P pattern present in most oleosins. Compared with Clades U and SL, a putative fragment insertion was observed at the C-terminus of members of Clade SH, i.e. 18 AA for OG3a and 8 AA for OG3b, with the exception of 4 AA for HbOLE5 (Fig. 2). Nevertheless, similar Kyte–Doolittle hydrophobicity plots were observed for all oleosins (Additional file 6).
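The GRAVY and AI values quoted above follow standard definitions, sketched here for clarity. GRAVY is taken as the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X in mole percent. The tool actually used in the study is not stated in this section, so the helper names and the toy peptides are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


def aliphatic_index(protein):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    seq = protein.upper()

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptides, not oleosin sequences
print(gravy("AR"), aliphatic_index("AVIL"))
```

On this scale a positive GRAVY value means that hydrophobic residues outweigh hydrophilic ones on average, which is in line with the large hydrophobic central domain of oleosins.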

Conserved motifs were also identified using MEME, which resulted in ten motifs ranging from 6 to 85 AA. Among them, Motifs 1, 3, and 2 are broadly distributed and belong to the oleosin domain; Motifs 9 and 6 are OG1-specific with unknown functions, and the latter is characterized as a hallmark of the U clade; Motif 7 is widely distributed in OG2b, OG3a, and OG3b, whereas Motif 8 is present in OG2a, OG2b, and OG3b; Motif 5 is present in most members of OG3a and OG3b; Motif 4 is present in OG1 and OG3a, while Motif 10 is only found in OG3b. Species-specific gain or loss of certain motifs was also observed: MeOLE1a has gained one more copy of Motif 9 in its extended N terminus; MaOLE2 has lost Motif 8, whereas VfOLE3, HbOLE4b, MeOLE4a, JcOLE5, RcOLE5, and HbOLE5 have lost Motif 7; RcOLE4 has lost Motif 5, while RcOLE5 and MaOLE5 have lost Motif 10 (Fig. 4C).

Transcriptional profiling of oleosin genes in physic nut, castor bean, rubber tree, and cassava

To uncover the expression evolution of oleosin genes, various tissues and developmental stages were examined in physic nut, castor bean, rubber tree, and cassava, and results are presented in Fig. 5. In physic nut, four tissues (i.e. root, leaf, axillary bud, and seed) and seven stages of developmental seed were investigated. These seven stages, i.e., 14, 19, 25, 29, 35, 41, 45 days after pollination (DAP), were characterized as histodifferentiation, early increase of seed dry-weight, rapid increase of seed coat dry-weight, early increase of kernel dry-weight, rapid increase of kernel dry-weight, late kernel dry-weight increase, and desiccation, respectively. As expected, a considerably high abundance of total oleosin transcripts was observed in the latter four stages, coinciding with a rapid increase of oil. Nevertheless, transcripts of JcOLE1, JcOLE2, and JcOLE5 were lowly or barely detected, and JcOLE4 contributes more than 90% of total transcripts. By contrast, total transcripts in three early stages of developmental seed are comparable to that in other tissues, though an apparent tissue-specific expression profile was observed. In leaves, transcripts of JcOLE1 and JcOLE5 were barely detected, whereas JcOLE2 rarely expressed in seeds of 14 DAP. While JcOLE5 represents the most expressed gene in axillary buds, JcOLE3 contributes more than 80% of total transcripts in roots, leaves, and seeds of 14, 19, and 25 DAP. In contrast to the ubiquitous expression of JcOLE3 and JcOLE4 that peaked in seeds of 41 and 25 DAP, respectively, JcOLE1, JcOLE2, and JcOLE5 lowly expressed and exhibit a tissue or developmental stage-specific expression pattern (Fig. 5A).

Fig. 5

Expression profiles of oleosin genes in physic nut, castor bean, rubber tree, and cassava. Color scale represents FPKM normalized log10 transformed counts where green indicates low expression and red indicates high expression

In castor bean, five tissues or developmental stages were examined, i.e., fully expanded true leaf, male flower, two stages of developmental seed (endosperm I/III and V/VI), and germinating seed. As shown in Fig. 5B, RcOLE genes were predominantly expressed in endosperm I/III and V/VI, where the total transcripts are 420- and 630-fold more abundant than in leaves, respectively. By contrast, total transcripts in flowers and germinating seeds were less abundant, being just six- and eight-fold higher than in leaves. Whereas the transcript of RcOLE5 was barely detected in leaves and flowers, the other genes were shown to be ubiquitously expressed. RcOLE4 and RcOLE1 represent the most and second most expressed genes, respectively, in both endosperm I/III and V/VI, whereas RcOLE2, RcOLE3, and RcOLE4 contribute the most transcripts in male flowers, leaves, and germinating seeds, respectively. Unlike JcOLE1, RcOLE1 was also expressed in leaves, though the transcript level was relatively low (Fig. 5B).
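The fold differences reported here are ratios of total oleosin transcript abundance in one tissue to that in a reference tissue (leaf). A minimal sketch of that calculation is given below; the function name is hypothetical and the FPKM totals are invented placeholders, not values from the study.

```python
def fold_over_reference(totals, reference):
    """Express each tissue's total abundance as a fold of the reference tissue.

    `totals` maps tissue name to summed FPKM across family members.
    """
    baseline = totals[reference]
    if baseline <= 0:
        raise ValueError("reference abundance must be positive")
    return {tissue: value / baseline for tissue, value in totals.items()}


# Invented placeholder totals (summed FPKM), for illustration only
placeholder_totals = {"leaf": 2.0, "flower": 12.0, "seed": 100.0}
print(fold_over_reference(placeholder_totals, "leaf"))
```

Because the ratio is undefined for a reference with zero abundance, tissues with no detected transcripts cannot serve as the baseline in this form.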

In rubber tree, seven typical tissues were analyzed, i.e., root, leaf, bark, laticifer, female flower, male flower, and seed, and an apparent seed-predominant expression pattern of HbOLE genes was observed. Total transcripts in leaves, bark, and female flowers were shown to be very low, whereas three-, five-, and 130-fold more transcripts were detected in male flowers, roots, and seeds relative to leaves, respectively. All HbOLE genes were shown to be expressed in seeds, and HbOLE2a contributes the most transcripts. As for other tissues, HbOLE1a represents the most expressed gene in roots and bark, whereas HbOLE3 and HbOLE5 contribute the most transcripts in leaves and male flowers, respectively (Fig. 5C). Interestingly, no oleosin transcripts were detected in the rubber-producing laticifer, though a large number of samples representing primary and secondary laticifers were mined.

In cassava, a total of 11 tissues were investigated, i.e., leaf blade, leaf mid-vein, petiole, stem, shoot apical meristem (SAM), lateral bud, root apical meristem (RAM), fibrous root, storage root, friable embryogenic callus (FEC), and somatic organized embryogenic structure (OES). As observed in the above three species, MeOLE genes were expressed at low levels in leaf tissues, whether leaf blade, leaf mid-vein, or petiole. MeOLE1a contributes the most transcripts in leaf blade and mid-vein. The expression profile of MeOLE genes in petiole is similar to that of mid-vein, where MeOLE1a and MeOLE3 contribute most transcripts. The expression pattern of MeOLE genes in stem is similar to that of leaf blade, where MeOLE1a and MeOLE2b contribute most transcripts. MeOLE2b and MeOLE1a represent two major isoforms in lateral bud, fibrous root, and SAM, whereas MeOLE1a and MeOLE2b contribute most transcripts in storage root and RAM. By contrast, total transcripts in FEC and OES were relatively abundant, being 28- and 13-fold higher than in leaf blade, respectively. MeOLE2b and MeOLE4b contribute most transcripts in FEC, whereas MeOLE2b represents the most expressed gene in OES. Overall, MeOLE1a and MeOLE2b seem to be ubiquitously expressed, while most of the other genes are tissue-specific (Fig. 5D).

Discussion

Increasing evidence supports that widespread WGDs have contributed much to the morphological and physiological diversity in angiosperms [35, 36]. As for the two main clades of angiosperms, i.e., monocots and eudicots, it was proven that two WGDs termed τ and γ have played important roles in their diversification, respectively [37, 38]. Moreover, the model monocotyledonous plant rice experienced two additional WGDs named σ and ρ, whereas the model eudicotyledonous plant arabidopsis experienced two successive rounds of WGDs known as β and α, respectively [38, 39]. In the monocot clade, most gramineous plants that provide us food and/or industrial materials possess six isoforms of oleosin, the most important structural protein of LDs [19]. By contrast, in the eudicot clade, the gene family was shown to be highly variable, from four members in Phaseolus vulgaris to 48 members in rapeseed [19, 25]. In arabidopsis, 17 members representing four clades have been described, i.e., U, SL, SH, and T [19, 22]. The seed-specific SL clade may have originally evolved from the universal U clade, and subsequently given rise to Clades SH and T [19]. The tapetum-specific T clade, which is only found in the Brassicaceae lineage thus far, accounts for more than half of total AtOLE genes (52.94%). Comparative evolutionary analysis showed that both WGD and single gene duplication have contributed to the expansion of this special gene family, i.e., β WGD (2), α WGD (2), tandem (5), proximal (1), and transposed (1) duplication [35, 40]. Specifically, single gene duplication has driven the expansion of the T clade, whereas the β and α WGDs contributed to the other clades. In rice, WGD (1) as well as transposed (1) and dispersed (1) duplication have contributed to the family expansion [35]. In the Euphorbiaceae lineage, the available genome sequences of several oil-bearing species with or without additional WGDs after the γ event, i.e., castor bean, physic nut, tung tree, M. annua, rubber tree, and cassava, provide a good chance to study lineage-specific evolution patterns in this important plant family.

The ρ WGD contributes to the expansion of the oleosin family in cassava and rubber tree

This study presents a first comparative evolutionary analysis of the oleosin family in Euphorbiaceae species. In castor bean, M. annua, physic nut, and tung tree, four species without recent WGDs, five oleosin family genes each were identified, as expected. By contrast, a higher number of eight members was found in both cassava and rubber tree, which share the recent ρ WGD [6]. Phylogenetic analysis divided these oleosins into three clades, i.e. U, SL, and SH, whereas the T clade reported in Brassicaceous plants was not found. Further homology analysis assigned them to five ortholog groups, where both SL and SH clades were shown to include two groups, i.e. OG2a/2b and OG3a/3b. They are more likely to have been generated before the radiation of core eudicots, because (1) they exhibit a relatively high sequence similarity of 60.1–69.4% and are widely present in core eudicots (e.g. Carica papaya, Theobroma cacao, Cucumis sativus, Mimulus guttatus, and Solanum lycopersicum), and (2) the two groups of each clade share the same orthologs in rice as well as Amborella trichopoda and Aquilegia coerulea (Additional file 7). As we know, A. trichopoda represents the sole sister lineage to all other flowering plants [41], and the absence of respective orthologs in this species as well as in rice indicates that the paralogs may have resulted from duplication events sometime after the monocot-eudicot divergence. The absence of respective orthologs in A. coerulea, a member of the early-diverging eudicot clade, suggests that the related duplication events may have occurred in core eudicots, though the exact time needs further investigation. The typical feature of the U clade or OG1 is the presence of the conserved AAPGA motif at the C-terminus, whereas the SH clade is characterized by a putative C-terminal insertion and a relatively higher molecular weight. OG2a differs from OG2b by the presence of the intron immediately after the proline knot motif, whereas OG3b differs from OG3a by a relatively shorter fragment insertion (8 vs 18 AA).

In contrast to the conservation in the four Euphorbiaceae species without recent WGDs, synteny analysis revealed that the oleosin family in cassava and rubber tree has expanded along with the ρ WGD in OG1, OG2a, and OG3a. Although the two species exhibit the same retention pattern, pseudogenes that belong to OG2b and OG3b were only found in rubber tree. In gymnosperms, pseudogenes with apparently nonfunctional oleosin-coding sequences were also identified [19]. This is consistent with slow genome evolution in long-lived woody perennials and the lack of an efficient elimination mechanism in rubber tree [8, 42]. In poplar and arabidopsis, WGDs also played a predominant role in the expansion of the oleosin family; however, the evolutionary fates of these duplicated genes seem species-specific. OG2a is the sole group that retained duplicates in all four examined species, i.e. cassava, rubber tree, poplar, and arabidopsis. In poplar, gene expansion was also found in OG2b, OG3a, and OG3b, while expansion of OG1 and OG3a was found in arabidopsis.

Structural divergence plays a role in the evolution of oleosin family genes in Euphorbiaceae

In addition to gene number variation, sequence and conserved motif analyses reveal structural divergence of members in different ortholog groups or even between paralogs. Compared with OG2b, the ancestor of OG2a may have gained one intron sometime after their divergence. Gain or loss of certain motifs between orthologs or even paralogs, as shown in Fig. 4, implies their possible functional divergence. A good example is MeOLE1a, which has gained an extended N terminus (39 AA) due to a base mutation in the initial 5′ UTR of its encoding gene. The full-length CDS of MeOLE1a shares 65.3, 68.8, and 65.3% sequence identity with MeOLE1b, HbOLE1a, and HbOLE1b, respectively; however, when the extended sequence was excluded, a considerably higher identity of 81.1, 85.4, and 81.1% was observed. This variation also resulted in higher values of molecular weight (21.72 vs 17.28 kDa) and pI (10.00 vs 9.72) but a relatively lower GRAVY value (0.451 vs 0.475). Nevertheless, the AI value and the Kyte–Doolittle hydrophobicity plot are not much changed (Additional file 6). Therefore, further characterization of the actual protein sequence and investigation of its subcellular localization are of particular interest.
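The effect of an N-terminal extension on pairwise identity can be illustrated with a toy calculation. The sketch assumes identity is computed as matching columns divided by alignment length, so that columns where one sequence has a gap lower the value; the exact definition used in the study is not given here, and the aligned strings are invented placeholders rather than MeOLE sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Invented toy alignment: the first sequence carries a 4-residue extension
extended = "MKTAAPLGS"
shorter = "----APLGT"

full_length = percent_identity(extended, shorter)
without_extension = percent_identity(extended[4:], shorter[4:])
print(round(full_length, 1), without_extension)
```

Trimming the unmatched extension raises the identity without any change to the shared region, which mirrors the contrast between the full-length and extension-excluded values reported above.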

Evolution of oleosin family genes was also associated with expression divergence

Expression divergence is also a key mechanism that allows duplicate pairs to perform the same functions in different tissues or developmental stages [43]. Such divergence has been reported in model plants. In arabidopsis, 73% of old duplicate pairs and 57% of recent duplicate pairs were found to have diverged in expression [44]. In rice, Yim et al. (2009) found that 57.4% of the genes duplicated 70 MYA and 50.9% of those duplicated 7.7 MYA have diverged in expression [45]. Comparative analyses of genes encoding aquaporins, respiratory burst oxidase homologs, and Dof transcription factors also revealed expression divergence of paralogous pairs in cassava and rubber tree [13, 14, 15, 46]. Similar results were observed in this study. In cassava, MeOLE2b and MeOLE1a have evolved to be expressed ubiquitously, whereas transcripts of their paralogs MeOLE2a and MeOLE1b are usually low in abundance and exhibit a tissue-specific expression pattern, though MeOLE1b was expressed more highly than MeOLE1a in FEC, implying possible neofunctionalisation. Compared with the rare expression of MeOLE4a in most tissues tested, MeOLE4b was preferentially expressed in FEC and OES, implying possible neofunctionalisation or degeneracy. Like MeOLE1a, HbOLE1a also exhibits a ubiquitous expression pattern with the exception of the laticifer, a rubber tree-specific tissue specialized for rubber biosynthesis and storage [12]. Based on their origin, laticifers can be divided into primary and secondary laticifers, which are derived from the procambium and the vascular cambium of the tree trunk, respectively [47]. Regardless of type, laticifers contain a large number of rubber particles that are surrounded by a lipid monolayer with proteins such as rubber elongation factor (REF) and/or small rubber particle protein (SRPP). Like oleosins, REF and SRPP are two predominant proteins of small molecular weight (14.7 and 22.4 kDa, respectively) in the laticifer [8].
The high abundance of REF/SRPPs and the absence of oleosin transcripts support tissue- or cell-specific evolution for specialized biological functions. Compared with HbOLE1a, the transcript level of HbOLE1b is usually lower in most tissues but fivefold higher in seed, implying possible subfunctionalisation. Unlike the ubiquitous expression of MeOLE2b, both HbOLE2b and HbOLE2a were expressed in only a few of the tissues tested, i.e. seed and female flower, and the transcript level of HbOLE2a is 27-fold higher than that of HbOLE2b. Unlike JcOLE4 and RcOLE4, which were expressed in all tissues examined, transcripts of HbOLE4a and HbOLE4b were detected only in seed, where HbOLE4b was expressed considerably more (about 15-fold). Additionally, despite the universal presence of the U clade, the expression level of genes in this group is usually low; however, RcOLE1 is highly abundant and represents the second most expressed isoform in endosperm I/III and V/VI. In contrast to the constitutive expression of JcOLE3, which contributes most transcripts in the early stages of seed development, the transcripts of JcOLE4 and RcOLE4 accumulate significantly in the later stages of seed development. Therefore, further characterization of their promoters is of particular interest. In fact, OLE promoters from maize (Zea mays) and oil palm (Elaeis guineensis) have been successfully employed to drive key genes to increase oil production [48, 49].

Conclusions

Taken together, a genome-wide identification and comprehensive comparison of oleosin family genes were performed in representative Euphorbiaceae species, resulting in five to eight members representing three clades (i.e. U, SL, and SH) or five ortholog groups. In contrast to the high conservation in castor bean, physic nut, tung tree, and M. annua, the family expansion observed in cassava and rubber tree was driven by the recent ρ WGD, and gene evolution was associated with both structural and expression divergence. These findings improve our knowledge of lineage-specific evolution of the oleosin family in Euphorbiaceae and provide valuable information for further functional analysis and utilization of key members and their promoters.

Methods

Datasets and sequence retrieval

Arabidopsis, poplar, and rice oleosin family genes described before (see Additional file 5) were retrieved from TAIR11 (https://www.arabidopsis.org/), Phytozome v12 (https://phytozome.jgi.doe.gov/pz/portal.html), and RGAP7 (http://rice.plantbiology.msu.edu/), respectively. Genomic sequences of tung tree and M. annua were downloaded from NGDC (http://bigd.big.ac.cn/gsa) and OSF (https://osf.io/a9wjb/), respectively, whereas genomic sequences of cassava, castor bean, and other representative plants were accessed from Phytozome v12. mRNA sequences such as nucleotides, Sanger expressed sequence tags (ESTs), and RNA sequencing (RNA-seq) reads as well as genomic sequences of rubber tree and physic nut were accessed from NCBI (https://www.ncbi.nlm.nih.gov/).

Identification and manual curation of oleosin family genes

The oleosin domain profile (PF01277) retrieved from Pfam 33.1 (https://pfam.xfam.org/) was used for HMMER (v3.3, http://hmmer.janelia.org/) searches. Gene models of all candidates were manually curated with available mRNAs as described before [12]. To identify pseudogenes and/or gene fragments, the CDS sequences of candidates were further used as queries in BLASTN searches [50] of the target genome sequences. The presence of the oleosin domain in candidates was checked using MOTIF Search (https://www.genome.jp/tools/motif/), and their gene structures were displayed using GSDS2.0 (http://gsds.cbi.pku.edu.cn/).

Synteny analysis and definition of ortholog groups

Chromosomal locations of MeOLE genes were inferred from the genome annotation [6], while in physic nut, the linkage map with 1208 genetic markers [5] was employed for this purpose by using MapChart 2.3 [51]. For synteny analysis, duplicate pairs were identified using the all-to-all BLASTp method, and gene collinearity was inferred using MCScanX [52]. Duplication modes such as tandem, proximal, transposed, dispersed, and WGD were defined as previously described [14, 15], and Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) of duplicate pairs were calculated using codeml [53]. Orthologs across different species were identified using the best reciprocal hit (BRH) method as well as information from synteny analysis, and ortholog groups were defined only when at least one member was found in at least two of the species examined.
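The BRH criterion can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in this study: it assumes that BLASTp hits are already available as (query, subject, bit score) tuples, and the function names and gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query ID, subject ID, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy scores for two hypothetical species A and B.
    a_vs_b = [("A1", "B1", 200.0), ("A1", "B2", 150.0), ("A2", "B2", 180.0)]
    b_vs_a = [("B1", "A1", 198.0), ("B2", "A1", 150.0), ("B2", "A2", 181.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Because BRH returns at most one partner per gene, it cannot by itself recover one-to-two relationships created by a WGD, which is one reason to combine it with synteny information as described above.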

Sequence alignment and phylogenetic analysis

Protein multiple sequence alignment was carried out using MUSCLE (http://www.drive5.com/muscle/), and sequence alignment display was performed using Boxshade (https://embnet.vital-it.ch/software/BOX_form.html). Phylogenetic trees were constructed using MEGA 6.0 [54] with the following parameters: the maximum likelihood method, bootstrap of 1000 replicates, and substitution with the Jones-Taylor-Thornton (JTT) model.

Protein properties and conserved motif analysis

Protein properties, including the theoretical molecular weight (MW), isoelectric point (pI), aliphatic index (AI), and grand average of hydropathicity (GRAVY), were calculated using ProtParam (http://web.expasy.org/protparam/). Conserved motifs in oleosins were analyzed using MEME (https://meme-suite.org/meme/tools/meme) with the following parameters: any number of repetitions, a maximum of 10 motifs, and a motif width of 6 to 120 residues.
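For readers unfamiliar with two of these indices, the following Python sketch shows how GRAVY and AI are conventionally defined: GRAVY as the mean Kyte–Doolittle hydropathy over all residues, and AI as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. It is an illustration only and not the ProtParam implementation; the function names and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    # A KeyError is raised for non-standard residue codes.
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index computed from mole percentages of Ala, Val, Ile and Leu."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * protein.count(aa) / len(protein)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy = "AILV"  # toy peptide, not an oleosin sequence
    print(gravy(toy), aliphatic_index(toy))
```

A positive GRAVY value indicates an overall hydrophobic protein, in line with the values reported for oleosins in this study.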

Gene expression analysis

Transcript levels of oleosin genes were investigated using the transcriptome datasets shown in Additional file 8, where SRA experiments with seed samples were preferentially selected. For cassava, for which no seed samples were available, the SRA experiments with the most tissue samples were selected. Raw sequence reads in the FASTQ format were obtained using fastq-dump, and quality control was performed using Trimmomatic [55]. Read mapping was carried out using Bowtie 2 [56], and FPKM (fragments per kilobase of exon per million fragments mapped) and RPKM (reads per kilobase per million mapped reads) were adopted to determine relative transcript levels for paired-end and single-end samples, respectively [57]. Unless specified, the tools used in this study were run with default parameters.
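The normalization behind both measures can be written as count × 10^9 / (feature length in bp × total mapped count), where the count refers to reads for single-end data (RPKM) and to fragments for paired-end data (FPKM). The Python sketch below illustrates this arithmetic only; the function name and the numbers are hypothetical and are not taken from the datasets in Additional file 8.

```python
def per_kilobase_million(count: float, length_bp: int, total_mapped: float) -> float:
    """RPKM (count = reads) or FPKM (count = fragments) for one gene.

    count        -- reads or fragments mapped to the gene
    length_bp    -- exonic length of the gene in base pairs
    total_mapped -- total mapped reads or fragments in the sample
    """
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and total mapped count must be positive")
    return count * 1e9 / (length_bp * total_mapped)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 1000-bp transcript, 10 million mapped reads.
    print(per_kilobase_million(500, 1000, 10_000_000))
```

Dividing by both length and library size makes values comparable between genes of different length and between samples of different sequencing depth, which is what allows fold differences between paralogs to be reported.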

Availability of data and materials

The datasets analyzed during the current study are available in the NCBI SRA repository (https://www.ncbi.nlm.nih.gov/sra/) and detailed accession numbers can be found in Additional file 8.

References

  1. Xi Z, Ruhfel BR, Schaefer H, Amorim AM, Sugumaran M, Wurdack KJ, et al. Phylogenomics and a posteriori data partitioning resolve the Cretaceous angiosperm radiation Malpighiales. Proc Natl Acad Sci U S A. 2012;109(43):17519–24. https://doi.org/10.1073/pnas.1205818109.

  2. Chan AP, Crabtree J, Zhao Q, Lorenzi H, Orvis J, Puiu D, et al. Draft genome sequence of the oilseed species Ricinus communis. Nat Biotechnol. 2010;28(9):951–6. https://doi.org/10.1038/nbt.1674.

  3. Sato S, Hirakawa H, Isobe S, Fukai E, Watanabe A, Kato M, et al. Sequence analysis of the genome of an oil-bearing tree, Jatropha curcas L. DNA Res. 2011;18(1):65–76. https://doi.org/10.1093/dnares/dsq030.

  4. Rahman AY, Usharraj AO, Misra BB, Thottathil GP, Jayasekaran K, Feng Y, et al. Draft genome sequence of the rubber tree Hevea brasiliensis. BMC Genomics. 2013;14:75. https://doi.org/10.1186/1471-2164-14-75.

  5. Wang W, Feng B, Xiao J, Xia Z, Zhou X, Li P, et al. Cassava genome from a wild ancestor to cultivated varieties. Nat Commun. 2014;5:5110. https://doi.org/10.1038/ncomms6110.

  6. Wu P, Zhou C, Cheng S, Wu Z, Lu W, Han J, et al. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant. Plant J. 2015;81(5):810–21. https://doi.org/10.1111/tpj.12761.

  7. Bredeson JV, Lyons JB, Prochnik SE, Wu GA, Ha CM, Edsinger-Gonzales E, et al. Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity. Nat Biotechnol. 2016;34(5):562–70. https://doi.org/10.1038/nbt.3535.

  8. Tang C, Yang M, Fang Y, Luo Y, Gao S, Xiao X, et al. The rubber tree genome reveals new insights into rubber production and species adaptation. Nat Plants. 2016;2(6):16073. https://doi.org/10.1038/nplants.2016.73.

  9. Cui P, Lin Q, Fang D, Zhang L, Li R, Cheng J, et al. Tung tree (Vernicia fordii, Hemsl.) genome and transcriptome sequencing reveals co-ordinate up-regulation of fatty acid β-oxidation and triacylglycerol biosynthesis pathways during eleostearic acid accumulation in seeds. Plant Cell Physiol. 2018;59(10):1990–2003. https://doi.org/10.1093/pcp/pcy117.

  10. Veltsos P, Ridout KE, Toups MA, González-Martínez SC, Muyle A, et al. Early sex-chromosome evolution in the diploid dioecious plant Mercurialis annua. Genetics. 2019;212(3):815–35. https://doi.org/10.1534/genetics.119.302045.

  11. Adepoju TF, Olatunji OM, Ibeh MA, Kamoru AS, Olatunbosun BE, Asuquo AJ. Heavea brasiliensis (Rubber seed): An alternative source of renewable energy. Scientific African. 2020;8:e00339. https://doi.org/10.1016/j.sciaf.2020.e00339.

  12. Zou Z, Gong J, An F, Xie G, Wang J, Mo Y, et al. Genome-wide identification of rubber tree (Hevea brasiliensis Muell. Arg.) aquaporin genes and their response to ethephon stimulation in the laticifer, a rubber-producing tissue. BMC Genomics. 2015;16:1001. https://doi.org/10.1186/s12864-015-2152-6.

  13. Zou Z, Yang L, Gong J, Mo Y, Wang J, Cao J, et al. Genome-wide identification of Jatropha curcas aquaporin genes and the comparative analysis provides insights into the gene family expansion and evolution in Hevea brasiliensis. Front Plant Sci. 2016;7:395. https://doi.org/10.3389/fpls.2016.00395.

  14. Zou Z, Yang JH. Genomic analysis of Dof transcription factors in Hevea brasiliensis, a rubber-producing tree. Ind Crop Prod. 2019;134:271–83. https://doi.org/10.1016/j.indcrop.2019.04.013.

  15. Zou Z, Yang JH. Genome-wide comparison reveals divergence of cassava and rubber aquaporin family genes after the recent whole-genome duplication. BMC Genomics. 2019;20:380. https://doi.org/10.1186/s12864-019-5780-4.

  16. Xu C, Shanklin J. Triacylglycerol metabolism, function, and accumulation in plant vegetative tissues. Annu Rev Plant Biol. 2016;67:179–206. https://doi.org/10.1146/annurev-arplant-043015-111641.

  17. Frandsen GI, Mundy J, Tzen JT. Oil bodies and their associated proteins, oleosin and caleosin. Physiol Plant. 2001;112(3):301–7. https://doi.org/10.1034/j.1399-3054.2001.1120301.x.

  18. Huang AH. Oleosins and oil bodies in seeds and other organs. Plant Physiol. 1996;110(4):1055–61. https://doi.org/10.1104/pp.110.4.1055.

  19. Huang MD, Huang AH. Bioinformatics reveal five lineages of oleosins and the mechanism of lineage evolution related to structure/function from green algae to seed plants. Plant Physiol. 2015;169(1):453–70. https://doi.org/10.1104/pp.15.00634.

  20. Liu Q, Sun Y, Su W, Yang J, Liu X, Wang Y, et al. Species-specific size expansion and molecular evolution of the oleosins in angiosperms. Gene. 2012;509(2):247–57. https://doi.org/10.1016/j.gene.2012.08.014.

  21. Fang Y, Zhu RL, Mishler BD. Evolution of oleosin in land plants. PLoS One. 2014;9(8):e103806. https://doi.org/10.1371/journal.pone.0103806.

  22. Kim HU, Hsieh K, Ratnayake C, Huang AH. A novel group of oleosins is present inside the pollen of Arabidopsis. J Biol Chem. 2002;277(25):22677–84. https://doi.org/10.1074/jbc.M109298200.

  23. Hyun TK, Kumar D, Cho YY, Hyun HN, Kim JS. Computational identification and phylogenetic analysis of the oil-body structural proteins, oleosin and caleosin, in castor bean and flax. Gene. 2013;515(2):454–60. https://doi.org/10.1016/j.gene.2012.11.065.

  24. Lu Y, Chi M, Li L, Li H, Noman M, Yang Y, et al. Genome-wide identification, expression profiling, and functional validation of oleosin gene family in Carthamus tinctorius L. Front Plant Sci. 2018;9:1393. https://doi.org/10.3389/fpls.2018.01393.

  25. Chen K, Yin Y, Liu S, Guo Z, Zhang K, Liang Y, et al. Genome-wide identification and functional analysis of oleosin genes in Brassica napus L. BMC Plant Biol. 2019;19(1):294. https://doi.org/10.1186/s12870-019-1891-y.

  26. Schein M, Yang Z, Mitchell-Olds T, Schmid KJ. Rapid evolution of a pollen-specific oleosin-like gene family from Arabidopsis thaliana and closely related species. Mol Biol Evol. 2004;21(4):659–69. https://doi.org/10.1093/molbev/msh059.

  27. Shimada TL, Shimada T, Takahashi H, Fukao Y, Hara-Nishimura I. A novel role for oleosins in freezing tolerance of oilseeds in Arabidopsis thaliana. Plant J. 2008;55(5):798–809. https://doi.org/10.1111/j.1365-313X.2008.03553.x.

  28. Shao Q, Liu X, Su T, Ma C, Wang P. New insights into the role of seed oil body proteins in metabolism and plant development. Front Plant Sci. 2019;10:1568. https://doi.org/10.3389/fpls.2019.01568.

  29. Lu C, Fulda M, Wallis JG, Browse J. A high-throughput screen for genes from castor that boost hydroxy fatty acid accumulation in seed oils of transgenic Arabidopsis. Plant J. 2006;45(5):847–56. https://doi.org/10.1111/j.1365-313X.2005.02636.x.

  30. Siloto RMP, Findlay K, Lopez VA, Yeung EC, Nykifork CL, Moloney MM. The accumulation of oleosins determines the size of seed oil bodies in Arabidopsis. Plant Cell. 2006;18:1961–74. https://doi.org/10.1105/tpc.106.041269.

  31. Zhang D, Zhang H, Hu Z, Chu S, Yu K, Lv L, et al. Artificial selection on GmOLEO1 contributes to the increase in seed oil during soybean domestication. PLoS Genet. 2019;15(7):e1008267. https://doi.org/10.1371/journal.pgen.1008267.

  32. Eastmond PJ. Cloning and characterization of the acid lipase from castor beans. J Biol Chem. 2004;279(44):45540–5. https://doi.org/10.1074/jbc.M408686200.

  33. Cao H, Zhang L, Tan X, Long H, Shockey JM. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii). PLoS One. 2014;9(2):e88409. https://doi.org/10.1371/journal.pone.0088409.

  34. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313(5793):1596–604. https://doi.org/10.1126/science.1128691.

  35. Qiao X, Li Q, Yin H, Qi K, Li L, Wang R, et al. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 2019;20(1):38. https://doi.org/10.1186/s13059-019-1650-2.

  36. Wu S, Han B, Jiao Y. Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol Plant. 2020;13(1):59–71. https://doi.org/10.1016/j.molp.2019.10.012.

  37. Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers JE, McKain MR, McNeal J, et al. A genome triplication associated with early diversification of the core eudicots. Genome Biol. 2012;13(1):R3. https://doi.org/10.1186/gb-2012-13-1-r3.

  38. Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell. 2014;26(7):2792–802. https://doi.org/10.1105/tpc.114.127597.

  39. Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422(6930):433–8. https://doi.org/10.1038/nature01521.

  40. Wang Y, Tan X, Paterson AH. Different patterns of gene structure divergence following gene duplication in Arabidopsis. BMC Genomics. 2013;14:652. https://doi.org/10.1186/1471-2164-14-652.

  41. Amborella Genome Project. The Amborella genome and the evolution of flowering plants. Science. 2013;342(6165):1241089. https://doi.org/10.1126/science.1241089.

  42. Luo MC, You FM, Li P, Wang JR, Zhu T, Dandekar AM, et al. Synteny analysis in Rosids with a walnut physical map reveals slow genome evolution in long-lived woody perennials. BMC Genomics. 2015;16(1):707. https://doi.org/10.1186/s12864-015-1906-5.

  43. Adams KL. Evolution of duplicate gene expression in polyploid and hybrid plants. J Hered. 2007;98(2):136–41. https://doi.org/10.1093/jhered/esl061.

  44. Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16(7):1679–91. https://doi.org/10.1105/tpc.021410.

  45. Yim WC, Lee BM, Jang CS. Expression diversity and evolutionary dynamics of rice duplicate genes. Mol Gen Genomics. 2009;281(5):483–93. https://doi.org/10.1007/s00438-009-0425-y.

  46. Zou Z, Yang JH, Zhang XC. Insights into genes encoding respiratory burst oxidase homologs (RBOHs) in rubber tree (Hevea brasiliensis Muell. Arg.). Ind Crop Prod. 2019;128:126–39. https://doi.org/10.1016/j.indcrop.2018.11.005.

  47. Hao BZ, Wu JL. Laticifer differentiation in Hevea brasiliensis: induction by exogenous jasmonic acid and linolenic acid. Ann Bot. 2000;85:37–43. https://doi.org/10.1006/anbo.1999.0995.

  48. Shen B, Allen WB, Zheng P, Li C, Glassman K, Ranch J, et al. Expression of ZmLEC1 and ZmWRI1 increases seed oil production in maize. Plant Physiol. 2010;153(3):980–7. https://doi.org/10.1104/pp.110.157537.

  49. Ye J, Wang C, Sun Y, Qu J, Mao H, Chua NH. Overexpression of a transcription factor increases lipid content in a woody perennial Jatropha curcas. Front Plant Sci. 2018;9:1479. https://doi.org/10.3389/fpls.2018.01479.

  50. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. https://doi.org/10.1093/nar/25.17.3389.

  51. Voorrips RE. MapChart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002;93(1):77–8. https://doi.org/10.1093/jhered/93.1.77.

  52. Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40(7):e49. https://doi.org/10.1093/nar/gkr1293.

  53. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–91. https://doi.org/10.1093/molbev/msm088.

  54. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9. https://doi.org/10.1093/molbev/mst197.

  55. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. https://doi.org/10.1093/bioinformatics/btu170.

  56. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. https://doi.org/10.1038/nmeth.1923.

  57. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat Methods. 2008;5(7):621–8. https://doi.org/10.1038/nmeth.1226.

Acknowledgements

The authors appreciate those contributors who make the related genome and transcriptome data accessible in public databases. They also thank two anonymous reviewers for their helpful suggestions.

Funding

This work was supported by the Natural Science Foundation of Hainan province (320RC705 and 319MS093), the National Natural Science Foundation of China (31971688), and the Central Public-interest Scientific Institution Basal Research Fund for Chinese Academy of Tropical Agricultural Sciences (1630052022001).

Author information

Contributions

The study was conceived and directed by ZZ. All the experiments and analysis were directed by ZZ and carried out by ZZ, YZ, and LZ. ZZ wrote the paper. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhi Zou.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

The gene model for RcOLE3. The coding region is marked with uppercase letters, above which are its deduced amino acids (the oleosin domain is shown in red). The start and stop codons are marked with bold letters.

Additional file 2.

The gene model for MeOLE1a. The coding region is marked with uppercase letters, above which are its deduced amino acids (the oleosin domain is shown in red). The start and stop codons are marked with bold letters.

Additional file 3.

Oleosin duplicate pairs in poplar.

Additional file 4.

Percent similarity between different oleosin family members in physic nut, castor bean, tung tree, M. annua, cassava, rubber tree, poplar, arabidopsis, and rice.

Additional file 5.

Detailed information of oleosin family genes present in arabidopsis, poplar, and rice. 1 Duplicated modes were determined based on the study of Qiao et al. (2019).

Additional file 6.

Kyte–Doolittle hydrophobicity plots of oleosins in physic nut, tung tree, castor bean, M. annua, cassava, and rubber tree.

Additional file 7.

Species-specific distribution of the five oleosin OGs identified in this study. Orthologs across different species were identified using the BRH method, and systematic ortholog group names were assigned only when at least one member was found in at least two of the species examined. Lineage-specific groups present in rubber tree and cassava are shown in bold. C. papaya, T. cacao, C. sativus, M. guttatus, and S. lycopersicum are other representatives of core eudicots, whereas A. coerulea, rice, and A. trichopoda were used as out-groups that diverged before the divergence of OG2a/2b and OG3a/3b.

Additional file 8.

Detailed information of transcriptome data used in this study.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zou, Z., Zhao, Y. & Zhang, L. Genomic insights into lineage-specific evolution of the oleosin family in Euphorbiaceae. BMC Genomics 23, 178 (2022). https://doi.org/10.1186/s12864-022-08412-z

  • DOI: https://doi.org/10.1186/s12864-022-08412-z

Keywords