A genome-wide analysis of the phospholipid:diacylglycerol acyltransferase gene family in Gossypium

Background Cotton (Gossypium spp.) is the most important natural fiber crop worldwide, and cottonseed oil is its most important byproduct. Phospholipid:diacylglycerol acyltransferase (PDAT) is important in triacylglycerol (TAG) biosynthesis, as it catalyzes the transfer of a fatty acyl moiety from the sn-2 position of a phospholipid to the sn-3 position of sn-1,2-diacylglycerol to form TAG and a lysophospholipid. However, little is known about the genes encoding PDATs involved in cottonseed oil biosynthesis. Results A comprehensive genome-wide analysis of G. hirsutum, G. barbadense, G. arboreum, and G. raimondii identified 12, 11, 6 and 6 PDATs, respectively. These genes were divided into 3 subfamilies, including a PDAT1-like subfamily that is not present in the dicot Arabidopsis. All GhPDATs contained one or two LCAT domains at the C-terminus, while most GhPDATs contained a conserved single transmembrane region at the N-terminus. A chromosomal distribution analysis showed that the 12 GhPDAT genes in G. hirsutum were distributed on 10 chromosomes. However, none of the GhPDATs co-localized with quantitative trait loci (QTL) for cottonseed oil content, suggesting that their sequence variation is not genetically associated with natural variation in cottonseed oil content. Most GhPDATs were expressed during the cottonseed oil accumulation stage. Ectopic expression of GhPDAT1d increased Arabidopsis seed oil content. Conclusions Our comprehensive genome-wide analysis of the cotton PDAT gene family provides a foundation for further studies into the use of PDAT genes to increase cottonseed oil content through biotechnology.

Electronic supplementary material: The online version of this article (10.1186/s12864-019-5728-8) contains supplementary material, which is available to authorized users.


Background
Cotton, especially upland cotton, is the world's most important fiber crop, and oil is also extracted from its oil-rich seeds; cotton ranks sixth among the world's oil crops. Cottonseed oil makes up approximately 16% of the seed weight [1] and is the most valuable product derived from cottonseed. It is typically composed of approximately 26% saturated palmitic acid (C16:0), 15% monounsaturated oleic acid (C18:1), and 58% polyunsaturated linoleic acid (C18:2) [2]. From 1999 to 2009, worldwide consumption of vegetable oils increased by more than 50% [3]. Research into the molecular mechanisms of oil biosynthesis, and the development of cotton varieties with high seed oil content through classical breeding and biotechnological approaches, is therefore becoming increasingly important.
Triacylglycerols (TAGs) are the major components of vegetable oils. Three pathways of diacylglycerol (DAG)/TAG production, which yield different fatty acid (FA) compositions, have been reviewed previously [4]: de novo DAG/TAG synthesis (the Kennedy pathway), acyl editing that supplies phosphatidylcholine (PC)-modified FA for de novo DAG/TAG synthesis, and PC-derived DAG/TAG synthesis. Phospholipid:diacylglycerol acyltransferase (PDAT), which acts in the second pathway, catalyzes the transfer of a fatty acyl moiety from the sn-2 position of a phospholipid to the sn-3 position of sn-1,2-diacylglycerol, forming TAG and a lysophospholipid. PDAT activity, which uses phospholipids as acyl donors and DAG as the acyl acceptor for TAG biosynthesis, was first identified in yeast and plants [5].
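As a bookkeeping illustration of the acyl transfer described above, the toy Python sketch below moves the sn-2 chain of a phospholipid donor onto the free sn-3 position of an sn-1,2-DAG acceptor, leaving a lysophospholipid. It assumes acyl chains are written as shorthand strings such as "18:2", and the class and function names are hypothetical rather than taken from any library. It tracks only which chain ends up where; it does not model enzyme kinetics or substrate preference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Phospholipid:
    sn1: str
    sn2: Optional[str]  # None once the sn-2 chain has been removed (lysophospholipid)
    head_group: str

@dataclass(frozen=True)
class Diacylglycerol:
    sn1: str
    sn2: str  # sn-3 is free and accepts the transferred acyl chain

@dataclass(frozen=True)
class Triacylglycerol:
    sn1: str
    sn2: str
    sn3: str

def pdat_transfer(donor: Phospholipid, acceptor: Diacylglycerol) -> Tuple[Triacylglycerol, Phospholipid]:
    """Move the donor's sn-2 acyl chain to the sn-3 position of sn-1,2-DAG (hypothetical helper)."""
    if donor.sn2 is None:
        raise ValueError("donor is already a lysophospholipid (no sn-2 acyl chain)")
    tag = Triacylglycerol(sn1=acceptor.sn1, sn2=acceptor.sn2, sn3=donor.sn2)
    lyso = Phospholipid(sn1=donor.sn1, sn2=None, head_group=donor.head_group)
    return tag, lyso

# Hypothetical substrates: PC carrying 16:0 at sn-1 and 18:2 at sn-2, and an 18:1/18:1 DAG.
pc = Phospholipid(sn1="16:0", sn2="18:2", head_group="choline")
dag = Diacylglycerol(sn1="18:1", sn2="18:1")
tag, lpc = pdat_transfer(pc, dag)
print(tag)  # Triacylglycerol(sn1='18:1', sn2='18:1', sn3='18:2')
print(lpc)  # Phospholipid(sn1='16:0', sn2=None, head_group='choline')
```

Frozen dataclasses are used so that the substrates are not mutated in place, which keeps the "before" and "after" molecules distinct.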
Arabidopsis contains two PDAT genes, AtPDAT1 (At5g13640) and AtPDAT2 (At3g44830) [6]. No significant differences in total acyl composition or TAG content were found between 17-day-old AtPDAT-overexpressing and wild-type (WT) seedlings [6]. Likewise, seed fatty acid content and composition did not differ significantly between the pdat mutant and WT [7]. However, in 5-week-old developing Arabidopsis leaves, overexpression or knockout of AtPDAT1 led to significant changes in fatty acid and TAG synthesis [8]. AtPDAT2 is highly expressed in seeds but plays no role in TAG biosynthesis [6,9]. In castor bean, 3 PDAT genes have been identified [10], and the endoplasmic reticulum-located PDAT1-2 enhances hydroxy fatty acid accumulation in transgenic castor bean plants [11]. In flax (Linum usitatissimum), 6 PDATs have been identified (LuPDAT1, LuPDAT2, LuPDAT3, LuPDAT4, LuPDAT5, and LuPDAT6) [12]. LuPDAT1/LuPDAT5 and LuPDAT2/LuPDAT4, but not LuPDAT3 or LuPDAT6, have the unique ability to preferentially channel α-linolenic acid into TAG. Recently, the PDAT gene Lro1 was shown to be responsible for hepatitis C virus core-induced lipid droplet formation in a yeast model system [13]. PDAT genes have also been found in the unicellular green alga Chlamydomonas reinhardtii [14] and the bacterium Streptomyces coelicolor [15], but no mammalian counterpart has yet been identified.
Previously, a genome-wide analysis of eudicots found 6 PDATs in Gossypium raimondii (two each in clades V, VI, and VII) [16]. To further understand the complexity of PDATs and TAG biosynthetic mechanisms in cotton, we performed a comprehensive genome-wide analysis of the PDAT gene family in cotton in the present study.
To examine the relationships between AtPDAT1, AtPDAT2, and cotton PDAT proteins, we constructed a phylogenetic tree (Fig. 1). It classified the PDAT genes into 3 subfamilies: PDAT1, PDAT1-like, and PDAT2, corresponding to clades VI, V, and VII, respectively [16]. GhPDAT1-like proteins were more similar in sequence to GhPDAT1 than to GhPDAT2 (Fig. 1). Based on the phylogenetic tree and sequence similarity analysis, we also identified orthologous PDAT gene pairs in G. hirsutum, G. barbadense, and their corresponding diploid ancestors (Table 1). Only one gene, GbPDAT1b-like, was not found in G. barbadense, possibly because it has been lost. The PDAT gene names, gene identifiers, gene pairs, and predicted properties of the PDAT proteins are listed in Table 1.
Using the sequenced genomes, cotton PDAT genes were physically mapped to chromosomes (Fig. 2; Table 1). In G. hirsutum and G. barbadense, PDAT genes were evenly distributed between the At and Dt subgenome chromosomes, except for the gene lost in G. barbadense. In G. hirsutum, the 12 PDAT genes were located on 5 Dt chromosomes (D6, D7, D8, D9 and D13) and 5 At chromosomes (A6, A7, A8, A9 and A13), with two PDAT genes each on chromosomes A6 and D6. Chromosomal localization data are shown in Fig. 2 and Table 1.

Protein domain analysis of PDATs in Gossypium hirsutum
To compare protein domains among GhPDATs, the putative protein domains of the 12 GhPDATs were predicted using the SMART database (http://smart.embl-heidelberg.de/). As shown in Fig. 3, a single N-terminal transmembrane region is conserved in most GhPDATs, while all GhPDATs contain one or two LCAT domains at their C-termini.

Adaptive evolution analysis of the PDAT gene family
To explore which type of Darwinian selection shaped PDAT gene divergence after duplication, the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) was calculated for the coding sequences of 12 pairs of PDAT orthologs between G. hirsutum/G. barbadense and G. arboreum/G. raimondii (Table 1). A Ka/Ks ratio > 1 indicates positive selection, a ratio of 1 indicates neutral evolution, and a ratio < 1 indicates purifying selection [21]. The Ka/Ks ratios of PDAT genes ranged from 0.575 to ∞ (Table 2; an infinite ratio arises when no synonymous substitutions are observed, i.e., Ks = 0), indicating that the PDAT gene family has undergone both purifying and positive selection in cotton. As shown in Table 2, most PDAT genes had undergone positive selection, especially GbPDAT1b, GhPDAT1d, GbPDAT1d and GhPDAT2d. Only four PDAT genes, GhPDAT1a, GbPDAT1a, GhPDAT1c and GbPDAT2b, had undergone purifying selection.
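The classification rule above can be written as a small Python helper. This is a sketch, not the procedure used to produce Table 2: it assumes Ka and Ks have already been estimated, treats Ks = 0 with Ka > 0 as an infinite ratio, and applies the thresholds literally. In practice, whether a ratio differs from 1 is usually judged with a statistical test rather than by exact comparison.

```python
import math

def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ka/Ks; Ks == 0 gives inf when Ka > 0 and nan when both are 0."""
    if ks == 0:
        return math.inf if ka > 0 else math.nan
    return ka / ks

def selection_mode(ratio: float) -> str:
    """Map a Ka/Ks ratio to a selection regime using the literal thresholds."""
    if math.isnan(ratio):
        return "undefined"
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"

print(selection_mode(0.575))                    # purifying selection
print(selection_mode(ka_ks_ratio(0.004, 0.0)))  # positive selection
```

The 0.575 input is the lowest ratio reported in Table 2; the Ka value in the second call is hypothetical and only shows how a zero Ks yields an infinite ratio.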
Phylogenetic tree analysis showed that each AtPDAT gene corresponded to four PDAT genes in tetraploid cotton and two in diploid cotton. Accordingly, the 12 GhPDATs were divided into 6 duplicate pairs, and the Ka/Ks ratio of each pair was calculated (Table 3). All Ka/Ks ratios were < 1, suggesting that the PDAT genes of G. hirsutum have mainly experienced purifying selection.
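To show how Ka and Ks values themselves can be estimated, the Python sketch below implements one classical approach: the final step of the Nei–Gojobori counting method, in which the proportions of differing nonsynonymous and synonymous sites are corrected for multiple substitutions with the Jukes–Cantor formula. This is a swapped-in, simpler technique shown for illustration, not necessarily the procedure behind Tables 2 and 3. It assumes that synonymous and nonsynonymous sites and differences have already been counted from a codon alignment, and the counts in the example are hypothetical.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def nei_gojobori_ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
                       syn_diffs: float, syn_sites: float):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # pN corrected for multiple hits
    ks = jukes_cantor(syn_diffs / syn_sites)        # pS corrected for multiple hits
    if ks == 0:
        ratio = math.inf if ka > 0 else math.nan
    else:
        ratio = ka / ks
    return ka, ks, ratio

# Hypothetical counts for one gene pair: 10 of 1000 nonsynonymous sites differ,
# 15 of 300 synonymous sites differ.
ka, ks, ratio = nei_gojobori_ka_ks(10, 1000, 15, 300)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

The correction always returns a distance at least as large as the raw proportion, because it accounts for sites that changed more than once; the formula is undefined once p reaches 0.75, which is why the function rejects such inputs.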

Expression profiles of PDAT genes in Gossypium hirsutum
To characterize the expression patterns of the identified GhPDAT genes, we analyzed their transcript profiles in 22 cotton tissues (Fig. 4) based on published TM-1 data [17]. GhPDAT1a and GhPDAT1b were expressed at low levels in all 22 tissues. GhPDAT1c and GhPDAT1d were highly expressed in the stem, leaf, and torus, and were also expressed in the ovule and fiber. GhPDAT1-like genes were expressed in all 22 tissues. AtPDAT2 is highly expressed in seeds but plays no role in TAG biosynthesis [6,9]. Similarly, GhPDAT2 was highly expressed in ovules at 20 to 35 days post anthesis (DPA) and in 25 DPA fibers, and only marginally in other organs, suggesting that, like AtPDAT2, GhPDAT2 may not contribute to TAG biosynthesis. Cottonseed oil mainly accumulates in ovules after 15 to 20 DPA, a stage at which most GhPDATs were expressed. GhPDATs may therefore play a role in TAG biosynthesis in developing cotton seeds.
We also analyzed GhPDAT transcript profiles in our unpublished RNA-seq datasets from two upland backcross inbred lines (BILs), 3012 and 3008 (with Gossypium barbadense germplasm introgression), which have seed kernel oil contents of 25.88% and 33.52% (Additional file 1: Figure S1).
To determine whether any GhPDATs were genetically associated with cottonseed oil content, we performed co-localization analysis of GhPDATs with QTLs for seed oil content. QTLs were downloaded from the CottonQTL database [22]. However, no PDAT gene was localized within a cottonseed oil QTL interval (data not shown).
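Co-localization of a gene with a QTL reduces to an interval-overlap test on the same chromosome. The Python sketch below shows that test under stated assumptions: coordinates are 1-based and inclusive, and the gene names, QTL names and positions are hypothetical placeholders, not values from the CottonQTL database or Table 1.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Interval:
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end

def colocalized(genes: List[Interval], qtls: List[Interval]) -> List[Tuple[str, str]]:
    """All (gene, QTL) name pairs whose intervals overlap."""
    return [(g.name, q.name) for g in genes for q in qtls if overlaps(g, q)]

# Hypothetical gene and QTL coordinates, for illustration only.
genes = [Interval("geneA", "A06", 1_000, 4_000), Interval("geneB", "D13", 50_000, 53_000)]
qtls = [Interval("qtl1", "A06", 10_000, 90_000), Interval("qtl2", "D13", 52_000, 70_000)]
print(colocalized(genes, qtls))  # [('geneB', 'qtl2')]
```

An empty result for every real PDAT gene would correspond to the finding reported above. For genome-scale inputs, a sorted sweep or an interval tree would avoid the all-pairs comparison used here for clarity.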

Ectopic expression of GhPDAT1d increased the oil content of Arabidopsis seeds
In the PDAT1 clade, GhPDAT1c and GhPDAT1d (a gene pair from the At and Dt subgenomes) were expressed at higher levels in 15 to 20 DPA ovules than GhPDAT1a and GhPDAT1b (Figs. 4 and 5a). GhPDAT1d was therefore selected for further functional analysis. Transgenic Arabidopsis plants overexpressing GhPDAT1d were generated to characterize its role in seed oil content. qRT-PCR analysis of transgenic and WT plants showed that GhPDAT1d was highly expressed in the transgenic plants (Fig. 5b). No visible difference between transgenic and WT plants was observed at the developmental stages examined (data not shown). To determine whether GhPDAT1d could increase oil content, the seed oil contents of transgenic and WT plants were compared using an NMI20-Analyst nuclear magnetic resonance spectrometer (Niumag, Shanghai, China). Seed oil content was significantly increased, by 6.55 to 17.61%, in transgenic lines L2 to L4 (Fig. 5c). There was no significant change in fatty acid composition between WT and GhPDAT1d transgenic Arabidopsis seeds (Table 4).
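A statement such as "6.55 to 17.61% higher" can mean either a relative increase or a difference in percentage points, and the two diverge when the baseline is itself a percentage. The hedged Python sketch below computes both. Which convention underlies Fig. 5c is not stated here, so the example uses the two BIL seed kernel oil contents quoted above (25.88% and 33.52%) purely to show the difference; the function names are hypothetical.

```python
def relative_increase_pct(treated: float, control: float) -> float:
    """Relative change of treated over control, as a percentage of the control value."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treated - control) / control * 100.0

def percentage_point_difference(treated_pct: float, control_pct: float) -> float:
    """Absolute difference between two values that are already percentages."""
    return treated_pct - control_pct

low, high = 25.88, 33.52  # seed kernel oil contents (%) of the two BILs
print(f"{percentage_point_difference(high, low):.2f} percentage points")  # 7.64 percentage points
print(f"{relative_increase_pct(high, low):.2f}% relative increase")       # 29.52% relative increase
```

Reporting which of the two quantities is meant avoids a roughly fourfold ambiguity in this example.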

Discussion
Although many previous studies have revealed a crucial role for PDATs in TAG biosynthesis, our knowledge of PDATs in cotton remains limited. This study therefore aimed to present an overall picture of Gossypium PDATs, including their sequence variation, adaptive evolution, protein domains, expression profiles and co-localization with QTLs.

The PDAT gene family in Gossypium
PDAT genes exist in all plants, including algae, lower plants (mosses and lycophytes) and higher plants (monocots and eudicots) [16]. This study identified 12 deduced PDATs in G. hirsutum, 11 in G. barbadense, 6 in G. arboreum and 6 in G. raimondii. A previous evolutionary analysis showed that the PDAT gene family can be clearly divided into 7 major clades [16]. In the present study, Gossypium PDAT amino acid sequences clustered into 3 clades (subfamilies), including an additional PDAT1-like clade; clades I to IV were not found in cotton. By comparison, only two PDAT genes (AtPDAT1 and AtPDAT2) have been identified in Arabidopsis [6].
We observed that each AtPDAT gene corresponded to four PDAT genes in tetraploid cotton and two in diploid cotton. This suggests that PDAT gene duplication occurred in diploid cotton before the emergence of tetraploid cotton, consistent with a previously reported eudicot-wide PDAT gene expansion [16]. In addition, a single N-terminal transmembrane region is conserved in most GhPDATs, and all GhPDATs carry one or two LCAT domains at the C-terminus.

PDATs in relation to seed oil content
Cottonseed oil accumulates in ovules after 15 to 20 DPA. At this stage, most GhPDATs were expressed (Fig. 4), suggesting that they play a role in TAG biosynthesis in developing cotton seeds. We also found that GhPDATs were expressed in developing fibers (Fig. 4), suggesting that they may also be involved in fiber development. However, no PDAT gene was localized within a cottonseed oil QTL interval (data not shown).
In 5-week-old developing Arabidopsis leaves, overexpression or knockout of AtPDAT1 led to significant changes in fatty acid and TAG synthesis [8]. Cottonseed oil is widely believed to accumulate in ovules after 15 DPA, when most GhPDATs are expressed (Fig. 4). In this study, we showed that ectopic expression of GhPDAT1d increased the oil content of Arabidopsis seeds, whereas no fatty acid in the seed oil changed significantly, as previously reported for the Arabidopsis pdat knockout mutant [7]. Together, these results suggest that PDAT function is conserved in upland cotton.

Conclusion
In conclusion, we performed a comprehensive genome-wide analysis of the PDAT gene family in cotton. A total of 35 PDAT genes were identified in four sequenced Gossypium species and grouped into 3 distinct clades. Ectopic expression of GhPDAT1d increased Arabidopsis seed oil content. Our detailed analysis of sequence variation, adaptive evolutionary analysis, protein domains, expression profiles, and QTL co-localization provides an important lead for further studies of PDAT genes in cotton.

Detection of protein domains
Potential transmembrane regions and functional motifs of GhPDAT proteins were identified using the SMART database (http://smart.embl-heidelberg.de/).

Ka and Ks calculations
Ka and Ks values for PDAT gene pairs were calculated using DnaSP software and phylogenetic analysis by the maximum likelihood method.

Analysis of PDAT genes in RNA-seq data
RNA-seq data for 22 cotton tissues were previously published (SRA accession: PRJNA248163) [17]. Unpublished RNA-seq datasets were generated in our laboratory for two upland BILs, 3012 and 3008 (with Gossypium barbadense germplasm introgression), which have seed kernel oil contents of 25.88% and 33.52%. The expression of PDAT genes was analyzed based on these data.

Transgenic plant generation and expression analysis
Transgenic plant generation and expression analysis were performed as previously reported [24]. Briefly, the complete coding sequence of GhPDAT1d (Additional file 4) was amplified from G. hirsutum acc. TM-1 with gene-specific primers. The resulting PCR product was cloned into the pBI121 vector digested with BamHI and SacI, using the ClonExpress® II One Step Cloning Kit (Vazyme, Nanjing, China). Agrobacterium tumefaciens strain GV3101 carrying the binary construct was used to transform Arabidopsis plants. Quantitative real-time PCR (qRT-PCR) was performed to determine the expression of GhPDAT1d, and the 2^(−ΔΔCt) method was used to quantify its expression level relative to the 18S rRNA endogenous control. Primers are listed in Additional file 2: Table S1.
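The 2^(−ΔΔCt) calculation can be sketched in a few lines of Python. The Ct values below are hypothetical, and the method as written assumes that the target and reference genes amplify with close to 100% efficiency; this is an illustration of the arithmetic, not the analysis script used in this study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of the target gene in a sample relative to a calibrator (2^-ΔΔCt)."""
    delta_ct_sample = ct_target_sample - ct_ref_sample  # normalize to the reference gene (e.g. 18S rRNA)
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: target and 18S rRNA in one sample, then in the calibrator.
print(relative_expression(22.0, 10.0, 28.0, 10.0))  # 64.0
```

Here the target reaches threshold 6 cycles earlier in the sample after normalization, so ΔΔCt = −6 and the fold change is 2^6 = 64; by construction the calibrator compared with itself gives 1.0.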

Oil content analysis
Total oil content was determined using about 0.3 g of seeds per sample with an NMI20-Analyst nuclear magnetic resonance spectrometer (Niumag, Shanghai, China) as previously reported [24].

Fatty acid composition analysis
Gas chromatography/mass spectrometry (GC/MS) analysis was performed to determine fatty acid composition, using a gas chromatograph (7890A, Agilent Technologies, USA) equipped with a flame ionization detector (FID) and an HP-FFAP capillary column (30 m × 250 μm × 0.25 μm). About 100 seeds each from WT and GhPDAT1d transgenic Arabidopsis plants were analyzed.
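Relative fatty acid composition is commonly reported as each peak's share of the total integrated peak area. The Python sketch below shows that area-normalization step under stated assumptions: the peak areas are hypothetical, and all detector response factors are taken as equal. This is a simplification; quantitative work often applies response factors or an internal standard.

```python
from typing import Dict

def relative_composition(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Convert integrated peak areas into area percentages that sum to 100."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fa: 100.0 * area / total for fa, area in peak_areas.items()}

areas = {"C16:0": 200.0, "C18:1": 300.0, "C18:2": 500.0}  # hypothetical peak areas
print(relative_composition(areas))  # {'C16:0': 20.0, 'C18:1': 30.0, 'C18:2': 50.0}
```

Comparing such area percentages between WT and transgenic seeds, with replicates and a significance test, is the kind of comparison summarized in Table 4.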