A comprehensive analysis of cotton VQ gene superfamily reveals their potential and extensive roles in regulating cotton abiotic stress

Background Valine-glutamine (VQ) motif-containing proteins play important roles in plant growth, development and abiotic stress response. For many plant species, the VQ genes have been identified and their functions have been described. However, little is known about the origin, evolution, and functions (and underlying mechanisms) of the VQ family genes in cotton. Results In this study, we comprehensively analyzed the characteristics of 268 VQ genes from four Gossypium genomes and found that the VQ proteins evolved into 10 clades, and each clade had a similar structural and conservative motif. The expansion of the VQ gene was mainly through segmental duplication, followed by dispersal. Expression analysis revealed that many GhVQs might play important roles in response to salt and drought stress, and GhVQ18 and GhVQ84 were highly expressed under PEG and salt stress. Further analysis showed that GhVQs were co-expressed with GhWRKY transcription factors (TFs), and microRNAs (miRNAs) could hybridize to their cis-regulatory elements. Conclusions The results in this study broaden our understanding of the VQ gene family in plants, and the analysis of the structure, conserved elements, and expression patterns of the VQs provide a solid foundation for exploring their specific functions in cotton responding to abiotic stresses. Our study provides significant insight into the potential functions of VQ genes in cotton. Supplementary Information Supplementary information accompanies this paper at 10.1186/s12864-020-07171-z.


Background
The VQ genes form a large gene family with important roles in growth, development and abiotic stress tolerance in plants [1][2][3]. The VQ proteins have a conserved VQ motif [F**hVQ*hTG (F, phenylalanine; *, any amino acid; h, hydrophobic residue; V, valine; Q, glutamine; T, tryptophan; G, glycine)] [4,5] and interact with WRKY TFs via the conserved residues V and Q. In Arabidopsis thanliana, many VQ genes have been reported to function in development and responses to abiotic stress. For example, AtVQ23 (sigma factor-binding protein 1, SIB1), and AtVQ16 (SIB2) were found to interact with AtWRKY33 to increase the resistance of Arabidopsis plants to Botrytis cinerea [6]. In another study, AtVQ16 and AtVQ23 have also been proven could interact with AtWRKY57, and AtVQ16 and AtVQ23 can enhance the competitions on AtWRKY57 to AtWRKKY33 in regulating JASMONATE ZIM-DOMAIN1 (JAZ1) and JAZ5 [7]. Moreover, JASMONATE-ASSOCIATED VQ MOTIF GENE1 (JAV1/AtVQ22) has been addressed to be as a key negative regulator of the jasmonate signalling [8]. For instance, AtVQ09 acts as a repressor of AtWRKY08 factor to modulate salt tolerance in Arabidopsis [9]. Recently, MdVQ10 and MdVQ15 were also described to interact with MdWRKY52 to regulate pathogen defense and development in apple (Malus domestica Borkh) [10]. In addition, OsVQ7 interacts with OsWRKY24 and play roles in NO signaling contributing to the tolerance of various stresses and development in rice (Oryza sativa) [11].

Identification and comparative analysis of VQs in plants
To identify VQ family in G. hirsutum, G. barbadense, G. raimondii and G. arboretum, the AtVQ proteins were used as query sequences to search against the protein databases of the four Gossypium species, and the VQdomain Pfam (PF05678) was also applied. In total, 89 GhVQs, 89 GbVQs, 45 GrVQs, and 45 GaVQs were identified and named in G. hirsutum, G. barbadense, G. raimondii and G. arboretum, respectively (Additional File 1, Supplemental Table S1). In addition, the physiological and biochemical properties of 268 VQs in Gossypium species were determined, including CDS length, GC count, isoelectric point (pI) and molecular weight (MW) (Additional File 1, Supplemental Table  S1). The CDS lengths of these Gossypium VQs ranged from 279 bp (GhVQ89 and GbVQ89) to 1443 bp (GbVQ15), the average GC content of the transcripts was 46.01, their exon numbers varied from 1 to 9, and only a small percentage of VQs contained introns (3.37% GhVQs, 3.37% GbVQs, 6.67% GaVQs, and 31.11% GrVQs). The pI values varied from 4.159 (GbVQ33 and GbVQ78) to 11.496 (GhVQ07) and the MW values ranged from 10.346 kDa (GbVQ89) to 52.058 kDa (GbVQ15) (Additional File 2, Supplemental Fig. S1).
To perform comparative genomic analyses, we searched another 11 plant species for VQ proteins. The evolutionary relationships of the species and the number of their VQ genes are shown in Fig. 1. The data revealed that the numbers of VQs in A. trichopoda, P. dactylifera, V. vinifera, P. trichocarpa, and T. cacao were less than those in the four Gossypium species (Additional File 3, Supplemental Table S2). The comparative structure analysis of VQs showed that almost all the VQs had a few introns and encoded relatively small proteins, and only 3 GhVQs, 3 GbVQs, 3 GaVQs and 14 GrVQs had more than one intron. We speculated that the WGD events that occurred during the evolution of angiosperms increased the numbers of the Gossypium VQs, and these events helped the VQs to gain new functions through neofunctionalization. However, the evolutionary forces that shaped the current intron/exon gene structures remain unknown.

Phylogenetic analysis of VQs
To explore the relationships among VQs in Gossypium, we conducted a phylogenetic analysis of the VQs from the 15 plant species (Fig. 2), and a phylogenetic tree between Gossypium spp. and Arabidopsis was also constructed (Additional File 4, Supplemental  Fig. S2). The tree contained 656 VQs. These proteins were divided into 10 clades based on the nomenclature of the Arabidopsis VQs. The largest group (Group I) contained 20 GhVQs, 20 GbVQs, 10 GaVQs, and 10 GrVQs. Group IX was the smallest group, including 4 GhVQs, 5 GbVQs, 3 GaVQs and 3 GrVQs. Previous research has verified that VQ proteins contain a conserved motif composed of F**hVQ*hTG [4,5]. In our study, among the 656 VQs, 212 proteins (in Group I and Group II) had the amino acid "M" next to "VQ" (simple M-VQ model); 159 proteins (in Group III, Group IV, and Group VII) had the amino acid "V" next to "VQ" (simple V-VQ model); and 285 proteins (in Group V, Group VI, Group VII, Group IX and Group X) had the amino acid "L" next to "VQ" (simple L-VQ model) (Additional File 5, Supplemental Fig. S3). The VQs with rarer amino acids of the Gossypium species were also scattered in Group I to Group X, and the clusters of VQs were similar to those in angiosperms [3].
Motif compositions and exon-intron structures of the VQs were shown in Fig. 4. Combining the phylogenetic data groups of the four Gossypium VQs, we
As previously described, duplication contributed to the expansion of genes in the polyploid events in plants [36]. The tetraploid species Gossypium of have undergone a genome duplication since their diverge from the two diploid species of Gossypium. In our study, we have identified the VQ duplication event, and the WGD/segmental event likely contributed to the expression regulation of VQs in Gossypium. The percentages of VQs derived from WGD were 60.47% in the At-subgenome of G. hirsutum, 63.04% in the Dt-subgenome of G. hirsutum, 69.04% in the At-subgenome of G. barbadense, 59.57% in the Dt-subgenome of G. barbadense, 62.22%  Table S4). Gene duplication events after the divergence of Gossypium species resulted in a high number of paralogous genes in both allotetraploid Gossypium species.
Prediction of miRNA target sites miRNA had been predicted to target the VQ genes in Arabidopsis [37,38] and tea [39]. To determine the miRNA-mediated post-transcriptional regulation of VQs in two allotetraploid species of Gossypium, we predicted the target sites of miRNAs in the coding (CDS) regions of the GhVQs and GbVQs. In G. hirsutum, 46 sites of 34 GhVQs were detected that could be targeted by 22 miR-NAs, while 46 sites of 32 GbVQs could be targeted by 21 miRNAs (Fig. 6 and Additional File 10, Supplemental Table S5). Of these, six VQ genes (GhVQ02, GhVQ40, GhVQ86, GbVQ02, GbVQ40 and GbVQ86) were predicted to be targeted by Ghr-miR172 in the CDS regions; and six VQ genes (GhVQ39, GhVQ52, GhVQ85, GbVQ39, GbVQ51 and GbVQ85) were targeted by Ghr-miR156 (Ghr-miR156a, Ghr-miR156b, Ghr-miR156c and Ghr-miR156d) at 10 prediction sites. Ghr-miR172 and Ghr-miR156 were reported to be involved in some biological processes, including the responses to developmental cues and abiotic stress in plants [40][41][42]. However, it requires further experiments to verify the regulation mechanism and functions of those predicted miRNAs and their targets in Gossypium.

Expression pattern analysis and function verification
Expression profiles of the VQs in the two allotetraploid kinds of Gossypium were analyzed with available transcriptome data (Additional Files 11 and 12, Supplemental Fig. S6 and S7 and Supplemental Table  S6). In this study, the GhVQs and GbVQs with average FPKM values > 5 and being present in at least two samples were identified as potentially expressed transcripts. Fifty-seven GhVQs and 39 GbVQs were selected, and their expression profiles tested in 10 tissues, including the anther, bract, filament, leaf, petal, pistil, root, sepal, stem and torus (Fig. 7). For 57 GhVQs, 22 genes were highly expressed in roots and GhVQ82 had the highest expression level, and 14 genes were highly expressed in leaves. There were only a few genes expressed in the anther, bract, filament, petal, pistil, sepal, stem and torus (Fig. 7a). For 39 GbVQs, 17 GbVQs were strongly highly expressed in roots, and 12 genes were strongly expressed in leaves (Fig. 7b). The different expression profiles of VQ genes suggest that they have different functions in distinct tissues and developmental stages.
VQs were widely identified in the abiotic stress responses in angiosperms [3,32]. In this study, the expression patterns of VQs in the allotetraploid Gossypium types under salt, drought, cold and heat stresses were analyzed using the published data. In total, 43 GhVQs and 37 GbVQs had different expression profiles under the four abiotic stress treatments (Fig. 8). Under salt stress, 29 GhVQs were significantly up-regulated at 12 h, and 21 GbVQs were up-regulated at 6 h ( Fig. 8a and c). Upon PEG treatment, most of the GhVQs and GbVQs were highly expressed at 12 h ( Fig. 8b and d). During cold stress, 33 GhVQs and 19 GbVQs were up-regulated at 24 h ( Fig. 8e and g). Most of the GhVQs and GbVQs under the hot treatment were highly expressed at 1 h ( Fig. 8f and h). To validate of the expression results of GhVQs in response to salt and drought stresses, we conducted qRT-PCR analyses of 12 GhVQs after treatments with PEG and salt treatment. In the presence of PEG, GhVQ08, GhVQ18, GhVQ62, GhVQ64, GhVQ80, and GhVQ84 had high expression levels at 48 h, while these GhVQs except GhVQ18 and GhVQ84, were highly expressed during 24-48 h under salt treatment (Fig. 9). The qRT-PCR results were slightly different from the RNA-seq data, but these findings suggest that some GhVQs were involved in plant response to drought and salt stresses.

Co-expression and interaction networks of GhVQs
To understand the putative roles of VQs in plant adaptation to drought and salt stresses, we conducted a coexpression analysis. Ten GhVQs were found to coexpress with another 227 functional genes ( Fig. 10 and Additional File 13, Supplemental Table S7). Among these, six and seven VQs were identified in different modules of drought stress and salt stress, respectively, while GhVQ37, GhVQ59 and GhVQ83 were detected coexisting during the two stress treatments. Moreover, these 227 genes co-expressing with 10 GhVQs, contained multiple TFs, including domain AP2, bHLH, F-box, GRAS, p450, PLB03212, WD40 and WRKY (Fig. 10a-e and Additional File 13, Supplemental Table S7). The functional regulation networks of the GhVQs were constructed using the website of STRING11.0 with the module reference of Arabidopsis association, and the results revealed that the GhVQs participated in plant defense interaction networks, including WRKYs, MYB15, MPK4, AR781, CSN5B and SIGAs (Fig. 10f). Indeed, VQ proteins could interact with WRKYs and other TFs to defend against abiotic stresses in cotton.

The expansion, duplication and structural characteristics of VQs in Gossypium
In this study, we analyzed the VQs of G. hirsutum, G. barbadense., G. raimondii, G. arboretum and another 11 plant species, and found that the number of VQs in the genomes of 15 species was inconsistent with related to the size of their genomes. There are 89 GhVQs, 89 GbVQs, 45 GaVQs, and 45 GrVQs, respectively. The number of Gossypium VQs was higher than that in cacao (27 TcVQs) and in Arabidopsis (34 AtVQs), but the number of GrVQs and GaVQs was fewer than the DzVQs (Fig. 1 and Additional File 2, Supplemental Table  S2). Previous studies have shown that diverse WGD events lead to the different sizes of plant genomes [45][46][47]. Our results indicated that VQs in these four Gossypium species were more likely to be proximal, tandem, and segmental genes, while the majority of VQs in rice [26] and Arabidopsis [24] are singleton genes. Through the analysis of the phylogenetic and structural features of the 15 plants VQ domains, the VQs could be divided into 10 clades. Group III could be expanded in the eudicots, particularly in the Mallow species (Fig. 2), while Groups III, IV, V, VI, VII, and X had no VvVQs, suggesting that these might have been lost in ancient genome duplication events.
Ten conserved motifs were also identified in the four Gossypium VQs. Motif 1 corresponded with the VQcontaining motif, which is widely found in angiosperms [3]. Previous studies have suggested that VQs have few introns in higher plants, being agreement with the results from Gossypium VQs. Only 3 GhVQs, 3 GbVQs, 14 GrVQs, and 3 GaVQs had multiple introns. Additionally, the motif compositions and intron contents of the Gossypium VQ proteins/genes in our study were consistent with the results of the phylogenetic analysis and the type

VQs play important roles in abiotic stress signaling pathways
Previous reports have shown that VQs are involved in various endogenous and environmental signals, which consistent with their diverse roles in plant development and in the response to abiotic stresses [2,5,19,24,26,32]. For example, AtVQ08, AtVQ14, AtVQ17, AtVQ18 and AtVQ22 are involved in modulation of seed development, chloroplast development, and plant growth. A proportion of the GmVQs [25], PeVQs [44], VvVQs [48], CsVQs [29], HaVQs [31], and NtVQs [19] also function in regulating the growth of different tissues at different developmental stages. In this study, most GhVQs and GbVQs were found to differentially expressed in the different tissues, including the ovule, fiber, anther, leaf, root, sepal and stem, suggesting that the VQs may play an important role in growth and development of Gossypium species (Fig. 7, Additional Files 6 and 7, Supplemental Fig. S6 and S7). Most VQs have been demonstrated to play important roles in responses to various abiotic stresses in plants [2,3,19,27,30,32]. In our work, we assessed the expression levels of the GhVQs and GbVQs under salt, drought, cold, and heat stresses, and found that the majority of the VQs were up-regulated under drought, salt, and cold stress, or down-regulated under heat stress. These findings were similar to those of previous reports in Arabidopsis [5], rice [26], maize [27] and cabbage [28]. Also, the promoters of the Gossypium VQs, many cis-elements that were reported to exist in other abiotic stresses responsive genes were detected (Fig. 3), implying that the VQs in Gossypium are likely involved in response to various abiotic stresses, and that the response mechanisms maybe complex and diverse.
VQs has been reported to interact with WRKY TFs and to regulate a variety of physiological and biochemical processes and abiotic stress responses [6,9,11,49,50]. Here, by constructing co-expression and an Arabidopsis associated model, multiple Gossypium VQs, such as GhVQ37, GhVQ59, and GhVQ83, were predicted to interact with different WRKY TFs, implying that VQs act in stress tolerance through interacting with WRKYs. Moreover, we predicted some putative target sites of microRNAs in the Gossypium VQs. These microRNAs included miRNA156s and miR172, which serve important roles in various life processes of plants [40][41][42]. These results indicate that the Gossypium VQs are extensively involved in growth, development, and in response to stresses, and work together with WRKYs and microRNAs during these processes.

Conclusions
In this study, using bioinformatics plus expression profiles, we identified and presented the structure, phylogenetic relationships, and tissue specificity of VQ family genes in four Gossypium species. Our data showed that the gene structure and motif coding regions were conserved across plants, and segmental, dispersed, and tandem duplications were the main reason for the expansion of the VQs. Cis-element and expression analyses indicated that the majority of VQs were activated in response to abiotic stress, and some of VQs were coexpressed with WRKYs and hybridized with the miRNAs involved in Gossypium growth, development, and abiotic stress. Our study could serve as a foundation for future exploration of the specific functions of Gossypium VQs in the abiotic stress responses and the interactions with WRKYs or microRNAs.

Analysis of chromosome location and gene regulatory elements
The chromosomal positions of all VQ members were determined using the gene transfer (gtf) format files of the reference genomes. The exon/intron structure of VQs were also extracted from the gtf files and displayed by the GSDS platform (http://gsds.cbi.pku.edu.cn/) [63]. Then, the MEME tool (http://meme-suite.org/) (the motif with 10 amino acids in length and E-value less than 1e − 40 ) [64] was used to detect the additional motifs of the proteins. With the combination of TBtools software (v 0.67361) [65], all domain motifs were compared among VQ genes to identify the group-specific signatures.
Approximately 2000 bp genomic sequences locating upstream (from − 2000 to − 1) of the VQs from start codons were extracted from the cotton genome, which was subsequently submitted to the online PlantCARE (http:// bioinformatics.psb.ugent.be/webtools/plantcare/html/) [66] to determine the distribution of plant cis-acting regulatory elements. Moreover, the miRNA targets of the VQ members were predicted using the coding sequences (CDS) regions for complementary sequences by the psRNATarget server (http://plantgrn.noble.org/ psRNATarget/analysis?function=2) [67] with default parameters, except maximum expectation (E) = 3.5. A total of 80 published miRNAs of G. hirsutum were selected.

Plant cultivation, RNA isolation, and RT-PCR analysis
The germinated TM-1 cotton seeds were grown in plastic pots filled with the mixture of soil vermiculite, and the artificial growth conditions were set at 28/22°C, with a photoperiod of 16 h light/8 h darkness. Plants were separately subjected to 400 mM PEG or 400 mM NaCl. Three biological replicates were sampled at 0, 1, 3, 6, 12, 24, 36, or 72 h. All the samples were collected and frozen in liquid nitrogen, which was stored at − 80°C until total RNA extraction.
Total RNAs of the above samples were isolated using the RNA prep Pure Plant Kit (Polysaccharides & Polyphenolics-rich, DP441) (TIANGEN, Beijing, China).
The concentrations and integrities of the extracted RNA samples were measured and verified using a NanoDrop machine and 1% agarose gel electrophoresis, and the RNA samples were reversed transcribed into complementary DNA (cDNA) using the Mir-X TM MIRNA First-Strand Synthesis Kit (TaKaRa, Dalian, China). The qRT-PCR was performed using the Roche LightCycler 480 System (Roche, Germany). The qRT-PCR primers for the GhVQs and actin gene were listed in Supplemental Table S8. The reaction was set up in a total volume of 20 μL: 2 μL (200 ng) of cDNA, 0.4 μL of forwarding primer (10.0 μM), 0.4 μL of reverse primer (10.0 μM), 10 μL 2 × TransStart Top/Tip Green qPCR SuperMix, and 7.2 μL of nuclease-free water. The reaction procedure was completed under the following program: 94°C for 30 s; 45 cycles of 94°C for 5 s, 60°C for 15 s, 72°C for 10 s; and 4°C to finish. The results were calculated using the 2 -ΔΔCt relative quantitative method.