Transcriptome profiling of developing leaf and shoot apices to reveal the molecular mechanism and co-expression genes responsible for the wheat heading date

Background Wheat is one of the most widely planted crops worldwide. The heading date is important for wheat environmental adaptability, as it not only controls flowering time but also determines the yield component in terms of grain number per spike. Results In this research, homozygous genotypes with early and late heading dates derived from backcrossed progeny were selected to conduct RNA-Seq analysis at the double ridge stage (W2.0) and androgynous primordium differentiation stage (W3.5) of the leaf and apical meristem, respectively. In total, 18,352 differentially expressed genes (DEGs) were identified, many of which are strongly associated with wheat heading date genes. Gene Ontology (GO) enrichment analysis revealed that carbohydrate metabolism, trehalose metabolic process, photosynthesis, and light reaction are closely related to the flowering time regulation pathway. Based on MapMan metabolic analysis, the DEGs are mainly involved in the light reaction, hormone signaling, lipid metabolism, secondary metabolism, and nucleotide synthesis. In addition, 1,225 DEGs were annotated to 45 transcription factor gene families, including LFY, SBP, and MADS-box transcription factors closely related to flowering time. Weighted gene co-expression network analysis (WGCNA) showed that 16, 336, 446, and 124 DEGs have biological connections with Vrn1-5 A, Vrn3-7B, Ppd-1D, and WSOC1, respectively. Furthermore, TraesCS2D02G181400 encodes a MADS-MIKC transcription factor and is co-expressed with Vrn1-5 A, which indicates that this gene may be related to flowering time. Conclusions RNA-Seq analysis provided transcriptome data for the wheat heading date at key flower development stages of double ridge (W2.0) and androgynous primordium differentiation (W3.5). Based on the DEGs identified, co-expression networks of key flowering time genes in Vrn1-5 A, Vrn3-7B, WSOC1, and Ppd-1D were established. Moreover, we discovered a potential candidate flowering time gene, TraesCS2D02G181400. Taken together, these results serve as a foundation for further study on the regulatory mechanism of the wheat heading date. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07797-7.


Background
Wheat (Triticum aestivum L.) is one of the most important and widely distributed food crops. The wide global planting area of bread wheat is attributed to its high adaptability to the natural environment. The heading date or flowering time is an important adaptive trait for crop genetic breeding.
Flower development in higher plants mainly occurs in three stages: the floral induction phase, the floral primordia phase, and the floral organ development phase. Double ridge and androgynous primordium differentiation are key stages for flower induction and floral meristem development in wheat. At the double ridge stage (W1.5-W2.0), spikelet protuberance appears from the base of the growth cone between axils of bract primordium and determines the number of spikelets on each spike; at the differentiation stage of the androgynous primordium, the first floret primordium at the base of the spikelet begins to differentiate, and the stem begins to extend (W3.5) [16]. Thus, to explore the molecular mechanism underlying heading date, we performed RNA-Seq analysis of two wheat accessions with a significant difference in heading date at W2.0 and W3. 5.
With the completion of hexaploid bread wheat reference genomes (IWGSC, 2018), the use of bioinformatics methods to discover flowering time genes and to examine molecular characterization has significantly accelerated the study of wheat flowering time. Indeed, more than nine hundred flowering time genes have been identified in wheat by BLAST searches and OrthoMCL clustering methods [17]. Moreover, researchers have discovered eighty-four SNPs that are highly related to spike numbers based on GWAS [18]. Using the bulked segregant RNA-Seq method, a large deletion in the first intron of Vrn-B1 causing heading date variation was identified, and some flowering time GO terms were identified [19].
However, wheat is a hexaploid species and has a complex genome, which poses a challenge for discovering flowering time genes and revealing their genetic characteristics. In this study, homozygous genotypes with an early heading date (WHd) and late heading date (MHd) were selected for RNA-SEq. We aimed to investigate transcriptional regulation at the two key flower development stages of double ridge and androgynous primordium differentiation and identify the important flowering time genes using weighted gene co-expression network construction. We anticipate that transcriptome analysis will allow for exploring the molecular characteristics of wheat flowering time and promoting wheat improvement.

Identification of differentially expressed genes
To identify genes that differed significantly between early and late heading genotypes during the development of young panicles, we analyzed the differentially expressed genes (DEGs) with strict quality control. The DEGs of L2.0, L3.5, A2.0, and A3.5 were compared between genotypes with an early heading time (WHd) and a late heading time (MHd). Finally, we identified 18,325 unique DEGs (Fig. 1a, Table S2), which are mainly distributed chromosome arm ends (Fig. S2). Specifically, 12,941 DEGs for the apical meristem at A2.0 were detected, 9,990 of which were upregulated and 2,951 downregulated (  (Fig. S3d), were also identified. The flowering time genes Vrn1-5 A [3], Vrn3-7B [5], Ppd-1D [9,10], and WSOC1 [14] were found to be differentially expressed. We also screened common differentially expressed genes at different developmental stages in the same tissue and observed that 1,084 genes were common DEGs at A2.0 and A3.5 (Fig. 1b), with 215 genes being common DEGs at L2.0 and L3.5 (Fig. 1c). We speculate that these genes may play important roles in spikelet development at the W2.0 and W3.5 stages. Moreover, nine genes from the above DEGs were selected for qRT-PCR analysis ( Fig. 2a and b), and the FPKM values of the nine selected DEGs are shown in Table S7. Overall, the qRT-PCR expression trend of the nine DEGs agreed with the RNA-Seq data ( Fig. 2c and d, Table S7).

MapMan metabolic pathway analysis of differentially expressed genes
The metabolic pathways of four comparative groups of DEGs were visualized by MapMan software (Fig. 4a  Transcription factor classification and identification To identify transcription factors (TFs) involved in the heading time development process, the DEGs were subjected to transcription factor analysis using iTAK software. According to the criteria of the plant transcription factor database (http://planttfdb.gao-lab.org/), a total of 1,225 transcription factors were classified into 45 transcription factor families (  (Table S5). Among them, we also identified several transcription factor families involved in the plant flowering process, including three LFY transcription factors [23], eighteen SBP transcription factors, thirty-six MADS-MIKC transcription factors and ten MADS-M-type transcription factors (Table S5). Gene expression level analysis showed that transcription factors are expressed at the critical period that determines flowering time. For example, we found three LFY transcription factor genes (TraesCS2A02G443100, TraesCS2B02G464200, TraesCS2D02G442200) to be highly expressed in the apical meristem, especially in WHd-A3.5 and WHd-A2.0 (Fig. 5a). Most SBP transcription factor genes were highly expressed in WHd-A2.0 and WHd-A3.5 (Fig. 5b). For the MADS-box gene family, the TraesCS5A02G391700 (Vrn1-5 A) gene always showed a high expression level in both wild and mutant wheat, TraesCS3B02G612600 was highly expressed in leaf tissue, and two genes (TraesCS3D02G284200, TraesCS3A02G284400) were highly expressed in the apical meristem (Fig. 5c).

Construction of the flowering gene regulatory network
A total of 18,352 DEGs were used to construct a coexpression network. We calculated the average gene connectivity under different soft-thresholding powers and found that when β = 14, the co-expression network had scale-free characteristics ( Fig. S10a; b; c; d). Finally, 17 co-expression modules were obtained by the "dynamic tree cut" method. Each branch in the cluster tree represents a gene set, and different modules are distinguished by different colors (Fig. S10e). The correlation value between the co-expression modules and samples was also calculated (Fig. S11). The reported flowering time genes Vrn1-5 A (TraesCS5A02G391700), Vrn3-7B (TraesCS7B02G013100), Ppd-1D (TraesCS2D02G079 600), and WSOC1 (TraesCS4D02G341700) were selected to build a gene co-expression network, and genes connected to the reported flowering time genes were extracted from the co-expression modules. We detected 16, 336, 446 and 124 DEGs with biological connections to Vrn1-5 A (Fig. 6a), Vrn3-7B (Fig. S12a), Ppd-1D (Fig.  S13a), and WSOC1 (Fig. S14a), respectively. The complete gene list related to the reported flowering time genes is summarized in Table S6.

Discussion
Flowering time is an important agronomic trait that is regulated by many genes.  TraesCS6D02G142100  TraesCS2D02G502900  TraesCS7B02G158500  TraesCS3A02G432500  TraesCS1A02G255300  TraesCS1B02G266100  TraesCS2B02G250900  TraesCS2A02G232400  TraesCS2D02G232800  TraesCS7A02G246500  TraesCS7D02G245200  TraesCS2B02G432700  TraesCS2D02G410700  TraesCS5A02G265900  TraesCS5B02G265600 TraesCS5D02G273900 TraesCS7B02G144900  TraesCS5A02G391700  TraesCS3A02G284400  TraesCS3D02G284200  TraesCS2D02G181400  TraesCS2A02G261200  TraesCS2D02G262700  TraesCS2B02G200800  TraesCS2B02G281000  TraesCS3D02G427700  TraesCS4D02G341700  TraesCS5A02G515500  TraesCS3B02G612600  TraesCS5A02G391800  TraesCS5B02G396700  TraesCS5D02G401700  TraesCS3A02G434900  TraesCS3B02G318300  TraesCS7D02G120500  TraesCS3B02G469700  TraesCS4D02G243700  TraesCS4D02G276100  TraesCS3A02G314300  TraesCS4D02G245200  TraesCS1B02G275000  TraesCS3B02G470000  TraesCS7A02G383800  TraesCS7D02G380300  TraesCS3A02G432900  TraesCS6D02G273500  TraesCS6A02G292300  TraesCS6B02G322700  TraesCS3D02G090300  TraesCS3A02G090900  TraesCS3B02G105700  TraesCS3D02G513800  TraesCS3B02G155200  TraesCS7A02G260900  TraesCS6A02G158100  TraesCS7B02G377700  TraesCS4D02G301100 TraesCS2A02G311100 TraesCS2D02G309300 there are few studies on transcriptome analysis of heading date at W2.0 and W3.5 and few studies on coexpression networks construction of flowering time genes in wheat. In our study, we not only found known key heading date genes to be differentially expressed, such as Vrn1-5 A, Vrn3-7B, Ppd-1D, and WSOC1 [3,9,14], but we also identified new DEGs involved in the process. For example, GO:0005975 (carbohydrate metabolism) was enriched at A2.0 and L2.0, and GO:0005991 (trehalose metabolic process) was enriched at A2.0 and L3.5. A previous study reported that carbohydrates function in the regulation of plant flowering. For example, trehalose-6-phosphate (t6p) is suggested to be an indicator for measuring the carbohydrate status in plants [20,21], and GO:0019684 (photosynthesis, light reaction), which is important for photoperiod pathways [22], is enriched only at L3.5. Gene function annotation revealed that TraesCS1A02G339300 and TraesCS1B02G351600 both encode trehalose-6phosphate (t6p) (Table S2), and through MapMan metabolic pathway analysis, we discovered that they are involved in trehalose metabolism (bin:3.2.1). The t6p is an important endogenous signal that can influence the flowering time of Arabidopsis thaliana in both apical shoot and leaves [20]; thus, we speculate that TraesCS1A02G339300 and TraesCS1B02G351600 can be used to reveal the molecular mechanism by which of t6p affects flowering in wheat. Furthermore, functional annotation showed that TraesC-S3A02G400000 is a pfkB-like carbohydrate kinase family protein (Table S2); according to MapMan analysis, it is related to the major CHO metabolismdegradation-sucrose-fructokinase (bin:2.2.1.1) category. Fig. 6 Construction of the flowering time regulatory network and expression level of differential genes. The red nodes in the network indicate high-confidence genes involved in the heading date. a The Vrn1-5 A flowering time regulatory network. b The expression heatmap of differentially expressed genes co-expressed with Vrn1-5 A. c The expression pattern of differentially expressed genes co-expressed with Vrn1-5 A. The red filled columns represent upregulated genes, and the blue filled columns represent downregulated genes Transcription factors play important roles in the regulation of plant flowering stage development. Our results showed that LFY, SBP, and MADS-box genes tended to be highly expressed in the apical meristem of wild type wheat (WHd-A2.0, WHd-A3.5). For example, TraesC-S2A02G443100 is an LFY transcription factor. The homologous gene in rice is Os04g0598300, which mainly controls the heading date and determines the development of branches and tillers in the ear, and overexpression of Os04g0598300 can promote the heading date in rice [24]. In addition, the SBP transcription factor, TraesCS7A02G246500, was highly expressed at the WHd-A2.0 and WHd-A3.5 stages; its ortholog gene in Arabidopsis is ATSPL9 (AT2G42200). Previous research showed that ATSPL9 can regulate flowering time by promoting the transcription of MADS-box genes, including FUL (FRUITFULL), SOC1 (SUPPRESSOR OF OVEREX-PRESSION OF CONSTANS 1), and AGL42 (AGA-MOUS-LIKE) [25,26]. TraesCS3A02G284400 is a MIKC-type MADS-box gene. Its orthologous gene in rice OsMADS32 is uniquely expressed in the early stage of the inflorescence meristem, e.g., spikelet primordium, maintaining flower organ characteristics and regulating flower development of rice flowers [27,28]. Furthermore, we discovered that some MADS-type transcription factors were highly expressed in the leaves of both wildtype and mutant wheat, such as Vrn1-5 A (TraesC-S5A02G391700), which indicates that this type of transcription factor can regulate flowering through different tissues, such as the leaf and apical meristem.
Moreover, many TFs, including WRKY, bZIP, bHLH, HSF, and MADS-box transcription factors, were involved in the flowering time gene co-expression network, was constructed by WGCNA (Table S6). In the Vrn1-5 A co-expression network, the MADS-MIKC transcription factor gene TraesCS2D02G181400 was differentially expressed in A2.0 and L3.5 ( Fig. 2a and b); its homolog in rice is OsMADS18. It is an AP1/FUL-like MADS-box gene that determines the formation of inflorescence meristem characteristics by interacting with PAP2 in the floral meristem [29]. We also discovered one basic helix-loop-helix (bHLH) transcription factor (TraesCS5D02G386500). Its homologous gene in Arabidopsis is AT2G43010, which negatively regulates the red light response mediated by phyB. Thus, we speculate that red light might affect expression of Vrn1-5 A. In addition, TraesCS6A02G146500 (Fig. 2) encodes glucose-6-phosphate 1-dehydrogenase (G6DPH), providing energy and other metabolites for plant growth and development; we suggest that expression of vernalization genes requires other genes to provide energy. The Vrn3-7B gene co-expression network (Fig. S12a) also included four bZIP transcription factors (Table S6). Two CONSTANS-like proteins (TraesCS5A02G166100, TraesCS5D02G170700) were also discovered. The homolog of both TraesCS5A02G166100 and TraesCS5D02G170700 in Arabidopsis is AT5G15840, a zinc finger transcription factor-like protein that acts upstream of FT and SOC1 and is mainly involved in flowering regulation under long-day conditions [30]. In the Ppd-1D co-expression network (Fig. S13a), identified transcription factors mainly comprised WRKY, bHLH, and HSF. We also identified one CONSTANS-like protein gene, TraesCS1A02G220300; its homolog gene in Arabidopsis is AT5G57660, which is involved in the flowering development process [31]. Some WRKY and MADS-box transcription factors were also identified in the WSOC1 co-expression network (Fig. S14a). We detected three flowering locus T genes (TraesC-S3A02G143100, TraesCS3B02G162000, TraesCS3D02G1 44500), which is consistent with previous reports noting that gibberellin can activate expression of Flowering Locus T [14]. Based on the above analysis, we suggest that the newly identified genes (TraesCS2D02G181400, TraesCS5D02G386500, TraesCS6A02G146500, TraesC-S5A02G166100, TraesCS5D02G170700, TraesCS1A02G2 20300) associated with reported flowering time genes (Vrn1-5 A, Vrn3-7B, Ppd-1D, WSOC1) may play a key role in the wheat heading date regulatory pathway. Further research is warranted to reveal their biological functions.

Conclusions
In this study, two key flower development stages were selected to conduct RNA-Seq analysis. Based on DEG analysis, many transcription factors co-expressed with key flowering time genes (Vrn1-5 A, Vrn3-7B, Ppd-1D and WSOC1) were identified by WGCNA. Subsequently, a potential flowering time regulatory gene, TraesCS2D02G181400, was discovered from the Vrn1-5 A network. The enriched transcriptome data at W2.0 and W3.5 will serve as a resource for elucidating the mechanism of wheat heading date regulation.

Plant materials
m605 is a late heading mutant that was identified from the ethyl methanesulfonate mutant library of YZ4110. Previous studies conducted in our laboratory identified m605 as a late heading mutant compared with its wild type parent YZ4110 [32]. To reveal the flowering time regulation model of genes for the heading date, homozygotes for the early heading date (WHd) and late heading date phenotype (MHd) (Fig. S15) derived from the progeny of BC 3 F 4 were selected to conduct RNA-SEq. The purpose of using the backcrossed segregating population was to reduce false positive problems caused by the genetic background. Seeds of each wheat accession were planted in flowerpots and grown under 16 h light/22℃ and 8 h dark/18℃ (a long-day environment) in a culture room. The plants were transferred to a 4℃ environment for vernalization treatment for 4 weeks, where they developed to the one-leaf stage and were then cultured in a greenhouse until they were sampled. A total of twentyfour wheat samples were selected, including the apex of the wild/mutant type at the W2.0/W3.

RNA extraction, library construction, sequencing and quality control
The sampling period included the double ridge stage (W2.0) and androgynous primordium differentiation stage (W3.5) (Fig. S1), and the sampling position included the latest unfolding leaf and growth cone of the main stem. We used clean, liquid nitrogen-frozen forceps to quickly clamp the growth cone and place it in an RNase-free centrifuge tube (the centrifuge tube was first filled with liquid nitrogen). Newly unfolded leaves of the main stem corresponding to the growth cone were mixed at the same time; the sampling position was 2 cm away from the tip of the leaf, and a cut was made 4 cm away from the tip of the leaf to obtain a tissue sample with a length of 2 cm. All samples were rapidly frozen in liquid nitrogen and stored in a -80°C environment. The TRIzol method was used to extract RNA from each sample; quality of RNA was detected by 1 % agarose gel electrophoresis, and the concentration of RNA was assessed using a Nanodrop 2000 instrument. High-quality RNA samples were sent to the ONMATH company (www. onmath.cn) to construct the sequencing library following Illumina's standard pipeline. The constructed sequence libraries were sequenced with the Illumina HiSeq™ 4000 platform for 150 bp paired sequencing, and raw data were saved as fastq files. To remove the adaptors and lower quality reads at the head and end, we used Trimmomatic (version 0.35) [33] to conduct quality control of sequencing data with the following parameters: java -jar trimmomatic − 0. 35

Gene expression analysis of differentially expressed genes
RNA-Seq analysis of the sequencing libraries was performed according to the method reported by Perter et al. [35]. (1) Hisat2 software (version 2.5.3a) was used to construct an index of the reference genome of hexaploid wheat Chines Spring (IWGSC RefSeq v1.1, http:// www.wheatgenome.org/Tools-and-Resources) with default parameters. (2) Clean reads were mapped to IWGS C RefSeq v1.1 by hisat2 with the parameters "hisat2 -x reference.genome.index -p 8 -X 400 --no-unal --dta − 1 input.R1.clean.fastq.gz -2 input.R2.clean.fastq.gz -S input.sam", and the mapping results of reads were stored in a bam file. (3) Stringtie software (version 1.3.3b) [36] was used to carry out transcript assembly of alignments with the following parameters: stringtie -e -p 16 -B -G reference.genome.gff3 input.bam -o input.gtf. (4) The R package Ballgown [35] was employed to calculate gene expression levels, and we used FPKM (fragments per kilobase of transcript per million mapped reads) to measure gene expression values at the whole-genome level. In addition, we used HTseq-count software (version 0.11.1) [37] to count the read number of each bam file with the following command line: htseq-count --format bam --order pos --mode union --stranded no --type exon input.bam reference.genome.gtf > reference.counts.txt. The count files of reads were used to calculate differentially expressed genes with the R package egdeR [38]. The evaluation criteria for differentially expressed genes were an absolute log2 (fold change) greater than 1 and a false discovery rate (FDR) less than 0.05 [39,40]. The distribution of DEGs along the three wheat subgenomes was determined by chromPlot software [41].

GO enrichment analysis of DEGs
D E G s w e r e s u b m i t t e d t o t h e a g r i g o ( h t t p : / / systemsbiology.cau.edu.cn/agriGOv2/index.php) analysis toolkit [42,43] for GO enrichment analysis. Annotation information provided by the IWGSC reference genome (IWGSC RefSeq v1.1) was used as a reference database. We used the singular enrichment analysis method through Fisher's exact and Bonferroni multiple tests to screen significant GO terms.

MapMan metabolic analysis of DEGs
The differential expression spectrum was mapped to the metabolic regulatory pathway in detail using MapMan (version 3.5.1) [44]. Considering that MapMan software lacks the mapping file from the Chinese Spring reference genome, we first extracted the coding sequence of DEGs and then uploaded this sequence to Mercator (http:// www.plabipd.de/portal/mercator-sequence-annotation) for gene annotation and to obtain the corresponding mapping file. Then, the mapping file containing the gene expression value was imported into MapMan to analyze metabolic regulation [45,46].

Transcription factor analysis of DEGs
To explore the transcription factor changes during wheat heading date development, we extracted the protein sequences of DEGs from IWGSC RefSeq v1.1 and used the iTAK [47] pipeline (http://bioinfo.bti.cornell. edu/tool/itak) for transcriptome factor prediction and functional classification.

Weighted gene co-expression network analysis and flowering time regulatory network construction
The R package WGCNA (version 1.70-3) [48,49] was utilized to construct the flowering time co-expression network and identify candidate flowering time genes. The weight value is a parameter for measuring the coexpression value, which correlates positively correlated with the degree of co-expression. First, the "goodSam-plesGenes" function was used to check the quality of the gene expression data, and the "pickSoftThreshold" function was applied to determine the soft power. Second, we calculated the "adjacency matrix" and "topological overlap matrix" according to the soft power value. Third, the initial gene expression matrix was obtained by hierarchical clustering of the dissimilarity matrix with the "hclust" function with the default parameters: minModu-leSize: 30, networkType: "signed", TOMType: "unsigned", mergeCutHeight: 0.25. Based on the connectivity between co-expressed genes, Cytoscape software (version 3.7.2) [50] was used to visualize the flowering-related gene regulation network.

Quantitative Real-Time PCR
We used Primer Premier 5 software to design genespecific primers based on gene sequence. Each experiment included three biological repeats. We selected GAPDH (GenBank: AF251217.1) as the internal reference gene to correct gene expression. The PCR system included a 20 µL reaction volume, which consisted of 10 µL of SYBR Mix, 2 µL of cDNA, 0.8 µL of forward and reverse primers, and 7.2 µL of ddH 2 O. RT-PCR amplification in the Roche LightCycler 480 Real-Time System (Roche, Switzerland) platform was performed. The PCR amplification steps were 95 ℃ for 60 s, followed by 40 cycles at 95℃ for 5 s, 60℃ for 30 s, and 95℃ for 15 s. The 2(-delta delta C(T)) method was used for the relative gene expression analysis of DEGs [51]. All primer sequences used in this study are shown in Table S7.