Reproductive developmental transcriptome analysis of Tripidium ravennae (Poaceae)

Background Tripidium ravennae is a cold-hardy, diploid species in the sugarcane complex (Poaceae subtribe Saccharinae) with considerable potential as a genetic resource for developing improved bioenergy and ornamental grasses. An improved understanding of the genetic regulation of reproductive processes (e.g., floral induction, inflorescence development, and seed development) will enable future applications of precision breeding and gene editing of floral and seed development. In particular, the ability to silence reproductive processes would allow for developing seedless forms of valuable but potentially invasive plants. The objective of this research was to characterize the gene expression environment of reproductive development in T. ravennae. Results During the early phases of inflorescence development, multiple key canonical floral integrators and pathways were identified. Annotations of type II subfamily of MADS-box transcription factors, in particular, were over-represented in the GO enrichment analyses and tests for differential expression (FDR p-value < 0.05). The differential expression of floral integrators observed in the early phases of inflorescence development diminished prior to inflorescence determinacy regulation. Differential expression analysis did not identify many unique genes at mid-inflorescence development stages, though typical biological processes involved in plant growth and development expressed abundantly. The increase in inflorescence determinacy regulatory elements and putative homeotic floral development unigenes at mid-inflorescence development coincided with the expression of multiple meiosis annotations and multicellular organism developmental processes. Analysis of seed development identified multiple unigenes involved in oxidative-reductive processes. Conclusion Reproduction in grasses is a dynamic system involving the sequential coordination of complex gene regulatory networks and developmental processes. This research identified differentially expressed transcripts associated with floral induction, inflorescence development, and seed development in T. ravennae. These results provide insights into the molecular regulation of reproductive development and provide a foundation for future investigations and analyses, including genome annotation, functional genomics characterization, gene family evolutionary studies, comparative genomics, and precision breeding. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07641-y.


Background
The need for and importance of alternative energy sources becomes increasingly essential as global energy demands grow with concomitant fossil fuel reserve depletion. Bioenergy crops suitable as fuel for heat, electric power generation, and processing into cellulosic ethanol continue to attract attention as alternative fuel sources. Members of the grass family Poaceae subtribe Saccharinae, also known as the sugarcane complex, have gained attention for their broad adaptability, pest resistance, high biomass yields, and potential for perennially sequestering large amounts of carbon with few inputs on marginal lands [1][2][3][4][5].
Conventional breeding is well suited for improving complex traits such as yield and cold-hardiness. Maren et al. [3] recently reported on new interspecific Tripidium hybrids with significantly higher biomass yields than Miscanthus ×giganteus and cold hardiness to USDA Zone 6b/7a. However, plant biotechnology has considerable potential to augment conventional breeding and make value-added improvements in elite clonally propagated cultivars without compromising other genetically complex and desirable traits. For example, silencing key reproductive processes could reduce reseeding and invasive potential of valuable bioenergy grass clones.
The role of plant reproduction in crop production and yield encouraged extensive research into the genetics of flowering among agricultural cereals such as wheat (Triticum spp.) [18][19][20] and barley (Hordeum vulgare) [21][22][23]. However, current information on the genetics and translational genomics of reproductive development in perennial grass species of the Panicoideae (including Tripidium spp.), Aristidoideae, Chloridoideae, Micrairoideae, Arundinoideae, and Danthonioideae (PACMAD) clade of the Poaceae is limited [24]. A foundation for future application of precision breeding and gene editing depends on a detailed understanding of the reproductive process' genetic regulation. Transcriptomic and RNA-sequencing analyses allow for examining gene expression regardless of prior sequencing context and enable the identification of candidate genes for modification [25,26]. The processes of floral initiation [27], inflorescence development [28,29], and seed development [30,31] involve multifaceted changes in gene expression. An extensive literature search identified minimal genetic information or gene expression analysis of T. ravennae. The aim of this study was to characterize the genetic control and differentially expressed transcripts in reproductive development pathways of the diploid perennial bioenergy grass T. ravennae.

Transcriptome assembly and functional annotation
Sequencing of vegetative, developing inflorescence, floret, and seed tissues yielded 687 million raw reads (Supplemental Table 1). The primary de novo assembly utilized 15.4 million paired and quality-trimmed reads and comprised 95% of all quality-filtered reads yielding an assembly with 156,724 contigs (N50: 1265; Table 1; Supplemental Fig. 1a-c). BUSCO analysis, utilizing the 956 Plantae core set, revealed an 85.6% completion rate ( Fig. 1). Alignment of the transcriptome contigs to the draft genome assembly of T. ravennae (Maren et al., in preparation) and cluster-based enrichment reduced the contig set by 68%, yielding a transcriptome assembly with 105,307 unitigs (N50: 1494; Table 1). Similaritybased clustering and genomic alignment reduced the transcript set by 33%, reducing the representation of complete conserved orthologs (BUSCO core genes) by 4.5%. The reduced transcript sets functional annotation identified 33,782 unigenes with at least one of the 130, 460 annotations (Supplemental Tables 8; 9; Figs. 2, 3 and 4). Across all samples, 41,234 unigenes were expressed greater than five transcripts per million (TPM) in at least two biological samples and two-fold change (absolute value of log 2 tagwise dispersion values) between two or more samples. Of those unigenes, 36,127 were transcribed and differentially expressed in at least one of the 78 pair-wise tests of differential expression.
Relative to vegetative controls, inflorescence samples at the earliest stages of development had the greatest abundance of DEGs, corresponding with the floral transition (Fig. 2). The fold change in gene expression varied considerably and ranged to greater than ±8000 times the expression levels observed in the vegetative meristem control samples. Table 2 enumerates a list of genes of interest with tissue-specific expression patterns, and those whose fold changes in gene expression were most pronounced for a given stage of inflorescence development. The genes listed in Table  2 comprise a shortlist of utilitarian targets for studying gene functional genomics and gene knockout mutations that might serve to limit reproduction. This information furthermore provides the opportunity for the identification of tissue-specific promoters.  Table 6). Among floret development samples, DEG abundance was highest in stamen tissues (4966), followed by anthesis florets (2622), pre-anthesis florets (1970), and early development florets (698). At the earliest stage of floret development, multiple MADS-box transcription factors, including AGL6like [41], were differentially regulated along with protein DROOPING LEAF (DL)-X1 (Trav0014582) [55] and AGlike (Trav0011776) [54]. As florets developed into preanthesis florets, differential gene expression analysis revealed multiple genes in floral organ morphogenesis, embryo sac, embryonic, and pollen development relative to late inflorescence development samples. DEGs implicated in floral development included guanine nucleotide-binding protein NUCLEOSTEMIN LIKE 1 (NSN1 -Trav0003163) [56,57], SOMATIC EMBRYO-GENESIS RECEPTOR KINASE 2 (SERK2 -Trav0003828) [58], FLOWERING PROMOTING FAC-TOR 1-LIKE 2 (FPF1)-2 (Trav0026700) [59], and BTB/ POZ AND TAZ DOMAIN-CONTAINING PROTEIN 3 (BT3 -Trav0015326) [60]. Relative to late inflorescence development samples, unigenes regulating pollen integument, cellular growth, and karyogamy increased in the DEG sets in pre-anthesis florets and stamen tissues, including protein NUCLEAR FUSION DEFECTIVE 4   Table 5). Clustering utilized transcript per million (TPM) normalized expression values to iteratively calculate pair-wise Manhattan distances between all clusters and joining clusters of proximity. Branch length represents the distance between clusters and reflects the similarity of expression profiles for co-expressed genes within the two clusters (NFD4)-X2 (Trav0000417) [61], and putative receptor protein CRINKLY 4 (CR4 -Trav0011689) [62]. Floral tissue samples can provide detailed information to identify genes of interest for future applications of precision breeding and gene editing of reproductive processes. The utility of full-length transcripts to key floral integrators and the potential they have in the identification of multiple gene-editing mutation regions may be a key asset in precision breeding and gene editing. DEG analysis revealed tissue-specific gene expression dynamics in all floral samples. Within flower specific tissues, stamen tissues yielded the greatest abundance of upregulated differential gene expression (Fig. 4). Gene expression dynamics varied widely in stamen samples, with some genes demonstrating a 7000-fold difference in expression relative to late stage inflorescence controls. Genes of interest were amassed into a table based on tissue-specific expression and large fold changes between controls and a given floral sample ( Table 3). The tissue expression specificity of these genes provides candidate genes for continued study or applied plant breeding purposes to limit reproduction in Tripidium.
Seed production is a significant contributor to the successful cultivation of many plants. In developing seed samples, a variety of genes with increased expression levels involved in homeostatic regulatory roles for metal ion uptake, redox processes, heat shock, and genes known to interact under abiotic stress were found in the dataset. Fold changes in gene expression between late inflorescence stage development controls and mature seeds varied considerably  Table 5). Clustering utilized transcript per million (TPM) normalized expression values to iteratively calculate pair-wise manhattan distances between all clusters and joining clusters of proximity. Branch length represents the distance between clusters and reflects the similarity of expression profiles for co-expressed genes within the two clusters and were expressed at as much as a 16,000-fold difference (Fig. 6). A shortlist of genes of interest within these analyses were amassed in Table 4 because of their tissue-specific expression pattern. Taken together, these genes represent several potential targets that may drastically limit plant reproductive characteristics in an applied precision breeding context.
Gene expression analysis and validation of selected candidate genes by real-time qPCR The expression patterns between RNA-Seq data and qPCR data demonstrated a positive mean Spearman rank correlation (Fig. 8). Overall, the six genes selected for the assessment validated similar relative rank and transcript abundances throughout development. CAFFEIC ACID 3-O-METHYLTRANSFERASE (COMT -Trav0002740) is involved in the lignin biosynthetic pathway [38]. In Tripidium, unigenes designated with this annotation expressed abundantly in vegetative meristematic tissues and declined throughout inflorescence, floral, and seed development. NUCLEAR FUSION DEFECTIVE 4 (NFD4 -Trav0032568) is a gene of interest for its role in karyogamy or nuclear fusion following pollination and during female gametophyte formation [61]. Overall transcript abundance was low and increased during the critical phases of floral development. PARTING DANCERS 1 (PTD1 -Trav0016008) is essential in cross-over formation during meiosis [44]. Reports of transcriptional activity of PTD1 in rice place the highest activity in anthers and pistils at heading [44]. The notable difference in the number of possible meiocytes sampled in developing inflorescences versus those of floral development tissues likely explains some of the disparity in expression patterns between Tripidium and rice. CTF7 functions in the plant body during double-stranded break repair associated with cell division. The expression dynamics of this DNA repair associated protein expressed

Discussion
This study aimed to scrutinize the gene regulatory environment of developing inflorescences, flowers, and seeds to maximize the knowledge base for transgenic and gene editing based plant improvements. Our approach examined both within and between-group testing to identify transcript abundance changes specific to a given stage of development.

Inflorescence development and floral transition
The abundance of differentially expressed unigenes revealed the magnitude of gene expression changes involved in floral induction. The transition from the vegetative meristem to the reproductive meristem follows an abundance of regulatory changes coinciding with the increased expression of homeobox genes, MADS-box transcription factors, and plant hormones [28]. The members of the expansin family were significantly overrepresented in the hypergeometric annotation tests as well as the differential expression analysis, which identified nine different transcript annotations of expansins as well as other canonical floral induction integrators. As culms produced new vertical growth, the upregulation of multiple expansin family members validated their role in the reproductive process of Tripidium [71][72][73]. Many of the classic ABC(D) E model MADS-box transcription factors are essential in the fate of meristem identity and determinacy [41,74,75]. Within the developing inflorescence, 21 different MADS-box elements demonstrated differential expression patterns. The expression dynamics and identification of MADS18, SbMADS22-X2, MADS-box protein SOC1, and MADS1 4-X2 [36,40] in inflorescence meristem samples demonstrate the conservation of roles of these floral organ identity transcription factors in Tripidium. FLOWERING LOCUS T (FT), HEADING DATE 3A (HD3A) [76], and VERNALIZATION 3 (VRN3) [77] are synonyms for florigen, now well characterized in Arabidopsis, rice, and wheat [21,53,78]. The most significant sequence homology of the Tripidium florigen transcript identified FT (TrFT -Trav0056419) within the transcriptome. Within inflorescence development samples, TrFT demonstrated the highest expression levels in the floral transition (20P), as reported in Sorghum bicolor [53]. As an inflorescence matures into the late stages of development, the determinacy of the meristems changes in several plant species. SELF-PRUNING (SP) is the homolog of CENT RORADIALIS and TERMINAL FLOWER 1 of Arabidopsis and functionally maintains floral meristem indeterminacy [48]. The SP expression patterning in Tripidium followed expectations with consistent expression throughout inflorescence development. The floral homeotic protein AG-like genes increased in expression in Tripidium inflorescences throughout maturation. AGlike genes are broadly involved in floral development and coordination of the floral body plan; therefore, it might have a role in Tripidium floral transition and organization. Concomitant decreases in WUSCHEL-  related homeobox expression were observed as AGL increased, capitulating the relationship between these two interacting genes in Tripidium [79]. In rice and barley, their respective CONSTANS-like 9 (COL9) and CONSTANS1 (CO1) expression increases in the early weeks of plant development and diminished following the floral transition [21,35]. As COL9 abundance represses EARLY HEADING DATE 1, delaying flowering in rice, COL9 (Trav0008717) expression diminished in Tripidium with concomitant increased HEADING DATE 3 (HD3)-like transcription from early to mid-inflorescence stages of development [21,80,81]. Grain number, plant height, and heading date 7 (Ghd7) is a photoperiodic responsive gene with upstream repressive effects on EHD1 and subsequently HD3A in rice [76]. The translational dynamics of Ghd7 in Tripidium support a role throughout inflorescence development [23,42,43,76].

Floral and reproductive development
The identity genetics of floral development in grasses consists of several MADS-box transcription factors and their interacting proteins in articulating the floral body plan. The translation of floral homeotic class genes from the core eudicots to the monocots is tenuous due to various synonyms applied with computationally derived annotations. The road map best suited to this disambiguation lies in the consensus between expression dynamics and functional genomics. The AGL6-like MADS-box transcription factor is active in perianth and gynoecium development in rice with pleiotropic effects on lodicules, stamens, and carpels [82]. In Tripidium, the expression of AGL6-like follows similar patterns of expression in developing inflorescences and floral tissues reported in Oryza sativa [41], supporting its floral organ-specific expression and the significance of its role in floral organ specification. The role of floral organ specification and meristem determinacy are also functions of the YABBY genes DL1 and DL2 in rice carpel specification and maize [55,83,84]. While the tissue expression specificity between branch meristems and inflorescence meristems was not established in these experiments, the dynamic gene expression profile in inflorescence and developing floral tissues support its role in Tripidium floral determinism [55,83,84]. In wheat, the TaAGL6 acts on staminate floral development by working on TaAPE-TALA3 [85]. In Tripidium, the homeotic activity of staminate floral development could be identified by the dramatic increase in transcript abundance of floral homeotic protein AG-like in samples at 80 cm of growth.

Meiosis, embryogenesis, and caryopsis development
As inflorescences matured into the middle stages of development and beyond, transcript abundance increased.
Among the changes in transcript abundance, multiple meiotic genes delineated the next phase in reproduction. Transcription factor EAT1 is a constituent of the TAPETUM DEGENERATION RETARDATION (TDR) heterodimer significant in tapetum development [86,87]. Interestingly, the expression pattern of EAT1 follows a distribution significant in both early and post-meiosis cells [50]. This expression pattern, and our observations in Tripidium, support mid-inflorescence stage microsporogenesis. The conformity of expression patterns of the POD1 transcript to those reported by Dai et al. [46] furthermore provides evidence for the conservation of function in Tripidium. It is noteworthy that the sampling strategy employed in this study was sufficient to successfully identify unigenes of protein PTD, CTF7, and SHORTAGE IN CHIASMATA I. Differential expression analysis of meiotic unigenes in developing inflorescences highlights their significance in the reproductive process as well as the abundance of meiocytes captured here. A more comprehensive characterization of the regulatory dynamics of meiocytes involved in megaand microsporogenesis will require additional examination in Tripidium [44,45,88].
The translational dynamics of floral development is foundational to plant reproduction. When the inflorescence has emerged from the flag leaf, the changes in floral morphology are apparent, but many of the morphological changes are manifestations of the developmental design laid out during inflorescence development. Many of the genes responsible for gamete production and flower development were highly expressed at the mid-inflorescence stages of development. One of the primary benefits of the sampling strategy and data analysis approaches used in this study has been to sample multiple stages of development and to compare all possible pair-wise comparisons. This has enabled the capitulation for several canonical floral genes in a novel species as well as highlighted the sequence identity of multiple novel genes.

Conclusion
Tripidium is an enigmatic plant with considerable potential as a landscape and bioenergy crop given its ornamental merit and high biomass production on marginal lands with minimal inputs. A greater understanding of gene expression throughout reproductive development provides context for gene function analysis in these basic biological processes. This research focused on the molecular genetics of plant reproduction in T. ravennae, including identifying diverse genes related to inflorescence, flower, and seed development. The expression dynamics of unigenes detailed in this study provide a guide for future biological studies on functional and comparative genomics and the development of biotechnology applications, including precision breeding.

Plant material and sample collection
Vegetative and inflorescence meristems as well as floral development tissues were collected from three plants on the week of August 9th, 2017, from the North Carolina Arboretum, Asheville, NC. Inflorescence meristematic tissue from reproductive culms was collected at various heights from the ground, representing floral development progression. Inflorescences had emerged from more developed culms, but inflorescence meristems at earlier development stages were apparent within the leaf sheath of immature culms. Floral development tissues were collected at the floret boot stage, pre-anthesis, anthesis stages, and mature stamens. Spikelets containing immature and mature seeds were collected from the same plants in September and October of 2017. Culm segments, inflorescences, and developing seed samples containing target tissues were collected and immediately placed in 15 or 50 mL centrifuge tubes vials with 5-20 mL of RNAlater® Stabilization Solution (Ambion®, Life Technologies TM). Centrifuge tubes containing sample tissue were frozen in the field on a dry ice bed before transport to the laboratory and stored at − 80°C (Table 5). Excess plant tissue was trimmed and removed or enriched under a stereomicroscope in sterile 100 mm Petri dishes containing approximately~5-10 mL fresh RNAlater®. Immature floret tissue samples (FT) were purified from bulk collected inflorescence tissue before emergence from the flag leaf sheath. Pre-anthesis floret tissue samples (PAF) were purified from bulk collected inflorescence tissue that had emerged from the flag leaf sheath. The observation of spikelet expansion and glume extrusion identified PAF samples, but no evidence of anther or stigma extrusion was observed. Stamen tissue samples (ST) were purified from bulk collected floral spikelet tissues when anthers were exposed entirely outside of the glumes, and pollen was visibly dehiscing. Samples of florets at anthesis (ANT) were purified from bulk collected floral spikelet tissues by amassing florets showing stigma protrusion from the floret's lemma and palea. Immature and mature seeds were processed by removing first, and second-order rachilla from the bulk collected tissue before tissue lysis and homogenization. Sample tissue lysis and homogenization were processed in liquid nitrogen by mortar and pestle.

RNA isolation, library preparation, and sequencing
Total RNA was extracted from all tissues using the Spectrum® Plant Total RNA Kit (Sigma-Aldrich, Burlington MA). DNA was digested on-column with the Sigma-Aldrich DNase10 (DNASE10) kit per the manufacturer's instructions. RNA concentration and integrity were quantitated with the QubitTM fluorimeter (Life TechnologiesTM) and the 2100 Bioanalyzer (Agilent) before library preparation, respectively. RNA samples were poly-adenylation purified, and cDNA libraries were prepared using the BiooScientific (a PerkinElmer Co.) NEXTFlex Rapid Directional RNA-Seq kit with a target insert size of 200-300 bp. Libraries were sequenced using the Illumina HiSeq 4000, 150 bp PE by Novogene (Sacramento, CA). The RNA of all samples was mixed and used to construct Pacific Biosciences Iso-Seq libraries (Protocol # 101-070-200 version 6) with three size fractions (no size selected, < 4 kb, and > 4 kb). The libraries were sequenced with four cells of a PacBio Sequel I system at NC State Genomic Sciences Laboratory.

Transcriptome assembly and functional annotation
Read quality was inspected for quality with FastQC. Trimming was conducted with CLC Genomics WB (CLC -GWB, V11.0.1, QIAGEN) to remove adapter sequences and low-quality reads (Q < 20). Multiple de novo transcriptome assemblies were constructed with the CLC -GWB using different k-mer (word size) and the bubble size of de Bruijn graph combinations and assessed for the number of contiguity and N50 (Supplemental Fig. 1a). The assembly with the lowest number of contigs but the largest N50 was selected which maximized the yield of complete BUSCOs (Supplemental Fig. 1b & c). The final assembly was mapped (GMAP, V2015-07-23, [89]) to a draft genome assembly of T. ravennae (Maren et al., Unpub.) as well as multiple reference genomic assemblies within the Andropogoneae tribe, including Sorghum bicolor [46], Saccharum officinarum [90], and Zea mays [91]. The GMAP mapping was carried out to enrich the transcriptome for plant transcripts and eliminate the transcripts of sample surface contaminants [92]. Contigs with a 95% identity and match score to two or more reference genomes were retained. The transcriptome was analysed for redundancy with CD-HIT software with 95% identity to make a nonredundant set [92,93]. Functional annotation was carried out on a local server using BLASTx and the nr (NCBI non-redundant protein 12/2018 version) database. Searches were limited to the first 20 significant results with an E-cutoff value of 1.0E-6. Unitigs were functionally annotated utilizing default annotation rules in the BLAST2GO package [94]. The unitigs and their BLASTX results were imported into the BLAST2GO package for functional gene annotation [94]. Gene ontology (GO) term and functional annotation assignments followed InterPro scan, using the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) database, KEGG pathway analysis [95], Rfam annotation [96], and GO mapping based characterizations online on the BLAS-T2GO package [97].
Differentially expressed gene (DEG) and gene ontology (GO) enrichment analysis Quality trimmed and filtered reads from all samples were mapped to the final transcriptome assembly with default parameters in the CLC -GWB. Statistical tests for the determination of differential gene expression utilize an exact test-like generalized linear model (GLM) similar to that performed in DESeq and EdgeR [98,99]. In developing inflorescences, the test of differential expression utilized non-flowering controls for comparison. The statistical tests for differential expression in developing floral spikelets utilized two or more pair-wise comparisons between developing inflorescences and the sample query (e.g., FT, PAF, ST, ANT). Two or more pair-wise statistical tests between inflorescence controls, floral samples, and developing seeds comprise DEG calls. Genes of interest were filtered from differentially expressed genes table in CLC-GWB using a threshold p ≤ 0.05 and a two-fold threshold change. Venn diagrams were generated from gene lists using InteractiVenn [100]. TPM normalized expression values for DEG's presented in each Venn diagram (Figs. 2, 4, and 6) were subjected coexpression analysis utilizing the R package "MBCluster.Seq" (version 1.0) [101]. Negative binomial modeling parameters were strategically adjusted to evaluate the number of coexpression clusters. Probability estimates greater than 0.9 for each member of a pair of genes coexpressing within the same cluster, as the parameters were adjusted, were used in the progressive selection of the clustering strategy as well as their gene memberships ( Supplementary Fig. 5). The final clustering utilized a reduced set of expression data containing the listing of genes from the most probable coexpressing gene set identified within the preliminary analyses. Individual clusters were analyzed independently in CLC-GWB with the hierachical clustering algorithm in the development of the heat maps presented in Figs. 3, 5, and 7.

Gene expression analysis by RT-qPCR
Bioinformatically derived differential expression statistics of inflorescence, floret, and seed development were screened for novel and putative genes in reproductive development to validate the sample set with RT-qPCR according to Zhao et al. [102]. GO enrichment by hypergeometric test (p ≤ 0.05; CLC-GWB) aided in selecting sequences from the test set for over-representation (Supplementary Table 2, 3, & 4). Unigenes were filtered for significant differential expression (FDR p ≤ 0.05) within the subset of relevant inflorescence samples. Final transcript selections were made on the unique mapping (GMAP; V2015-07-23, [89]) of the transcript to the reference genome assembly with concomitant support for gene architecture from the PacBio Iso-Seq data set. Primers were designed for each gene to maximize coverage for gene structures, which uniquely identified the isoform of interest. Multiple internal controls were selected from the RNA-seq data set by filtering the expression data set for unigenes with a minimum expression value of 200 transcripts per million (TPM), a mean value of less than 2000 TPM, and having a CV less than 0.35 [103]. Relative gene expression analysis was used in the evaluation of PCR data in the determination of gene expression values and calculated following the 2 -ΔΔCt method [104].