Illumina sequencing, assembly and unigene annotation
For RNA-seq, total RNA isolated from two biological replicates for each sample was subjected to cDNA library preparation to generate a broad survey of transcripts associated with the graft process in hickory. Raw Illumina sequencing reads were qualified and adapter-trimmed to yield a total of 83,676,860 clean short reads comprising 4.19 Gb of sequence data from all the complementary DNA libraries (Additional file 1). The sequence generated from Trinity software was used as the reference transcriptome [20]. Additionally, all the reads obtained from hickory at three grafting stages were assembled using Trinity, and then the low complexity and low-quality reads were filtered out, generating 160,638 transcripts (N50: 1,984) with a mean length of 1,088 bp. For each sample, about 93% reads were mapped to the reference transcriptome by RSEM software with default parameters [21]. Clustering resulted in 89,633 unigenes (N50: 1,092) with the mean length of 659 bp (Fig. 1a, b). We also statistically determined that there were 30,967 transcripts (10%) in the length range of 1,000 to 2,000 bp and 27,983 (9%) with length > 2,000 bp; while there were 8,821 unigenes (5%) in the range of 1,000–2,000 bp and 5,991 (3%) with length > 2,000 bp (Fig. 1c, d).
To functionally annotate the assembled hickory unigenes, we compared their sequences against various protein databases using BLASTX. In total, there were 37,084 unigenes (41.37%) annotated in the Nr database, 17,990 (20.07%) in the Nt database, 7,010 (7.82%) in the KEGG database, 25,438 (28.38%) in the SwissProt database, 25,582 (28.54%) in the PFAM database, 29,947 (33.41%) in the GO database, and 13,735 (15.32%) in the KOG database (Additional file 2). Based on these annotations, we identified a total of 41,603 (46.41%) unigenes annotated in at least one database, suggesting that a relatively large portion of hickory unigenes had no hits to any known proteins in the selected databases.
Classification of enriched GO and KEGG terms
We further assigned GO terms to hickory unigenes. A total of 29,947 unigenes (33.41%) could be assigned to at least one GO term, and detailed information on the classification of enriched GO is listed in Additional file 3. Within the biological process category, the most highly represented terms were ‘cellular process’ and ‘metabolic process’. Within the molecular function category, ‘catalytic activity’ and ‘binding’ were the two most abundant terms. The most enriched terms within the cellular component category were ‘Organelle’, ‘cell’ and ‘cell part’ (Additional file 3).
To further reveal the involvement of metabolic pathways in graft process, we predicted the KEGG pathways represented by all assembled unigenes. A total of 12,034 unigenes were predicted in 248 signaling and metabolic pathways, including pathways related to cellular process, environmental information processing, genetic information processing, metabolism, and organismal systems (Additional file 4). Interestingly, the most enriched KEGG pathways included those related to metabolic pathways, such as amino acid metabolism (1106 unigenes), carbohydrate metabolism (1721 unigenes), and energy metabolism (1461 unigenes) (Additional file 5).
Analysis of DEGs during the hickory graft process
RNA-Seq analysis of the gene expression during the graft process in hickory was conducted at time points 0, 7 and 14 d. RNA-Seq data were processed to calculate RPKM values, which are normalized indicators for comparing the transcript levels of each unigene between different samples. A total of 850 significantly DEGs were identified and analyzed using criteria of two-fold differences and padj < 0.05 (Fig. 2a). To show the major trends and the major transitional states during the graft process in hickory (0, 7, and 14 d), all 850 DEGs were assigned to 14 clusters by K-means method. Among these up-regulated gene clusters, clusters 1 and 14 showed a similar pattern of genes were up-regulated at different time points and reaching their peak levels at time point 14 D; The genes of clusters 2, 11, and 13 were also up-regulated during the graft process and reached their peak levels at time point 7 d. Clusters 3–6, 8, 10 and 12 were significantly down-regulated. The genes of clusters 7 and 9 were down-regulated at time point 7 d and increased at time point 14 d (Fig. 2b).
We then identified highly genes that were differentially expressed between any two of the three time points during the graft process. Based on the same criteria (two-fold and padj < 0.05), we identified 777 unigenes that were significantly differently expressed at time point 7 d compared with control (time point 0 d); of these, 324 unigenes were up-regulated and 453 unigenes down-regulated (Fig. 3a). This was a greater number than the differentially expressed unigenes between time points 14 d and 0 d, for which a total of 262 unigenes were identified, with 38 up-regulated and 224 down-regulated (Fig. 3b). A small number of differentially expressed unigenes were identified between time points 14 d and 7 d: nine unigenes up-regulated and 13 down-regulated (Fig. 3c). We compared three sets of transcriptome data from different time points. Interestingly, most of the differentially expressed unigenes in the comparison between 7 and 0 d were independent of those in the comparison between 14 d and 0 d. In detail, only 16 unigenes were up-regulated both in the comparisons between 7 d and 0 d, and 14 d and 0 d; while 187 unigenes were down-regulated in both comparisons (Fig. 3d, e).
Furthermore, to demonstrate useful information concerning the DEGs during the graft process, we analyzed the GO terms represented by these genes. In total, 11 enriched GO terms were identified within the DEGs between 7 d and 0 d; while 15 enriched GO terms were identified between 14 d and 0 d. GO term enrichment analysis indicated that genes involved in various biological processes and molecular functions such as response to fungus, defense response to fungus, terpene synthase activity, oxidoreductase activity and carbon-oxygen lyase activity were significantly enriched in DEGs both in both the 7 and 0 d and 14 d and 0 d comparisons (Fig. 3e, f).
PPI network of DEGs
To further investigate the biological processes involved in the grafting process of hickory, we analyzed the PPIs among the 850 identified DEGs. The PPI network for hickory graft process had 47 proteins as nodes, connected by a number of identified direct physical interactions obtained from the STRING database. A high quality image was constructed as an overview of the PPI network, with differentially enriched interaction groups indicated by different colors (Fig. 4 and Additional file 6).
It is noteworthy that 10 clusters were identified within the DEGs in the comparison between 7 and 0 d; however, only two clusters were identified within the DEGs in the comparison 14 d and 0 d. In detail, the largest cluster (cluster I) consisted of eight proteins related to the biosynthesis and degradation of lignin. The second largest cluster (cluster II) consisted of six proteins, which are molecular chaperone involved in R gene-mediated disease resistance. Five ribosomal proteins were grouped into the third largest cluster (cluster III). Two PPIs were identified from the DEGs in the comparison between 14 d and 0 d: cluster I (involved in lignin biosynthesis) and cluster II (transcriptional repressor involved in abiotic stress responses). Interestingly, the proteins associated with lignin biosynthesis and degradation showed significant differences in both comparisons between 7 and 0 d and between 14 d and 0 d.
Expression of auxin signaling pathway and cytokinin signaling pathway genes during the graft process
Expression of genes of the auxin and cytokinin signaling pathways were analyzed to reveal the involvements of these two important hormonal signaling pathways in the graft process in hickory [22]. Comparison of the transcript abundances of auxin transport, metabolism, signaling pathway, and downstream induced genes revealed a conserved response during the graft process (Fig. 5). The transcription level of most auxin transporter encoding genes was changed significantly during the graft process. For auxin efflux carriers, several unigenes (comp42647_c0, comp56686_c0 and comp59759_c0) were up-regulated at time point 14 d; while several other unigenes (comp282487_c0, comp194811_c0 and comp87897_c0) were down-regulated at time points 7 d and 14 d. For auxin influx carriers, most unigenes were induced and four unigenes (comp217649_c0, comp81392_c0, comp81435_c0 and comp91337_c0) were greatly down-regulated during the graft process. Only one TIR/AFB encoding gene was identified in the hickory transcriptome data, and its expression level was induced at time point 14 d. Three indole-3-acetic acid-amido synthetase genes were identified: comp41777_c0, comp65539_c0 and comp85186_c0. Interestingly, these three unigenes displayed highest expression level at the early stage (7 d), and then decreased at time point 14 d. Many unigene sequences were appraised as auxin response family genes (Aux/IAA and ARF families), and these genes showed a diversity of expression pattern during the graft process (Fig. 5 and Additional file 7).
Furthermore, genes related to cytokinin signaling pathway were also identified from the hickory transcriptome data. It is noteworthy that all 11 hickory AHK/CRE genes were up-regulated during the graft process. Expression of some AHK/CRE genes (comp113371_c0, comp75037_c0, comp48382_c0 and comp90336_c0) peaked at time point 7 d, and then declined slightly at time point 14 d. Two RR-A genes, comp69732_c0 and comp75680_c0, were reduced during the graft process; while another two RR-A genes, comp63651 _c0 and comp89423_c1, were induced at time point 7 d, and then dropped down to the base level (0 d). For RR-B genes, one unigene (comp212565_c0) was up-regulated at time point 7 d, another unigene (comp92083_c0) was up-regulated at time point 14 d, and the rest ones (comp48200_c0, comp87723_c0 and comp202598_c0) were down-regulated at time points both 7 d and 14 d (Fig. 6 and Additional file 8). Furthermore, the phylogenetic data of auxin signaling pathway and cytokinin signaling pathway genes was showed in Additional file 9.
QRT-PCR validation of the expression level of several unigenes from RNA-seq data
To verify the DEGs related to hormone signaling that were identified using RNA-Seq, we performed qRT-PCR assays with independently samples collected from graft unions during the different grafting stages (0, 7 and 14 d). We selected 20 unigenes from auxin- and cytokinin-signaling pathways, including two efflux carriers, two influx carriers, two Aux/IAAs, two GH3s, one ARF, five auxin induced proteins, two HK/CREs, two ARR-As and two ARR-Bs, to validate the RNA-Seq data. The expression levels of these selected genes were basically consistent with RNA-Seq results (Fig. 7). The primer sequences are listed in Additional file 10.