Gene expression profiling of the host-pathogen interface
To characterise gene expression profiles in wheat and PST during infection, we performed an RNA-seq time-course. We inoculated a highly susceptible wheat variety (Vuka) with PST strain 87/66 and harvested leaf samples at 0, 1, 2, 3, 5, 7, 9, and 11 days post-inoculation (dpi). Germinating PST spores were also collected as a control. For each time point, three biological replicates were used to generate a total of 27 poly(A) enriched cDNA libraries, which were sequenced on the Illumina HiSeq 2000 platform. Following quality filtering and data trimming, high-quality reads were aligned to both the wheat and PST-130 reference genomes [17, 18]. The percentage of reads that aligned to the wheat reference decreased from a maximum of 77.35 % (S.D. ±2.02 %) at 0 dpi to 34.37 % (S.D. ±1.30 %) at 11 dpi (Fig. 1a; Additional file 1: Table S1). Less than 1 % of reads mapped to the PST-130 reference genome at 1, 2, and 3 dpi, similar to the results observed in the uninoculated plant control (Additional file 1: Table S1). Therefore, these time points were not included in downstream analysis of the pathogen. At later time points, the proportion of reads aligning to the PST-130 reference increased from 1.02 % (S.D. ±0.55 %) at 5 dpi to 38.80 % at 11 dpi (S.D. ±2.72 %; Fig. 1a).
Improving the PST gene models
When using the previously published PST-130 gene models [11, 17] we found that a high percentage of reads (27 ± 19 %) that mapped to the PST-130 genome did not align to predicted exons (Additional file 1: Table S2). Therefore, we used our transcriptome data to generate an updated set of transcript annotations using the software Cufflinks [19] and the reference annotation based transcript (RABT) assembly pipeline [20], which generated a minimal set of predicted transcripts that best explained the observed spliced RNA-seq alignments. This significantly reduced the number of reads mapping to intergenic regions (0.18 ± 0.09 %; Additional file 1: Table S3). RNA-seq alignments with short intergenic lengths indicate the presence of overlapping genes incorrectly characterised as distinct loci [21]. In accordance, our updated PST transcripts have intergenic regions that are 3.5 times longer than those in the original gene models and consist of multiple domains that were previously defined as separate genes (Additional file 1: Table S4).
Coding and untranslated regions (UTRs) were then identified in the new set of PST transcripts using TransDecoder and a predicted proteome was generated [22]. We identified a total of 9,675 distinct genomic loci that encoded 17,582 expressed transcripts with significant ORFs. This new proteome was then annotated using the EBI Interproscan tool [23]. This approach led to the annotation of 7,290 out of the 9,675 putative protein-coding genes (Additional file 2).
Identifying wheat transcripts expressed during infection
For the wheat host, the proteome was defined from a set of 123,532 previously identified gene models [24] and Interproscan annotated a total of 88,951 genes [23]. Predicted proteins were assigned to orthologous groups in the KEGG database using the GhostKoala mapping tool [25]. A total of 31.6 % of host proteins were assigned, with 72.1 % of these showing similarity to proteins from monocots (Additional file 1: Table S5).
We observed a drop in the percentage of reads mapping to the wheat reference genome specifically at 3 dpi (Fig. 1a; Additional file 1: Table S1). As the wheat genome is currently incomplete, we examined the unmapped reads to determine whether this drop was due to the expression of transcripts currently not represented in the wheat genome assembly. We undertook a de novo assembly of the unmapped reads and used sequence similarity searches against the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein database to annotate the newly assembled transcripts. Of the 2,019,326 total transcripts generated, 1,006,674 (49.85 %) could be annotated using this method. Among these BLAST-annotated transcripts, we selected transcripts for which hits matched a plant-related protein (871,367 sequences), including sequences from 387 different species with 59.69 % being monocots. To avoid redundancy, we removed ambiguous sequences using the CD-HIT-EST programme [26] and combined the wheat genome with these 657,021 new, non-redundant transcripts. Aligning our RNA-seq data to the combined wheat reference removed the decrease at 3 dpi in the percentage of reads that mapped to the genome (Fig. 1b; Additional file 1: Table S1).
Comparison of RNA-seq quantification methods
The next step was to quantify the expression of PST and wheat transcripts during infection. Properly accounting for the sampling process and inherent biases in RNA-seq approaches requires sophisticated statistical inference techniques [19]. Raw read counts or simplistic normalization such as counts per million (CPM) mapped reads are insufficient, particularly when considering alternative splicing and reads that map to multiple locations. To evaluate the performance of these statistical inference techniques and select the most appropriate method for our data, we first generated two test datasets that consisted of triplets of homoeologous genes from each of the A, B, and D genomes. These datasets included 4,307 triplets mined from the Ensembl Plants Triticum aestivum portal, and a subset of 239 triplets identified as core eukaryotic genes (Additional file 1: Tables S6 and S7). As a metric for comparison of the methods, we considered both the mean pairwise cosine similarity, which measures the similarity in shape of the temporal expression pattern independent of the magnitude, and the mean pairwise Euclidean distance between sets of homoeoloci, which depends on the magnitude of expression. Although recent results have suggested that sets of homoeologues have significantly biased expression levels between genomes in hexaploid bread wheat [27], we hypothesise that the normalised temporal expression profiles of homoeologous genes should be comparable (cosine similarity ≈ 1), particularly for triplets of core eukaryotic genes. Furthermore, similar profiles of expression have been reported for the Rht-A1, Rht-B1, and Rht-D1 homoeologous dwarfing genes in tissues of different regions of the developing wheat stem [28] and in wheat homoeologues of the defence-related WRKY transcription factors [29].
We selected the programs Cufflinks [19], RSEM [30], Salmon [31], and Kallisto [32] for comparison, with the first two as examples of widely used programs and the latter two being newly developed ultra-fast algorithms. Cufflinks gave the overall highest similarity (0.996 ± 0.022, 92.8 % > 0.99) between homoeologues for both datasets (Fig. 2). By contrast, RSEM, Salmon, and Kallisto consistently gave lower levels of similarity (0.978 ± 0.026, 0.927 ± 0.141, and 0.942 ± 0.144 respectively). Strikingly, the quantification methods produced contradictory results when tested on individual genes. By defining the relative difference between genes as the magnitude of their difference divided by their mean [32], we found that the average of the pairwise median relative differences for the same gene between the different programs was 1.08 and the mean correlation of the expression vectors was 0.72 (Additional file 1: Table S8). This result is consistent with a previous study, which reported that orthologous genes between nematode species with cosine similarities > 0.95 had matching expression profiles during development [33]. Based on this analysis, we decided to use Cufflinks to determine the expression profiles of PST and wheat in all downstream analyses.
Dynamic progression of PST infection in wheat
To understand the modulation of biological processes and pathways throughout the infection process, we investigated the gene expression profiles for the host and the pathogen. First, following the Cufflinks pipeline, cDNA libraries were normalized to generate transcripts per million (TPM) expression values and the significance of differential expression was tested using the Cuffdiff companion software with the 0 dpi or PST germinating spores used as controls in the analysis. A total of 64,618 host genes and 4,855 pathogen genes were identified as differentially expressed (FDR < 0.05) between at least one pair of time points (Additional file 1: Tables S9-11). TPM expression data for significantly differentially expressed genes were normalized and clustered into sets of genes with qualitatively similar expression profiles using the mini batch k-means algorithm [34], resulting in seven clusters for the host and eight for the pathogen (Additional files 3 and 4).
To elucidate the biological function for each cluster, manually curated groups of related annotation accessions, GO term annotations, and KEGG pathway memberships were tested for significant enrichment in each cluster relative to the entire proteome (Fig. 3; Additional file 1: Tables S12-15). For the wheat host, we identified (i) Cluster VII, which peaked in expression at 1 dpi during initial penetration and was enriched for genes annotated as peptidase inhibitors, glycosyl hydrolases, and peroxidases, (ii) Cluster V, which peaked in expression at 3 dpi during haustorium proliferation and was enriched for genes annotated as part of Photosystem II, and genes coding for cytochromes, ATP synthases, and RNA polymerases, and (iii) Cluster III, which peaked in expression at 11 dpi during sporulation and was enriched for genes involved in membrane transport and genes for ABC transporters and chitinases. The biosynthetic and downstream response pathways for the plant stress-induced hormones salicylic acid (SA), jasmonic acid (JA), ethylene (ET) and abscisic acid (ABA) were highly represented in Clusters I, III, and VII, which all had peaks of expression at 1 and 11 dpi. MAPK signalling was enriched in Clusters I, IV, and VI, and Ca2+ signalling and apoptosis were enriched in Cluster I.
For the pathogen (Fig. 3b) we also identified several clusters. Cluster I, which contained genes that peaked in expression at 11 dpi, was enriched for genes involved in fatty acid synthesis GO terms and transmembrane proteins. Cluster III, which peaked in expression at 7, 9, and 11 dpi, was enriched for genes encoding catalase enzymes and oxidoreductase GO terms. Cluster II, which was upregulated from 7 dpi onwards, was enriched for carbohydrate catabolism, including GO terms related to glucan, glycogen, and polysaccharides. Cluster IV, which peaked in expression at 7 dpi, was enriched in genes related to nucleic acid metabolism, ubiquitination processes, and peptidase activity GO terms. Cluster IV was also enriched in histone transcripts. Cluster V peaked in expression at 11 dpi and was enriched in putative transcription factors containing the Zn(II)2Cys6 (Zn2C6) domain, which has only been identified in fungal proteins to date [35]. Cluster VI peaked in expression at 5 dpi and was enriched in HSP20 proteins, which are induced during the development of infection in other fungal organisms [36]. Clusters III, V, VII, and VIII were also enriched in transcripts for proteins that contained a secretion signal but were annotated with no particular GO term (Additional file 1: Table S15).
Suppression of expression of host defence genes by PST is alleviated in a resistant host
On average 25 % (S.D. ±4.07 %) of the reads at each time point did not align to the wheat or PST reference genomes (Additional file 1: Table S1). Therefore, we investigated the de novo assembled transcripts from these unmapped reads by annotating their potential biological functions. We focused on identifying transcripts involved in the defence response, as the modular nature of immune receptors may have limited their assembly in the current wheat genome. We annotated the assembled transcripts that likely encode nucleotide-binding domain leucine rich repeat proteins (NLRs) using the NLR-parser tool [37]. We supplemented this set of NLR-encoding genes with additional genes that encode proteins with similarity to known or predicted disease resistance proteins, as identified through BLAST searches, and combined these two datasets (Fig. 4a). Through this analysis, we revealed a peak in the number of defence-related genes expressed at 2 dpi in the susceptible host, when compared to other time points. This peak in expression of defence-related genes at 2 dpi dropped sharply by 3 dpi; we hypothesize that this could be due to active suppression of the expression of these host genes by PST in the susceptible host.
To test this hypothesis, we generated a second RNA-seq time-course by infecting a wheat variety resistant to the PST 87/66 strain. For the resistant variety, we selected an Avocet introgression line containing the resistance gene Yr5 and harvested leaf samples at 0, 1, 2, 3, and 5 dpi. For each time point, three biological replicates were used to generate a total of 15 poly(A)-enriched cDNA libraries, which were again sequenced on the Illumina HiSeq 2000 platform. Following quality filtering and data trimming, high-quality reads were aligned to both the wheat and PST-130 reference genomes [17, 18] (Additional file 1: Table S16). We carried out de novo assembly of the unmapped reads and annotated the assembled transcripts using the NLR-parser tool [37] and similarity searches as above. When we assessed the expression of host defence-related genes during the resistant interaction, we determined that the number of expressed genes increased steadily throughout the time-course, without the suppression at 3 dpi that we observed in the susceptible host (Fig. 4b). This is consistent with the hypothesis that the pathogen suppresses defence-related gene expression in a compatible interaction to enable successful colonization.
Wheat homologs of the rice defensome complex show coordinate expression
To further explore the regulatory networks involved in the plant innate defence response, we integrated transcriptomic data with sequence similarity and protein functional domain searches to identify likely orthologs of interactors and complex partners of OsRac1, a central regulator of defence responses in rice (Oryza sativa). OsRac1 is a highly connected core component of the innate immune response, connecting with chitin perception though OsCERK1/OsCEBiP, reactive oxygen species generation through Rboh, phosphate signalling through MAPK6, and hormone signalling through RACK1 [38]. Of the ten genes we identified in Fig. 5, at least five have already been cloned in wheat and the interactions verified in wheat, rice, or barley [39–44]. We found that the expression dynamics of all the genes in the predicted defensome were significantly correlated compared to Monte Carlo simulations drawn from the null model of uniformly distributed gene vectors, strongly suggesting that they are functionally linked (Additional file 5). We were unable to confidently identify homologs of two other defence-related OsRac1-interacting proteins, OsCCR1 and OsMT2, which are involved in cell wall lignification and H2O2 scavenging, respectively [45, 46].
We identified two groups (one on chromosome 5 and one on chromosome 3) of three homologues of Rboh, the NADPH oxidase required for immune-related accumulation of reactive oxygen species (ROS); each group contains two genes from the B genome and one from the D or A genome (5D, 5B, 5B and 3A, 3B, 3B). For both groups, one of the B genes had lower sequence similarity to other group members (average 45 and 69 %) compared with the similarity observed between the other genes (88 and 95 %). The group from chromosome 5 was strongly induced at 1 dpi and the group from chromosome 3 was strongly induced at 2 dpi (Fig. 5).
TaCERK1 and TaCEBiP (Ta for T. aestivum) encode components of the chitin perception system and were strongly induced at 1 dpi. Also, the expression of genes for the other downstream proteins peaked at 2 dpi. This was followed by a sharp decrease in expression of many components that then steadily increased in expression over the time course, with the exception of TaRac1, which returned to its basal level from 3 dpi onwards. TaRac1, despite being a central regulator of immunity, was expressed at low levels (max 0.49 TPM). Of the three Hsp90 variants, Hsp90.3 had the highest expression (max 40.3 TPM), then Hsp90.2 (max 26.2 TPM), and finally Hsp90.1 had the lowest expression level (max 1.41 TPM), which agrees with other studies concluding that Hsp90.1 is less involved in disease resistance to the yellow rust fungus compared with the other Hsp90 genes [39].
The apparent rapid suppression in expression of genes involved in chitin perception (at 2 dpi) and the defensome activation (at 3 dpi) was similar to the NLR suppression noted above. This prompted us to investigate the defensome further in the wheat variety resistant to PST 87/66. Overall, we found higher expression of many components, including the receptor genes TaCERK1 and TaCEBiP, HOP, genes for the SGT1/RAR1/HSP90 complex, and TaRAC1 and TaRACK1 (Fig. 5; Additional file 1: Table S17). For the downstream gene TaRboh, the homologs from chromosome 5 and one homolog from chromosome 3 were strongly induced at 1 dpi, whereas the other 2 homologs from chromosome 3 peaked at day 3. In addition, in the resistant host, MPK6 continued to rise above basal levels after recovering from a small dip in expression at 2 dpi, whereas in the susceptible host it failed to recover from this suppression and only marginally increased in expression until 11 dpi Fig. 5). Furthermore, although an initial suppression of expression levels was observed, in particular for TaCERK1 and TaCEBiP at 2 dpi, this was rapidly alleviated by 3 dpi in the resistant host, but this alleviation did not occur in the susceptible host.
Expression of PST genes related to vesicle trafficking increases during germination and later during pathogen proliferation
Transcriptional responses in the pathogen also showed changes in gene induction over time. For instance, we identified homologs of genes for fundamental vesiculotubular carrier components that are central to membrane trafficking and cargo delivery, including SNARE proteins, GTPases, and clathrins (Fig. 6). These components function in all five stages of vesicle trafficking: sorting, uncoating, motility, tethering, and fusion (Fig. 6a). Once we identified the sequences of these components, we combined homologs with similar expression profiles to create a minimal representative set of each with at least one representative gene. The genes were then grouped depending on their expression profile. This revealed two separate modes of expression, namely during the germination stage at 0 dpi and during proliferation of the pathogen at 7 dpi onwards (Fig. 6b).
Homologs of all vesicle trafficking components followed the two modes of expression, with the exception of sec, which was unique to the late expression mode. Rab GTPase proteins cycle between activation, inactivation, and cytosol-membrane translocation, regulating all five stages in vesicle trafficking [47]. Their importance is highlighted by their consistently high expression during the active periods of both modes. The exocyst, which includes several Sec proteins, is also involved in targeting vesicles to the receptor membrane [47]. sec transcripts were the most abundant component at all stages with a mean expression of 662 TPM and a peak at 7 dpi of 1074 TPM (Additional file 1: Table S19). On average, the constituents shared between both modes were more highly expressed in the late mode due to a combination of more homologs per gene and higher average expression per homolog, particularly arf and clathrin with 15.5 and 6.5 times higher expression in the late mode compared to the early mode (Additional file 1: Table S19). The identification of genes for a clathrin-coated vesicle trafficking mechanism as highly expressed during the germination stage (at 0 dpi) and during proliferation of the pathogen (at 7 dpi) indicates that this trafficking likely plays a key role during PST nutrient acquisition early in infection and during effector delivery at later stages during infection.