In this study we analyzed the transcriptome of F. graminearum grown in liquid CM medium by Illumina sequencing to investigate the correctness of predicted gene models present in the annotated Broad F. graminearum genome database and to identify the occurrence of alternative splicing, RNA editing, non-canonical splice sites, novel transcripts and the sequences of the 5′ UTR and 3′ UTR regions. The total coverage of reads along the genes was evenly distributed except for the ultimate 5′ and 3′ ends, indicating that overall our RNA-Seq data are of high quality. Although the read coverage summed over all genes was evenly distributed, the coverage within individual genes was not; this phenomenon has also been reported in other RNA-Seq studies [32, 33]. Interestingly, for nearly all genes the read coverage pattern is very similar in wild-type PH-1 and the ebr1 mutant. This suggests that each gene has a characteristic RNA-Seq profile, which could be related to secondary structure of particular domains of RNA molecules that interfere with RNA shearing and subsequent sequencing.
The background reads in RNA-Seq data sets have been reported to be low. For example, in RNA-Seq data obtained from yeast, no reads matching a 3.5-kb deleted region were obtained, and very few reads matching nontranscribed centromeres were identified. A similar result was found in our RNA-Seq analysis; a comparison of the EBR1 expression level between PH-1 and ebr1 showed no transcription of the EBR1 gene in the ebr1 deletion mutant. In addition, we found that the RNA-Seq data for both PH-1 and ebr1 are highly reproducible.
Analysis of the read distribution showed that 12.9% of reads matched intergenic regions, which is relatively high in comparison to the 3% and 5% found in H. sapiens and A. thaliana, respectively [13, 31]. This high percentage may at least partly reflect the lower quality of gene model prediction in F. graminearum compared to H. sapiens and A. thaliana. In the latter two genomes, several rounds of gene annotation have been performed and more experimental evidence has been provided to support the gene models. In addition, 16% of the reads could not be assigned to a single location in the genome, a finding that was also reported in other species [13, 31]. For instance, in H. sapiens, 20% of the intergenic reads match multiple locations in the genome, of which 6% match 2–10 locations and 14% match more than 10 locations. Furthermore, we found that most of the reads mapping to multiple locations originate from intergenic regions and UTRs, whereas only very few match coding regions. This suggests that the reads matching each transcript are very specific and that the read coverage of each transcript is a reliable reflection of the gene expression level.
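The breakdown of multi-mapping reads by genomic region described above can be illustrated with a minimal Python sketch. The input format (one tuple per read holding the number of genomic locations it matches and a region label) and the function name `multimap_summary` are assumptions made for this illustration; they are not the pipeline used in this study, and the toy reads are invented.

```python
from collections import Counter

def multimap_summary(reads):
    """Summarize multi-mapping reads.

    reads: iterable of (n_locations, region) tuples, where n_locations is the
    number of genomic locations a read matches and region is a label such as
    "coding", "utr" or "intergenic" (hypothetical labels).
    Returns (fraction of all reads that are multi-mapped,
             share of the multi-mapped reads per region).
    """
    total = 0
    multi_by_region = Counter()
    for n_locations, region in reads:
        total += 1
        if n_locations > 1:  # read matches more than one location
            multi_by_region[region] += 1
    multi_total = sum(multi_by_region.values())
    fraction_multi = multi_total / total if total else 0.0
    share = {r: c / multi_total for r, c in multi_by_region.items()} if multi_total else {}
    return fraction_multi, share

# Invented toy data: 8 uniquely mapped coding reads, 2 multi-mapped reads.
toy_reads = [(1, "coding")] * 8 + [(3, "intergenic"), (12, "utr")]
fraction, share = multimap_summary(toy_reads)
```

If coding regions contribute almost nothing to `share`, as reported here, the per-transcript read counts are little affected by ambiguous mapping.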
RNA-Seq has been widely used to identify incorrect gene models and alternative splicing in different organisms [10, 13, 30, 33]. However, distinguishing incorrect gene models from alternative splicing is a challenging and laborious task. In this study, all selected genes were manually examined in the CLC software package to identify reads showing splice sites. RT-PCR analysis of the selected genes confirmed that our identification of incorrectly annotated gene models and alternative splicing is reliable. In total, 655 genes with incorrect gene models were identified in the Broad F. graminearum database. Excluding genes with no detectable expression or with low read coverage (less than 50 reads), the fraction of incorrect gene models in the published annotation of the Broad F. graminearum database is 10.3%. Gene model predictions in the MIPS F. graminearum database were considered to be of higher quality than those in the Broad F. graminearum database, which was confirmed by our RNA-Seq analysis. Nonetheless, we could still improve many gene models predicted in the MIPS F. graminearum database. Even some of the manually revised gene models in the MIPS F. graminearum database appeared to be incorrect, indicating that gene annotations in the F. graminearum databases still need to be improved and that RNA-Seq analysis can significantly improve the published gene models. In this study, RNA-Seq data were generated from mycelia growing in nutrient-rich medium. To investigate whether the apparent annotation errors instead reflect alternative splicing, we also analyzed the available EST data generated under other conditions, such as carbon- and nitrogen-starved media and cultures of maturing perithecia. These EST data support our finding that the genes are incorrectly annotated, but six genes were identified that have two different transcripts, indicating that they might be alternatively spliced. Consequently, some of the genes classified in this study as incorrectly annotated might in fact be alternatively spliced genes.
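The way the fraction of incorrect gene models depends on the coverage filter can be sketched as follows. This is a minimal illustration only: the function name `incorrect_model_fraction`, the gene identifiers and the read counts are hypothetical, and the sketch assumes that a gene is assessed when it has at least 50 reads, which is one reading of the threshold stated above.

```python
def incorrect_model_fraction(read_counts, flagged, min_reads=50):
    """Fraction of assessed genes whose gene model was flagged as incorrect.

    read_counts: dict mapping gene ID -> number of reads matching the gene.
    flagged:     set of gene IDs judged to have an incorrect gene model.
    min_reads:   genes with fewer reads are excluded (assumed cutoff of 50).
    """
    # Keep only genes with enough coverage to judge the model at all.
    assessed = {gene for gene, n in read_counts.items() if n >= min_reads}
    if not assessed:
        return 0.0
    return len(assessed & flagged) / len(assessed)

# Invented toy data, not values from this study.
toy_counts = {"geneA": 1200, "geneB": 0, "geneC": 49,
              "geneD": 50, "geneE": 300, "geneF": 75}
toy_flagged = {"geneA", "geneC"}
fraction = incorrect_model_fraction(toy_counts, toy_flagged)
```

In the toy data, geneB and geneC fall below the cutoff and are excluded, so only one of the four assessed genes counts as incorrectly annotated.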
Alternative splicing has been investigated in many organisms including H. sapiens, Caenorhabditis elegans, A. thaliana and C. neoformans [13, 37, 44]. In H. sapiens, 95% of the genes undergo alternative splicing [11, 44]; in A. thaliana, the frequency of alternative splicing is estimated at 42%. In fungi much lower percentages of alternative splicing have been predicted, including 4.3% (277 genes) in C. neoformans, 1.3% (162 genes) in A. flavus and 1.4% (151 genes) in M. grisea [26, 29]. We found alternative splicing in 231 genes (1.7%) in F. graminearum. It should be noted, however, that we analyzed expression in only one growth condition; because fungi can adapt to many different environmental conditions, we expect this percentage to increase when transcription profiles from additional growth conditions are analyzed. At least four different types of alternative splicing exist in F. graminearum, of which intron retention appeared most prevalent. This is also the case in A. thaliana [13, 45, 46], whereas in H. sapiens exon skipping is most prevalent.
In-frame analysis showed that the majority of the alternatively spliced transcripts identified in F. graminearum introduce premature termination codons (PTCs), most of which are located in intronic regions. In H. sapiens and A. thaliana, too, a high percentage of alternatively spliced transcripts contain PTCs [13, 47]. In A. thaliana, 77.9% of the alternatively spliced genes introduce PTCs, and most of them are considered potential targets of nonsense-mediated mRNA decay (NMD) [13, 46]. NMD was initially identified in S. cerevisiae and later widely studied in higher eukaryotes [48–50], but so far only a few studies on NMD have been reported in filamentous fungi, and whether the PTCs identified in F. graminearum are also associated with NMD needs to be further investigated. Apart from PTC isoforms, some alternatively spliced transcripts encoding proteins of different lengths were identified. The effects of these length differences on the biological function of the proteins are still unknown, but several properties, including binding, intracellular localization, enzymatic activity and stability, might be affected.
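The in-frame analysis can be illustrated with a short Python sketch that scans a transcript for a stop codon upstream of the annotated one. It assumes that each input sequence runs from the start codon to the end of the annotated stop codon, and it uses invented toy sequences; the function name `ptc_position` is hypothetical and this is not the procedure used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def ptc_position(transcript):
    """Return the 0-based index of a premature termination codon, or None.

    transcript: DNA sequence from the start codon through the annotated stop
    codon (an assumption of this sketch). The first in-frame stop codon is
    premature if it ends before the end of the sequence.
    """
    seq = transcript.upper()
    for i in range(0, len(seq) - 2, 3):  # walk codon by codon in frame 0
        if seq[i:i + 3] in STOP_CODONS:
            # A stop that closes the sequence is the annotated stop, not a PTC.
            return i if i + 3 < len(seq) else None
    return None

# Invented toy gene: two exons and one intron (GT...AG boundaries).
exon1, intron, exon2 = "ATGGCTGAA", "GTAAGTTGACAG", "GGTCATTAA"
spliced = exon1 + exon2            # intron removed
retained = exon1 + intron + exon2  # intron retention isoform

spliced_ptc = ptc_position(spliced)
retained_ptc = ptc_position(retained)
```

In this toy case the fully spliced transcript has no premature stop, whereas the intron-retention isoform gains an in-frame TGA inside the retained intron, which mirrors the observation that most PTCs lie in intronic regions.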
Alternative splicing appears widespread in eukaryotes, but its biological function is still poorly understood. Some studies have shown that alternative splicing events are developmentally regulated or associated with the response to different environmental conditions [53, 54]. For instance, in A. thaliana, the CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) gene produces two different transcripts whose expression ratio is dependent on light and temperature. Similarly, in A. thaliana, splicing of serine/arginine-rich protein-encoding genes is altered in response to hormones or abiotic stresses. In H. sapiens, a number of genes involved in apoptosis and differentiation of embryonic stem cells are regulated by alternative splicing [19, 56]. In our study, we have also demonstrated that for some genes the alternative splicing events are regulated at different vegetative growth stages in F. graminearum. Their biological implications are not yet understood, but they might be important in the adaptation of F. graminearum to the changing external environmental conditions that occur during different growth stages.
As reported previously in other species, in addition to the canonical GT donor and AG acceptor sites in introns there are several non-canonical donor and acceptor sites, of which GC occurs most frequently as an alternative donor site [13, 57, 58]. Among the non-canonical splice sites in F. graminearum, the combination of a GC donor with an AG acceptor is likewise prevalent, and its proportion is consistent with what has been found in other organisms [13, 58]. In addition, the nucleotide preferences flanking the GC donor splice site and AG acceptor splice site identified in F. graminearum are consistent with previous reports in other organisms.
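Tallying donor and acceptor dinucleotides can be sketched in a few lines of Python. The sketch assumes that intron sequences are given on the transcribed strand, from the first to the last intronic nucleotide; the function names and the toy introns are hypothetical and do not come from the F. graminearum data.

```python
from collections import Counter

def classify_intron(intron_seq):
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold both splice sites")
    # First two bases form the donor site, last two bases the acceptor site.
    return f"{seq[:2]}-{seq[-2:]}"

def splice_site_counts(introns):
    """Count how often each donor-acceptor combination occurs."""
    return Counter(classify_intron(seq) for seq in introns)

# Invented toy introns: two canonical GT-AG, one GC-AG, one AT-AC.
toy_introns = ["GTAAGTTGACAG", "GCAAGTCTAACAG", "gtatgtttcag", "ATATCCTTAC"]
counts = splice_site_counts(toy_introns)
non_canonical = {pair: n for pair, n in counts.items() if pair != "GT-AG"}
```

Dividing each entry of `non_canonical` by the total number of introns would give the kind of proportion that is compared across organisms above.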