Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen

Background Powdery mildews are biotrophic pathogenic fungi infecting a number of economically important plants. The grass powdery mildew, Blumeria graminis, has become a model organism to study host specialization of obligate biotrophic fungal pathogens. We resolved the large-scale genomic architecture of B. graminis forma specialis hordei (Bgh) to explore the potential influence of its genome organization on the co-evolutionary process with its host plant, barley (Hordeum vulgare). Results The near-chromosome level assemblies of the Bgh reference isolate DH14 and one of the most diversified isolates, RACE1, enabled a comparative analysis of these haploid genomes, which are highly enriched with transposable elements (TEs). We found largely retained genome synteny and gene repertoires, yet detected copy number variation (CNV) of secretion signal peptide-containing protein-coding genes (SPs) and locally disrupted synteny blocks. Genes coding for sequence-related SPs are often locally clustered, but neither the SPs nor the TEs reside preferentially in genomic regions with unique features. Extended comparative analysis with different host-specific B. graminis formae speciales revealed the existence of a core suite of SPs, but also isolate-specific SP sets as well as congruence of SP CNV and phylogenetic relationship. We further detected evidence for a recent, lineage-specific expansion of TEs in the Bgh genome. Conclusions The characteristics of the Bgh genome (largely retained synteny, CNV of SP genes, recently proliferated TEs and a lack of significant compartmentalization) are consistent with a “one-speed” genome that differs in its architecture and (co-)evolutionary pattern from the “two-speed” genomes reported for several other filamentous phytopathogens. Electronic supplementary material The online version of this article (10.1186/s12864-018-4750-6) contains supplementary material, which is available to authorized users.

architecture of B. graminis forma specialis hordei (Bgh) to explore the potential influence of its genome 23 organization on the co-evolutionary process with its host plant, barley (Hordeum vulgare). The near-24 chromosome level assemblies of the Bgh reference isolate DH14 and one of the most diversified 25 isolates, RACE1, enabled a comparative analysis of these haploid genomes, which are highly enriched 26 with transposable elements (TEs). We found largely retained genome synteny and gene repertoires, 27 yet detected copy number variation (CNV) of secretion signal peptide-containing protein-coding genes 28 (SPs) and locally disrupted synteny blocks. Genes coding for sequence-related SPs are often locally 29 clustered, but neither the SP clusters nor TEs are enriched in specific genomic regions. Extended 30 comparative analysis with different host-specific B. graminis formae speciales revealed the existence 31 of a core suite of SPs, but also isolate-specific SP sets as well as congruence of SP CNV and phylogenetic 32 relationship. We further detected evidence for a recent, lineage-specific expansion of TEs in the Bgh   (Spanu et al. 2010) although the difference is comparatively small (Table S1). We did not find any 133 evidence for the presence of previously reported Bgh-specific plasmid-like linear extrachromosomal 134 DNA (Giese et al. 1990) in the two isolates.

135
To further assess assembly quality and to facilitate future genome-anchored genetic studies, we 136 compared the assemblies with a previously generated genetic map for Bgh (Pedersen et al. 2002). We 7 centromeres (Suppl. Figure 2, Table S2). The gene-scarce regions are in all cases associated with 148 specific long interspersed nuclear elements (LINEs) of the Tad1 family (Suppl. Figure 2).

149
The circular mitochondrial genomes of both isolates were closed, yielding total sizes of 104 kb (DH14) 150 and 139 kb (RACE1) (Suppl. Figure 3A), which is in agreement with older experimental estimates 151 (Borbye et al. 1992). Nucleotide sequence alignment indicated >96% identity of the mitochondrial DNA 152 (mtDNA) of the two genomes. It further revealed that the RACE1 mitochondrial genome contains a 153 ~32 kb duplication, while the DH14 mtDNA encompasses an ~1 kb isolate-specific sequence stretch 154 that includes one predicted open reading frame (Suppl. Figure 3B). The structural and nucleotide 155 differences might be linked to the isogamous and hermaphroditic manner of mitochondrial inheritance 156 in B. graminis (Robinson et al. 2002)

184
To compare the gene repertoires encoded by the DH14 and RACE1 isolates, we first used OrthoFinder 185 to infer orthologous gene groups (orthogroups). This analysis identified 6,039 single-copy groups 186 containing gene pairs with unambiguous one-to-one relationship between the isolates ( Figure 1A). By 187 manually incorporating additional position and synteny information from a whole-genome alignment 188 (see below) for the inference of orthologous gene pairs, we could further resolve some ambiguities 189 and identify additional relationships for unassigned genes with more dissimilar sequences, significantly 190 increasing the number of one-to-one gene pairs to 6,844 ( Figure 1B,  (Table S6). Of these genes, 12 were specifically expressed in RACE1 (of which seven encode   219 CSEPs), while three were expressed specifically in DH14 (Table S6) in the other isolate) or by sequence areas without a close match in the other genome ( Figure 2B).

233
These interspersed alignment gaps typically are rather small (<1 kb on average) and concern primarily 234 regions of repetitive sequence, while only rarely affecting protein-coding genes (only 1% of alignment 235 gaps affect genes). As both genomes are not resolved entirely to whole-chromosome level, we cannot 236 estimate the full extent of large-scale chromosomal reshuffling. Nevertheless, the occurrence of 237 within-contig alignment breaks provides evidence for at least two large-scale genomic rearrangements 238 that involve genome stretches larger than 1 Mb (  between the three isolates shows that in RACE1 SNP frequencies below one SNP per kb are seen only 257 rarely, whereas in A6 and K1 they are common (Suppl. Figure 7A) Figure 8A). Rather we found that the SP count follows the scaffold size (Suppl. Figure   279 8B), which is in line with the results of a  2 -test that did not detect a significant deviation between the 280 SP frequency per scaffold and the underlying genome fraction per scaffold (p=0.21   (Table S11). Yet, the formae speciales 340 avenae, lolii and poae still share smaller intersections with the Bgh secretome compared to the rest 341 (Suppl. Figure 10A).  (Table S12).

397
Genome assemblies of B. graminis isolates belonging to other formae speciales, which are exclusively 398 based on short reads, were found to underestimate both the magnitude of TE expansion and the 399 presumed divergence time. This is due to the fact that the majority of the highly similar repetitive 400 sequences collapse into few contigs, as revealed by the comparison of Bgh assemblies that are either 401 based on long (PacBio) or short reads (Illumina; Suppl. Figure 11A). Therefore, for the other formae 402 speciales of B. graminis it can only be assumed that they also experienced a recent TE expansion, while 403 it remains unclear whether this event is older or more recent than the one in Bgh.

404
Remarkably, when applying the same pipeline on the sequenced genomes of the dicot-infecting 405 powdery mildews Erysiphe necator, E. pisi and Golovinomyces orontii, the divergence from the 406 consensus of the respective TE sequences is much higher (25%-35% compared to <10% in Bgh),

407
suggesting that the expansion of the repetitive elements in these species is more ancient than in Bgh 408 (Suppl. Figure 11B). This calculation is unlikely to be an underestimation due to the short read-based

433
The high assembly quality is also supported by both an improved gene space coverage (now >98% 434 BUSCO coverage for the newly annotated DH14 reference genome;

451
In addition to a drastically improved assembly of repetitive sequences, we also noticed the existence 452 of loci with similar or identical copies of a number of genes, which had previously been collapsed into 453 single gene models (Table S3) In Bgh, however, the situation is clearly different, as the numerous TEs are not restricted to specific 485 areas, but rather evenly dispersed throughout the genome (Figure 2A, Suppl. Figure 8A). In addition, 486 neither are the flanking regions of SPs particularly enriched in TEs, nor are they markedly larger 487 compared to non-SPs ( Figure 4A, B). Moreover, SPs, whether they are categorized as putative effectors 488 (CSEPs) or not, are not associated with gene-sparse (Suppl. Figure 2B) or peculiar genomic regions 489 (Suppl. Figure 8A), but their number is positively correlated with scaffold size (Suppl. Figure 8B).

490
Additionally, the dN/dS ratio of the CSEPs is not associated to local gene density (Suppl. Figure 2C).

491
We also did not detect any large lineage-specific regions as reported for  Figure 7). For the highly divergent Japanese isolate RACE1, on the other 507 hand, no monomorphic regions were detectable relative to DH14 (Suppl. Figure 7), which is most likely 508 due to the prolonged geographic separation of the two isolates during which sequence variation could 509 accumulate at a whole-genome scale.

722
To define the core Blumeria-specific effectorome, we assembled the genomes of the formae speciales 723 avenae, dicocci, dactylis, lolii, poae, secalis and triticale using the publicly available raw Illumina reads 724 for the isolates AVE, LIB1609, DAC, LOL, POAE, S1459 and T1-20 (Table S13). The assemblies were 725 carried out using ABySS 2.0.2 (Jackman et al. 2017), and the gene space coverage with BUSCO (Table   726   S14). For the forma specialis tritici the reference assembly of the isolate 96224 was used. The resulting 727 contig sequences were de novo-annotated with one round of MAKER using the same settings as for 728 the DH14 annotation (described previously, also https://github.com/lambros-f/blumeria_2017).
an ortholog search was performed using OrthoFinder and the predicted proteomes of the formae sequences, the predicted Bgh SPs were inserted in the analysis as mature peptides.

735
To generate a maximum likelihood-based phylogenetic tree for the SPs, all the Bgh DH14 mature