Here we report the identification of 491 CSEPs in the B. graminis genome, nearly doubling the number previously described
. We explored systematically the evidence that the proteins encoded by these genes have effector-like properties using bioinformatics tools and expression studies. This work will facilitate the future investigation of their functional relevance in the interaction between the fungus and its host barley
. The typically short ORFs, the unrelatedness of their gene products to known proteins and the tight association of CSEPs with retro-transposons (see below) render the identification of CSEPs in the large and highly repetitive B. graminis f.sp. hordei genome a challenging task. Although our bioinformatics pipeline converged after three rounds of iteration on a set of 491 CSEPs, suggesting saturation, we cannot exclude the possibility that some CSEPs escaped our attention and are missing in the present analysis. Since we deliberately focussed our CSEP prediction on genes coding for proteins with no recognizable counterparts outside the powdery mildews, we also cannot exclude that some proteins with N-terminal secretion signal and identifiable sequence similarity to polypeptides in other species (e.g. secreted proteases) exert an effector function during B. graminis pathogenesis. The clustering of CSEPs into families of paralogs (Figure
2, Additional file
4) suggests that CSEPs have gone through iterated rounds of gene duplications during evolution, and some of them are now amongst the largest gene families in B. graminis; in fact, overall CSEPs represent >7% of the protein coding genes in the B. graminis genome. This is a remarkable testimony to the importance of CSEPs in the powdery mildew fungi, particularly when taken together with the loss of a large number of conventional ascomycete genes and reduction in gene family size observed in this fungus
. Since very few CSEPs have recognizable orthologs in the genomes of the two powdery mildew fungi, Golovinomyces orontii and Erysiphe pisi, representing other genera, we conclude that proliferation of CSEPs occurred after the separation of B. graminis from the dicotyledonous plant-infecting mildew lineage some 75 million years ago
. CSEPs have since undergone rapid evolution and diversification. It will be interesting to correlate the CSEPs to orthologs in other B. graminis “formae speciales”, for example the wheat mildew fungus B. graminis f.sp. tritici, to relate CSEP differentiation with the evolution of host specificity and analyse the variation between CSEPs of extant isolates
Expression of effectors in the interaction between the obligate biotroph B. graminis and its host is expected to occur either in the appressorium and penetration peg or in the haustorium to be able to manipulate the plant. Indeed, we found that a large fraction of the CSEPs showed a significantly higher expression in haustoria compared to the epiphytic tissue isolated at five days after inoculation (Figure
2). This was further corroborated by the observation that expression of more than two thirds of the 97 CSEPs identified as proteins by mass spectrometry was only detected in haustorium-containing epidermis (Figure
2). In general, there was a clear congruence between the EST, the RNAseq expression and proteome data. Haustorium-specific effector expression is expected in order to suppress defence throughout the fungal life cycle
. On the other hand, we also predict that some CSEPs may be expressed very early during penetration and these might already be present in the germinating conidia and exert their function before haustoria are formed. Unfortunately, there is only limited transcript data available from conidia at the stage of penetration, but maybe the larger CSEPs (300–400 amino acids), belonging to the three largest families and in general showing low transcript levels (Figure
4) have functions at the earlier stages during infection. Here a low relative abundance may suffice due to the narrowly focussed area at the tip of the penetration hypha where the protein needs to act.
Diversification through positive selection for amino acid changes has occurred in many of the families of paralogs and points to adaptive modifications (Figure
3 and Figure
4, Additional files
13). These may have resulted in increased virulence and/or avoidance of R-protein mediated recognition. Measuring positive diversifying selection is only possible in families where the coding sequences can be aligned with sufficient confidence and many of the CSEPs have diverged so much that they are too different for a reliable analysis. We may therefore have underestimated the degree of positive selection across the CSEP superfamily. In any case, the incidence of diversifying selection found here shows that it is a widespread and fundamental process in the evolution of B. graminis CSEPs as also demonstrated in other plant pathogenic plant pathogens
[20, 21]. However, we also find families (e.g. families 7, 9, 14, 17 and 26) showing only purifying selection. In these cases, CSEP evolution evidently favoured conserved structures to exert their functions. The CSEP families can be grouped in two main categories based on a number of characteristics (Figure
4). One group includes approximately 180 short CSEPs. These: 1) are typically 100 to 150 amino acids long; 2) have the strongest preferential transcript and protein accumulation in haustoria; 3) often have codons that show evidence of diversifying-selection, perhaps because they may be functioning in close interaction with plant targets and the R-protein surveillance system; 4) usually have only two or three cysteines and are thus not cysteine-rich as many apoplastic effectors suggesting that they may act inside the host cells. The other group of families comprises about 140 CSEPs, including those in the three largest families (1, 2 and 3) and several smaller families, and shares the following features: 1) they are relatively long CSEPs (300–400 amino acids); 2) they do not show high preferential expression in haustoria; 3) they have few codons subjected to positive diversifying selection, but in contrast they have many codons that have experienced pronounced purifying selection; 4) they often have several conserved cysteines predicted to form disulphide bonds in an oxidizing environment. The latter characteristic might indicate that their host targets are localized in the apoplast or extra-haustorial matrix
. The structure prediction studies and 3D modelling showed that CSEPs generally have similar characteristics as known effectors described in other fungi
 and the proteins are detected preferentially in B. graminis haustoria in infected epidermis
. On average, the global model quality scores for the generated 3D models were in most cases poor or low (p>0.05), which is expected since most CSEPs had distant or no detectable homology to known template structures. However, a number of the structural models were of medium (p<0.05) and high (p<0.01) confidence, and when the amino acid residues under positive selection were mapped on these models, they are often in the loop regions predicted to be exposed and thus possibly available for interactions with other proteins as part of their effector functions (Figure
3D and Additional file
18). It will be interesting to discover whether experimental determination of the 3D structure of these proteins confirms these predictions, particularly regarding the relative position of the residues under diversifying-selection.
The cysteine frequency is higher in CSEPs than in average B. graminis proteins. However, it is much lower than that found among the cysteine-rich secreted proteins from Melampsora larici-populina and apoplastic cysteine-rich effectors from Cladosporium fulvum. The highest conserved cysteine frequency in B. graminis CSEPs, found in family 3, is only 2.9% and the presence of the cysteine in the YxC motif contributes significantly to the higher frequency of this amino acid in the CSEPs compared to other B. graminis proteins. It has been suggested that the many cysteines of the M. larici-populina effectors are important for the overall fold topology rather than for resistance towards degradation in the apoplast
. However, given the conservation of the disulphide bond in the ribonuclease fold, it is most likely that this motif is important for protein stability
. Our analysis shows a similar picture although in the CSEPs it is not only cysteines, but also other commonly conserved amino acids (e.g. glycine and proline) that are predicted to contribute to structural properties.
The YxC-motif, first found in B. graminis CSEPs
, is a common motif among effector candidates also in rust fungi, such as Puccinia graminis f.sp. tritici[6, 24], P. striiformis f.sp. tritici and Melampsora larici-populina[13, 22]. Here we showed that the B. graminis CSEPs have mostly an N-terminal YxC-motif, but in the longer CSEPs it can occur over the whole length of the protein. The functional significance of this motif remains elusive. However, a conserved host cell targeting sequence (RXLR-DEER) followed by C-terminal functional regions has been found in other plant pathogens, for example in the oomycete Phytophthora infestans.
Using two different approaches (InterProScan and IntFOLD) we found that 72 of the 491 CSEPs have recognizable relationships to ribonucleases (Additional file
7). This is possibly an underestimation as sequence variation in residues that are critical for the assignment to this polypeptide category will probably lead to false-negative predictions. It is interesting to note that the vast majority (54 of 57) of proteins in the B. graminis genome with domain IPR016191 (ribonuclease/ribotoxin) are CSEPs. There are numerous additional proteins encoded by the B. graminis genome that have a relationship to RNA metabolism/turnover. These include for example proteins with InterProScan domains IPR000504 (RNA recognition motif domain, 60-times present), IPR012337 (ribonuclease H-like domain, 35-times present) and IPR001247 (exoribonuclease phosphorolytic domain 1, 6-times present), but none of them is found in the CSEP set. The IPR016191 (ribonuclease/ribotoxin) domain thus seems to be a hallmark of the CSEP family, suggesting that it might be important for effector structure or activity, while other RNA binding or modifying proteins encoded by the B. graminis genome might have housekeeping functions.
A secreted fungal ribonuclease appears to be the common origin of many CSEPs in different families, as an alignment suggests that 10–20 spaced and moderately preserved amino acids are conserved between ribonucleases and these CSEPs (Figure
5). These amino acids are likely to play important structural roles in scaffolding the CSEPs, being located typically in the β-sheets or at the border between a β-sheet and a loop region. Meanwhile, we found that the regions with amino acids under diversifying selection are located in the loops and predicted to be exposed on the surface of the proteins. Although the ribonuclease-like proteins are unlikely to be functional as RNA-degrading enzymes since they lack critical active site residues (Figure
5A), we speculate that some of these effectors could still be involved in interactions with host RNAs and modulate host immunity via this route. Alternatively, as extracellular ribonucleases are very stable molecules, highly resistant to proteolytic degradation, they may have had a rigid structure that could have been an ideal starting scaffold for evolving an effector arsenal, in which the loop regions were subjected to positive selection allowing the CSEPs to diversify and avoid recognition by host surveillance factors (R-proteins). A similar example of structural conservation among effector candidates has recently been found by Win et al.
, who showed that RXLR effectors of the Peronosporales (oomycetes) often share a WY-domain that is structurally conserved despite high sequence divergence between different plant pathogenic species. The genes encoding the CSEPs shown in Figure
5A have a common relative intron location, further corroborating a common ancestor. Moreover, since this intron location is also shared in many other CSEP genes, it may be that a large proportion of the CSEPs have evolved from an ancestral microbial ribonuclease similar to ribonuclease T1. A model for CSEP gene amplification was suggested based on the observation that CSEPs belonging to the same family are very often clustered in the genome and in several cases separated by less than 10 kb (Figure
6). This hinted that they evolved by gene duplication events due to unequal crossovers
[27, 28]. Also in another biotrophic phytopathogenic fungus, Ustilago maydis, genes encoding small secreted proteins with unknown function were found in clusters
, even though the spacing between those effectors was much shorter, possibly reflecting the general compactness of that genome compared to that found in B. graminis. Illegitimate recombination was found to be the major driving force in gene duplications in plants, for instance in the evolution of multi-locus resistance genes
[30, 31]. This clustering is in contrast to the situation for the family of EKA genes, encoding another type of putative B. graminis effectors that have spread across the entire genome by means of a transposable element
. We found that CSEP genes often are closely associated with two well-described SINE-type retro-transposons, Egh24, Eg-R1 and another high copy repeat (AJ002007.1) originally found in the wheat powdery mildew B. graminis f.sp. tritici, which are all very abundant in the genome. The genomic regions adjacent to CSEPs in most cases are flanked by these repetitive DNA elements and the pattern is conserved between the closest paralogs. Unequal crossover is mediated by highly similar sequences and therefore the retro-transposable elements are very good candidates for facilitating such events. In M. larici-populina tandem repeats of AvrM-paralogs are also flanked by transposable elements
. Powdery mildews including B. graminis have lost the repeat-induced point mutation (RIP) pathway
 and this may have allowed extensive amplification of transposable elements in the genome. Our findings here suggest transposable elements have helped CSEPs to multiply and proliferate as described for the EKA effector gene candidates
. If this is true, then the loss of the RIP pathway and resulting retro-transposon driven genome expansion could have conferred a selective advantage and facilitated evolution of powdery mildew fungi by potentiating proliferation of effector proteins.
Here we have shown that many CSEPs are likely to have evolved from (an) ancestral extracellular ribonuclease(s) through a series of gene duplications followed by diversifying positive selection. A number of different models for the evolution of gene duplications have been proposed and they can be classified depending on how gene duplications affect fitness, whether there is positive diversifying or purifying selection and whether there is pre-existing allelic variation
. Our observation that the transcript level of many CSEPs was high in planta is consistent with the view that the expression level is important for their function. For example, many effectors work by interacting with proteins where it matters to be present in abundance to inactivate their targets. Many effectors are also exposed to proteases leading to a fast degradation, so a high transcript level will be an advantage. A gene duplication resulting in two copies will often lead to a further increase in expression through a gene dosage effect and thereby increased fitness. Once duplicated, the genes can be subject to diversifying selection: indeed we have detected pronounced diversifying selection in some families. Overall this fits with the “diversifying selection model”, described by Innan and Kondrashov
, explaining how gene families can evolve and result in new functions for the individual members. Our work is a further illustration of how a stable structural fold may act as a template for diversification