Comparative analysis of secreted protein evolution using expressed sequence tags from four poplar leaf rusts (Melampsora spp.)
© Joly et al. 2010
Received: 15 January 2010
Accepted: 8 July 2010
Published: 8 July 2010
Skip to main content
© Joly et al. 2010
Received: 15 January 2010
Accepted: 8 July 2010
Published: 8 July 2010
Obligate biotrophs such as rust fungi are believed to establish long-term relationships by modulating plant defenses through a plethora of effector proteins, whose most recognizable feature is the presence of a signal peptide for secretion. Since the phenotypes of these effectors extend to host cells, their genes are expected to be under accelerated evolution stimulated by host-pathogen coevolutionary arms races. Recently, whole genome sequence data has allowed the prediction of secretomes, facilitating the identification of putative effectors.
We generated cDNA libraries from four poplar leaf rust pathogens (Melampsora spp.) and used computational approaches to identify and annotate putative secreted proteins with the aim of uncovering new knowledge about the nature and evolution of the rust secretome. While more than half of the predicted secretome members encoded lineage-specific proteins, similarities with experimentally characterized fungal effectors were also identified. A SAGE analysis indicated a strong stage-specific regulation of transcripts encoding secreted proteins. The average sequence identity of putative secreted proteins to their closest orthologs in the wheat stem rust Puccinia graminis f. sp. tritici was dramatically reduced compared with non-secreted ones. A comparative genomics approach based on homologous gene groups unravelled positive selection in putative members of the secretome.
We uncovered robust evidence that different evolutionary constraints are acting on the rust secretome when compared to the rest of the genome. These results are consistent with the view that these genes are more likely to exhibit an effector activity and be involved in coevolutionary arms races with host factors.
Rust fungi or Pucciniales (= Uredinales) represent the largest group of fungal plant pathogens, including more than 7000 species that possess the most complex life cycles in the Kingdom Fungi . Some of these obligate biotrophs have been of long standing concern for agriculture and forestry while others have emerged in recent epidemics. For instance, poplar leaf rusts belonging to the genus Melampsora are considered as the world's most important disease of poplars . Selection for durable resistance to these pathogens is thus an important challenge for poplar breeders . Although poplar breeding programs have been in place for decades in Europe, clones selected for complete resistance against rust have increasingly succumbed in time to new races of Melampsora larici-populina . Sustainability of newly selected resistance clearly requires a better understanding of the molecular mechanisms involved in Populus - Melampsora interactions.
Prokaryotic and eukaryotic plant pathogens have evolved highly advanced strategies to engage their hosts in intimate contacts and deliver suites of effector proteins to modulate plants' innate immunity and enable parasitic colonization [5–12]. Understanding the translocation mechanisms of bacterial pathogen effectors inside host cells has been an outstanding breakthrough with the characterization of the type III secretion system. This export apparatus enables a bacterium to manipulate host cellular processes by injecting effector proteins into the host cytoplasm . Similarly, plant-parasitic nematodes have developed diverse relationships to obtain nutrients from their host plants. Perhaps their most evolutionary sophisticated adaptations are effector proteins encoded by parasitism genes expressed in oesophageal gland cells and secreted through a protrusible feeding spear, called a stylet [5, 6]. However, still little is known about such translocation machineries in filamentous pathogens (mainly fungi and oomycetes), although another specialized biotrophic infection cell called the haustorium is thought to be involved [13–15]. The haustorium invaginates host cells and makes near-direct contact with the host plasma membrane, where it plays a crucial role in nutrient acquisition. This structure is a regulatory hub involved in the manipulation of host metabolism and the suppression of host defenses, which allows the establishment of a successful biotrophic relationship [16–19]. The concomitance of haustoria formation with the induction of a programmed cell death response termed the hypersensitive response (HR) suggests a significant role for this structure in delivering effector proteins into the infected host cell [14, 20].
Key insights have emerged from the recent identification of filamentous pathogen effectors with avirulence activity inducing plant defense responses and HR [21–26]. Most of the avirulence genes identified encode small proteins with N-terminal signal peptides that direct them through the endoplasmic reticulum secretory pathway [14, 27]. While effector genes reside in pathogen genomes, their products essentially generate phenotypes that extend to host cells and tissues, and are hence likely to be the direct target of the never-ending coevolutionary conflict between host and pathogen [28, 29, 36]. In fact, avirulence proteins recognition by plant resistance proteins imposes selection against effector function, and pathogen effector proteins probably overcome resistance through diversification of the genes encoding them . For instance, several avirulence genes or their plant counterparts display molecular hallmarks of positive selection [21, 22, 25, 30–38]. Recently, the availability of filamentous plant pathogen genome sequences facilitated the cataloguing of whole secretomes using computational analyses, thus allowing the identification of putative effectors [39–43]. Indeed, Tyler et al.  provided evidence that secreted proteins have been subject to accelerated evolution by contrasting the genome sequences of Phytophthora ramorum and Phytophthora sojae.
Here we provide an overview of the expressed secretome of poplar leaf rusts belonging to the genus Melampsora. We constructed cDNA libraries for four related poplar leaf rust pathogens with different host specificities to test for the signature of selection in these rust secretomes and used computational tools to annotate putative secreted proteins. A comparative genomics approach based on homologous gene groups (HGGs) was undertaken to evaluate the extent to which secretome members are under different evolutionary constraints. We describe adaptive evolution (positive selection) in genes encoding secreted proteins of poplar leaf rusts, in agreement with the idea that these genes are expected to display an effector activity and be involved in the escalating and reciprocal coevolutionary arms race with host resistance factors.
Summary of Melampsora cDNA libraries characteristics
Putative Sa (%)
Putative S+b (%)
M. larici-populina/Populus nigra
M. larici-populina/P. nigra
M. medusae f. sp. deltoidae/P. deltoides
M. medusae f. sp. tremuloidae/P. tremuloides
M. occidentalis/P. trichocarpa
Similarity of Melampsora unisequences to sequences from UniProtKB, BasidiomycotaDB and PuccinialesDB
% with homologues in
% with homologues in BasidiomycotaDB
% with homologues in PuccinialesDB
M. larici-populina haustoria
M. larici-populina ex planta material
M. medusae f. sp. deltoidae ex planta material
M. medusae f. sp. tremuloidae ex planta material
M. occidentalis ex planta material
Host infection by biotrophic fungi is believed to involve the secretion of effectors that suppress plant defenses and alter cellular metabolism to fulfill the requirements of the invading pathogen . Among the candidates that had significant similarity to known sequences, homologies with potential effectors previously identified in the haustoria of other rust species were observed. Interestingly, 22 candidates had similarities with eight "haustorially expressed secreted proteins" (HESPs) from Melampsora lini identified using a similar bioinformatics approach. Some of these HESPs were shown to co-segregate with known avirulence genes and are proven HR elicitors in flax, such as the avirulence protein AvrM . The percentage of identity between flax rust and poplar leaf rust HESPs differed greatly. For instance, HESP-379 had close homologues in Melampsora spp. (identity: 93%), while AvrM appeared less conserved (identity: 26%). Other candidates had similarities with Rust Transferred Protein (RTP1) from Uromyces fabae, a small secreted protein that is specifically expressed in broad bean rust haustoria and translocated into host cells. This protein accumulates within the cytoplasm of the infected host cell and in the host nucleus, suggesting a role in influencing host gene expression . Homologies with other proteins thought to contribute to pathogenesis were unravelled, with 19 unisequences having similarity to CFEM domain proteins (CFEM = Common in Fungal Extracellular and Membrane). This particular domain is an eight cysteine-containing domain for which members are proposed to have important roles in fungal pathogenesis , and it was by far the most highly represented PFAM domain in the S+ dataset. CFEM-containing proteins could function as cell-surface receptors, signal transducers, or adhesion molecules in host-pathogen interactions . Moreover, five unisequences had significant similarity to gEgh16/gEgh16 H proteins from Blumeria graminis, a large family potentially involved in host-pathogen interactions . An appressorium-specific expression pattern was described for numerous gEgh16/gEgh16 H homologues, including virulence genes GAS1 (MAS3) and GAS2 (MAS1) from Magnaporthe grisea. GAS1 or GAS2 deletion mutants had no defect in vegetative growth, conidiation or appressoria formation, but were reduced in appressorial penetration and lesion development .
The diverse ecological niches of fungal species are mirrored in their secretome, which includes gene families encoding various proteolytic and carbohydrate-degrading enzymes known to act on the linkages found in plant cell walls and compatible with the array of nutritional sources they can exploit . Consistent with the molecular function GO analysis (Figure 1), 36.56% and 47.88% of Melampsora secreted proteins with a GO assignment were involved in binding (GO:0005488) and catalytic activity (GO:0003824), respectively. Note that there was a trend towards concentration of a distinct set of processes and functions in the group of proteins making up the Melampsora secretome. Significantly higher proportions of secreted proteins, relative to the entire dataset, were assigned to the following functions: carbohydrate and ion binding, and hydrolase and oxidoreductase activity (GO:0030246 and GO:0043167, and GO:0016787 and GO: 0016491). There also appeared to be enrichment of proteins involved in processes related to cell adhesion (GO:0007155), response to stimulus (GO:0050896), regulation of molecular function (GO:0065009), carbohydrate and lipid transport (GO:0008643 and GO:0006869), and primary metabolic process (GO:0044238), including oxygen and reactive oxygen species metabolic process (GO:0006800) and carbohydrate metabolic process (GO:0005975). Apart from putative effectors, members of the secretome had homology to a battery of glycoside hydrolases and subtilisin-like serine proteases that likely contribute to the penetration of the plant cuticle and cell wall . In fact, glycosyl hydrolase 16 (PF00722) was the second most represented PFAM domain in the secretome, followed by domains typically found in proteolytic enzymes (Peptidase_S8 [PF00082], Subtilisin_N [PF05922] and Asp [PF00026]) (Additional File 1). A class of secreted proteins exhibiting the ability to neutralize reactive oxygen species (ROS), and including Mn and Cu/Zn superoxide dismutases, was also uncovered by this survey. This finding was not surprising as it is known that rust fungi prevent a variety of non-specific defense responses in invaded cells, thus allowing the establishment of the long-term biotrophic relationship between rust fungi and living host cells . Such host responses frequently involve the production of ROS, whose detoxification is essential for the establishment of the pathogen. A Mn superoxide dismutase homologue had previously been reported in the haustorial stage of Puccinia triticina  and was differentially-expressed in Uromyces appendiculatus germlings during early appressorium development . Concordant with these observations, previous studies have demonstrated the upregulation of several host genes encoding enzymes of the redox regulation pathways during Populus - Melampsora interactions [54, 55]. Another group of interesting secreted proteins identified here that may be critical for evading host recognition or protecting fungal cell wall from hydrolysis by host enzymes were chitin deacetylases, which have already been described in libraries from Phakopsora pachyrhizi  and P. triticina . The conversion of chitin into chitosan by de- N -acetylation not only protects fungal infection structures from hydrolytic attack by chitinases present in the host tissue, it also prevents the release of chitin oligomers responsible for the triggering of resistance reactions .
This functional annotation established the secretome's ability to perform diverse roles in pathogenicity and interactions with host cells. However, for the major part of the secretome, no GO or PFAM domains could be assigned. For 441 candidates, i.e. 64% of the S+ dataset, no significant similarity was found to known proteins in the UniProtKB database (E-value > 1e-4) (Table 2). To ascertain that the large number of unmatched proteins identified in this study was not due to the paucity of Pucciniales (or even Basidiomycota) sequences in international databases , we constructed two specific databases (BasidiomycotaDB and PuccinialesDB). BLASTX searches of the complete dataset of Melampsora unisequences (S+ and NS) were performed against these databases. The percentage of unisequences with homologues was similar between BasidiomycotaDB and UniProtKB, indicating that the latter is not deficient in Basidiomycota sequences (Table 2). As previously observed with BLASTX searches, the percentage of S+ unisequences with homologues in both BasidiomycotaDB and PuccinialesDB was generally lower compared to NS. These results are consistent with the view that secreted effector proteins that subvert host-cell structure and functions often show very limited phylogenetic distribution and no obvious conserved motifs, being less evolutionarily conserved. Furthermore, the percentage of homologues for the haustorium-enriched library was surprisingly low considering that the whole gene set of another rust, Puccinia graminis f. sp. tritici, was included in the database. However, these results were comparable to observations made on the haustorial secretome of other rust fungi [26, 58]. Most of the proteins secreted from the haustorium could be under rapid evolution and strong diversification due to host selective pressures and, therefore, be species-specific. Taken together, these results strongly indicate that a large portion of the secretome (and especially the haustorial secretome) of Melampsora may have virulence functions or be involved in host-pathogen interactions.
We used reciprocal TBLASTX searches (E-value ≤ 1e-30) among our taxonomic cDNA libraries to identify orthologues and/or paralogues in different Melampsora species libraries and to classify them into homologous gene groups (HGGs). In order to increase the number of HGGs, we included the gene models from P. graminis f. sp. tritici. We found 369 HGGs consisting of at least three different sequences with a minimum of two Melampsora unisequences retrieved. These HGGs included 1159 Melampsora unisequences and 437 P. graminis f. sp. tritici gene models. HGGs were divided to 283 non-secreted HGGs (76.7%) and 86 secreted HGGs (23.3%) according to their predicted localization. Consistent with the above similarity searches, approximately 45% of secreted HGGs had no P. graminis f. sp. tritici homologues (at E-value ≤ 1e-30), compared with 12% for non-secreted HGGs. Moreover, the proportion of secreted HGGs (23.3%) was twice the proportion of S+ unisequences (11.4%), and the mean number of Melampsora unisequences per HGG was slightly higher in secreted HGGs (4.3 unisequences/HGG) compared with non-secreted HGGs (2.9 unisequences/HGG). These results could suggest that a larger proportion of allelic forms and/or paralogue families exist among the rust secretome, which is consistent with an extensive sequence diversification motivated by the coevolutionary arms race .
Characteristics of the Melampsora homologous gene groups (HGGs) predicted to be under positive selection (at least one pairwise ω = d N/d S > 1.2)
XP_001211022 conserved hypothetical protein
AAS45284 proline-rich antigen
EAT81533 hypothetical protein
XP_757360 hypothetical protein
XP_758577 hypothetical protein
To identify additional HGGs under positive selection and detect the amino acid residues that are under positive selection, we contrasted the M2A/M1A, M8/M7 and M8/M8A models with likelihood ratio tests (Additional Files 5 and 6; see Methods) [64, 66]. Significant evidence of positive selection was found in 4 (including 2 secreted HGGs) of the 369 HGGs. Selective pressures had previously been identified for one of these secretome members using FEL statistics and a population genetics approach [68, 69]. This particular secreted HGG has homology with HESP-417 from M. lini, a gene known to be expressed in haustoria and encoding a secreted protein with an even number of Cys residues .
We assessed the position of positively selected HGGs on above SimiTri plots (indicated by arrows, Figure 4): only one non-secreted HGG (1278) had homologues in the three databases, two secreted HGGs (747 and 6067) were absent from Sporobolomyces, one secreted HGG (92) was absent from Ustilago, one secreted HGG (729) was present only in Puccinia, and the remaining HGGs (6 secreted and 2 non-secreted HGGs) had no hit. Two other groups of sequences identified using FEL statistics  had corresponding HGGs plotted on the line between Puccinia and Ustilago (absent from Sporobolomyces) (data not shown).
Database searches with sequences of small secreted proteins from fungi commonly do not yield homologues or known protein domains, the only recognizable features being the presence of a signal peptide for secretion and, in many cases, an even number of cysteine residues. Despite these commonalities, effectors appear to be evolutionarily diverse and highly variable in their distribution, showing very limited phylogenetic distributions possibly due to accelerated evolution stimulated by plant-pathogen arms races . In a straightforward in silico approach, we generated a first overview of the secretome from poplar leaf rusts belonging to the genus Melampsora, unravelling an unknown and diversifying set of genes. The identification of positive selection in putative secreted proteins reported here suggests that these genes are likely to encode candidate effectors implicated in host-pathogen interactions. Such information should be used to augment other selection criteria (such as gene expression data) for prioritizing candidate effector genes for functional studies. Two intriguing properties of rust fungi are their host specificity and their need for host alternation. Even though host specificity is probably controlled at several levels, examples from the flax rust fungus suggest that the secretion of effectors plays a prominent role . Their intimate interactions with host factors expose them to very strong selective pressures resulting in their rapid evolutionary turnover. However, poplar leaf rust effectors not only cope with the poplar defense machinery, they also face another phylogenetically diverse host plant, which differs between species, from other dicots to monocots and even gymnosperms. Is this particularity responsible for a greater sequence diversification? Or is it responsible for a broader arsenal of effectors in poplar leaf rusts? The recent completion of the M. larici-populina genome http://genome.jgi-psf.org/Mellp1/Mellp1.home.html will reveal further information on the whole repertoire of secreted proteins in this pathogen. Comparative genomics studies with other biotrophs should elucidate molecular mechanisms underlying common strategies to infect plants. The identification of host targets will provide further insight into the evolutionary forces that shaped the rust secretome, a key step facilitated by the availability of the poplar genome sequence  and transcript profiling of poplar-rust interactions [55, 71, 72]. This pathosystem clearly represents an unprecedented opportunity to understand the particularities of host-pathogen interactions.
cDNA libraries were constructed from ex planta material (resting and germinating urediniospores, germ tubes, etc.) of four different Melampsora taxa. Fungal materials from isolates of the North American subspecies M. medusae f. sp. deltoidae and the Eurasian species M. larici-populina were harvested from naturally infected leaves of eastern cottonwood (Populus deltoides) and hybrid poplar (P. balsamifera × P. maximowiczii) clones, respectively. Melampsora medusae f. sp. tremuloidae and M. occidentalis mono-uredinial cultures were used to inoculate fresh leaves of P. tremuloides and P. trichocarpa, respectively. Inoculated leaves were maintained for 13 days in a growth chamber at 60% humidity, 19°C and 16 h photoperiod. In addition, we generated one additional haustorium-enriched library (biotrophic stage) for the M. larici-populina species. Haustoria were isolated by affinity chromatography as described by Hahn and Mendgen . A mixture of plant leaf and fungal tissues was collected 6 days after inoculation of a rust-susceptible P. × jackii clone 1014 with the rust strain Mlp Berth. 3729. A 100-μm pore size nylon mesh was used to remove the bulk of the plant cell material from the crude preparation, which was then passed through an 11-μm pore size nylon mesh to remove intact plant cells. An affinity column was prepared by covalently attaching concanavalin A (Pharmacia Biotech) to cyanogen bromide-activated Sepharose 6 MB (Pharmacia Biotech) as described in the manufacturer's protocol. Samples of purified haustoria colored using Calcofluor white (50 μM final concentration) were then examined using a fluorescence microscope under UV filters.
Haustoria samples were pelleted by centrifugation at 20,800 g, washed twice with sterile distilled water, and maintained at -80°C until total RNA extraction. Fungal material was directly frozen in liquid nitrogen and ground using the Mixer Mill MM 300 with 2 mm tungsten carbide beads. All the following manipulations were performed according to the manufacturer's instructions. Total RNA was extracted from ex planta material and purified haustoria using the RNeasy mini kit (Qiagen, Valencia, CA, USA) and the Absolutely RNA® miniprep kit (Stratagene, La Jolla, CA, USA), respectively. PolyA RNA was purified using biotylinated oligo-dT/streptavidin-coated magnetic beads (Dynabeads® Oligo (dT)25; Dynal Biotech, Oslo, Norway). Haustorium-enriched M. larici-populina as well as M. medusae f. sp. tremuloidae and M. occidentalis ex planta cDNA libraries were generated using the SMART™ cDNA library construction kit (Clontech, Mountain View, CA, USA). Melampsora larici-populina and M. medusae f. sp. deltoidae ex planta cDNA libraries were constructed using the pBlueScript II XR cDNA library construction kit (Stratagene) according to the manufacturer's instructions. Following the column sepharose chromatography step included in the protocol, only size fractions above 500 bp were retained for ligation in the Sfi I-digested, dephosphorylated pDNR-LIB vector. Plasmid ligations were transformed by electroporation into E. coli ElectroMAXTM DH10BTM Cells (Invitrogen, Carlsbad, CA, USA). After library amplification and tittering, individual colonies were transferred onto 384-well microtiter plates containing LB medium with 30 μg/ml chloramphenicol for PCR amplification and sequencing. cDNA inserts were amplified according to the manufacturer's PCR protocol and then sequenced with the M13 forward primer (5'-GTAAAACGACGGCCAGT-3') using the Big Dye Terminator Cycle Sequencing Kit v1.1 on an ABI 3730xl sequencer (Applied Biosystems) at the CHUQ Research Centre (CRCHUQ) Sequencing and Genotyping Platform, Quebec City, QC, Canada. Sequences were deposited at NCBI under accession numbers GW672673 to GW687576.
Raw sequences were trimmed and cleaned using the Phred software , which resulted in the identification and removal of poor quality regions (quality cut-off of 20). Cross-match was then used to mask vector sequence in each read (minimum match of 10, minimum score of 20). The extent of redundancy for each library was ascertained using the Phrap software (Phil Green, http://www.phrap.org), which was also used to compile the unisequence set (minimum match of 50, minimum score of 100). In order to identify and remove plant sequences in the ESTs, unisequences were used in BLAST comparisons against the Populus trichocarpa genome and predicted gene models. Initial ORF prediction for Melampsora spp. was generated with the bestORF algorithm (Softberry) using Ustilago parameters.
Similarity searches for full length sequences and conserved domains were performed using a combination of standard bioinformatics programs and customized Python scripts. Each assembled transcript was searched against UniProtKB database (Release 15.5, TrEMBL and Swiss-Prot at http://www.uniprot.org) resources  using the BLASTX algorithm . Using UniProt/Gene Ontology (GO) crossed tables, candidate GO assignments were predicted on the basis of best transcripts matches (E-value <10-05) to the UniProt reference sequences. Categories were assigned on the basis of the biological, functional and molecular annotations available from GO http://www.geneontology.org.
Additionally, we constructed and searched (using the BLASTX algorithm) two other fungal sequence databases. The BasidiomycotaDB database included Basidiomycota (excluding Pucciniales) protein sequences from the non-redundant database of NCBI (124,751 sequences), a 6-frame translation of Basidiomycota (excluding Pucciniales) EST sequences from NCBI (287,259 sequences), and gene models from eight genome projects: Coprinus cinereus Okayama 7 (#130) (Broad Institute); Cryptococcus neoformans var. grubii serotype A, strain H99 (Broad Institute); Laccaria bicolor S238N-H82 (JGI); Malassezia globosa CBS 7966 (Procter and Gamble Co.); Phanerochaete chrysosporium RP78 (JGI); Postia placenta MAD-698 (JGI); Sporobolomyces roseus IAM 13481 (JGI); and Ustilago maydis 521 (Broad Institute) (a total of 85,025 gene models). The PuccinialesDB database included Pucciniales protein sequences from the non-redundant database of NCBI (390 sequences), a 6-frame translation of Pucciniales EST sequences from NCBI (84,006 sequences), and gene models from P. graminis f. sp. tritici CRL 75-36-700-3 (20,567 gene models). The hmmpfam program (HMMer software; http://hmmer.janelia.org was used to search the PFAM HMM profile database of protein domains .
In silico predictions of secreted proteins were carried out using a combination of SignalP 3.0, TargetP 1.1 and TMHMM 2.0 [44, 78, 79]. The SignalP algorithms incorporate a cleavage site and signal peptide prediction based on artificial neural networks (NN) and hidden Markov models (HMM). In order to support the SignalP results and exclude proteins with either mitochondrial targeting peptides or transmembrane domains, protein sequences were also entered into different prediction servers. TargetP is a neural networks server that predicts the subcellular localization of eukaryotic proteins based on the presence of any of the N-terminal presequences, either chloroplast transit peptide (for plant predictions), mitochondrial targeting peptide or secretory pathway signal peptide, while TMHMM uses hidden Markov models for the prediction of transmembrane helices. Following predictions, the output files were manipulated to select signal peptides containing sequences using the following criteria: (1) positive SignalP-HMM Sprob score, (2) positive SignalP-NN Smax and D scores, (3) TargetP signal peptide prediction, and (4) no transmembrane domains. SignalP-HMM Sprob score was selected because of its ability to discriminate between N-terminal signal peptides and N-terminal signal anchors, while SignalP-NN Smax and D scores are the most accurate single scores . Furthermore, because TMHMM may not distinguish signal peptides from transmembrane domains, deduced proteins with a single transmembrane domain within 40 amino acids of the N-terminus were also considered as potential secreted proteins.
The SAGE method was used as initially described in Velculescu et al. [81, 82] at the CHUQ Research Centre (CRCHUQ) SAGE Platform, Quebec City, QC, Canada http://www.crchuq.ulaval.ca/plateformes/gpb. Fungal materials from M. larici-populina strain Mlp Berth. 3729 were harvested from both susceptible (P. × jackii) and resistant (P. trichocarpa × P. deltoides 'Boelare') hosts at two different time points (22 hours and 5 days) after inoculation. Additionally, germinating urediniospores and germ tubes were collected 2 hours after inoculation on the susceptible cultivar by painting leaves with 5% cellulose acetate (dissolved in acetone), letting the acetone evaporate, and stripping the cellulose acetate film off the leaves. For each of the five treatments, 50 micrograms total RNA were extracted using the RNeasy mini kit (Qiagen, Valencia, CA, USA) and poly(A) RNA isolated with the mRNA Mini kit (Qiagen), annealed with the biotin-50-T18-30 primer, and converted into cDNA using the cDNA synthesis kit (Invitrogen, Carlsbad, CA, USA). The resulting cDNA were digested with Nla III (anchoring enzyme), and the 3' restriction fragments were isolated with streptavidin-coated magnetic beads (Dynal Biotech, Oslo, Norway) before being separated into two populations. Each population was ligated to one of the two annealed linker pairs and washed to remove unligated linkers. The tag beside the most 3' NIa III restriction site (CATG) of each transcript was released by digestion with Bsm FI (tagging enzyme). The blunting kit from Takara Co. (Otsu, Japan) was used for the blunting and ligation of the two tag populations. The resulting ligation products containing the ditags were amplified by PCR with an initial denaturation step of 1 min at 95°C, followed by 22 cycles of 20 sec at 94°C, 20 sec at 60°C and 20 sec at 72°C with 27 bp primers. The PCR products were then digested with Nla III and the band containing the ditags was extracted from 12% acrylamide gel. The purified ditags were self-ligated to form concatemers of 500-1800 bp isolated by agarose gel. The resulting DNA fragments were ligated into the Sph I site of pUC19 and cloned into UltraMAX DH5aFT E. coli cells (Invitrogen, Carlsbad, CA, USA). White colonies were screened by PCR to select long inserts for automated sequencing as previously described for cDNA libraries.
Sequence files were analyzed using the SAGEparser program . Tags corresponding to linker sequences were discarded and duplicate concatemers were counted only once. To identify the transcripts, the sequences of 15 bp SAGE tags (Nla III site CATG plus adjacent 11 bp tags) were matched with a collection of unassembled poplar and M. larici-populina ESTs with polyA tail using a customized Python script. To avoid the possibility of sequencing errors in the EST database, the matches that were identified only once among the EST database were not considered.
Homologous gene groups (HGGs) were identified from reciprocal TBLASTX searches (E-value < 1e-30) between libraries and P. graminis f. sp. tritici gene models followed by graph clustering analysis using a TCL implementation of the Deep-First Search algorithm. Each HGG included at least two sequences from Melampsora unisequences and a minimum of three total sequences. HGGs were either removed or divided when overlapping regions where too short or when similarity was not found throughout the majority of the unisequence coding sequences. Furthermore, sequences with gaps across the aligned coding sequences were removed in order to minimize the impact of the pitfalls of positive selection analyses, such as gap-induced misalignments and relaxed selection in pseudogenes. The resulting 369 HGGs were then submitted to positive selection analyses using a suite of program grouped into a single Python script. The protein sequences in each HGG were first aligned with ClustalW  and the corresponding coding DNA sequences were automatically extracted. A Neighbor-joining phenetic tree based on distance matrix between nucleotidic sequences was then reconstructed for each HGG and used as starting tree for Bayesian inference and Markov Chain Monte Carlo simulations (B/MCMC) (only possible with HGG of 4 sequences and more; the Neighbor-joining tree reconstructed with PAUP* was used for HGG of 3 sequences). Prior to the B/MCMC, the models for nucleotide substitutions were selected using the hierarchical likelihood ratio test (hLRT) implemented in the Modeltest 3.7 program. Adaptive evolution was first estimated by pairwise calculation of the rates of nonsynonymous nucleotide substitutions per nonsynonymous site (d N) and the rates of synonymous nucleotide substitutions per synonymous site (d S) between all members of an HGG. Additionally, as adaptive evolution is likely to act on a small subset of amino acid residues and hence averages of substitution rates across the gene may not strictly indicate positive selection, HGGs were scanned for adaptive evolution using codon-based substitution models that allow ω to vary among sites, with the parameters of the model estimated using maximum likelihood. These analyses were conducted using the codeml application from the PAML package version 4 . Bayesian inference of phylogeny aimed at estimating the posterior probabilities and branch length of phylogenetic trees as starting values for codeml maximum likelihood iteration under codon model M0 to get fixed branch lengths. The resulting ω pairwise calculations are shown in Table 3 for all HGGs with at least one pairwise calculation showing ω > 1.2. We also contrasted the codon substitution models M1A (neutral), M2A (selection), M7 (beta), M8 (beta and ω) and M8A (beta and ω = 1; ). The model M1A assumes two site classes in proportions p0 and p1 = 1-p0 with 0 < ω 0 < 1 (conserved) and ω 1 = 1 (neutral). M2A adds an additional class of site with ω 2 as a free parameter, allowing for sites with ω 2 > 1 with proportion p2. Model M7 uses a beta distribution of sites within the interval 0 < ω < 1. M8 adds an extra class of sites to the M7 model, allowing for positively selected sites with ω > 1, while this extra d N/d S category is restricted to one in model M8A . From these models, statistical significance was tested using likelihood ratio tests by comparing the null models M1A, M7 and M8A with the alternative M2A, M8 and M8 models, respectively. Models M2A and M8 are tests of positive selection among codon sites and were implemented with at least three different starting ω values (0.2, 1.0 and 2.0). Twice the difference in log likelihood ratio between null and alternative models was compared with a χ2 distribution with two degrees of freedom. The likelihood ratio tests assess whether the alternative models fits the data better than the null model and is known to be conservative in simulation tests. For HGGs that tested positive using the ML method, posterior Bayesian probabilities of site classes were inferred for each amino acid site by using the Bayes empirical Bayes method .
This work was supported by the Genomics Research Initiative of Natural Resources Canada to RCH. The authors would like to thank the Broad Institute and the DOE-JGI for releasing the data of the fungal genome sequencing projects. Furthermore, the authors thank Brian Boyle for technical assistance and Josyanne Lamarche, Sébastien Duplessis and Francis Martin for helpful comments on the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.