A cDNA library can provide molecular resources for analyzing genes involved in the biology of a plant pathogenic fungus, such as genes responsible for development, survival, pathogenicity and virulence. To initiate studies on the basic genome structure and gene expression of P. striiformis in its infective state, we constructed a full-length cDNA library and a BAC library from urediniospores of a predominant race of P. striiformis f. sp. tritici. The full-length cDNA library can be used to study the normal transcription profiles of the uredinial state, the biologically and epidemiologically essential stage of the fungus. The current cDNA library will serve as a major genetic resource for identifying and isolating full-length genes and functional units from the P. striiformis genome. Because this cDNA library was constructed from urediniospores of the pathogen, it should include expressed genes unique to this spore stage. Therefore, the cDNA library should avoid the limitations of ESTs that commonly arise from automatic assembly of transcripts from different tissues. Controlled greenhouse conditions and careful handling of the plants and spores minimized the possibility of contamination by other fungal spores. Powdery mildew and leaf rust, which sometimes contaminate stripe rust spores, were not observed on the stripe rust-sporulating plants. Therefore, genes or cDNA sequences identified in this study should be from urediniospores of P. striiformis f. sp. tritici. This was also confirmed in a separate study, in which primers from all 12 randomly picked cDNA clones successfully amplified clones in the BAC library constructed with the same race of the pathogen (data not shown).
A urediniospore of P. striiformis is an infectious structure that is critical for the rust to initiate the infection process. Although the fungus produces other spores, teliospores and basidiospores, they do not result in infection of host plants because the fungus does not have alternate hosts for basidiospores to infect. Compared to mycelium, a urediniospore is relatively more resistant to adverse environmental conditions. Therefore, the urediniospore stage should contain most of the pathogen genes involved in development, survival and pathogenicity. Thus, our first full-length cDNA library for P. striiformis was constructed using urediniospores. Such a transcript (gene) collection should include the genes that are important for the unique physical properties and characteristics of the urediniospores of P. striiformis. These genes are essential for maintaining the germination and infective abilities of the spores. Therefore, the current full-length cDNA library should be a useful genomic resource for functional genomic studies of this important agricultural pathogen. The full-length cDNA library reported here is the first large-scale transcript collection for P. striiformis. Because expression of certain genes is stage-specific and genes involved in plant-pathogen interactions are expressed in haustoria [4, 13], we are currently working with Scot Hulbert's lab to construct a full-length cDNA library from haustoria of the same stripe rust race used in this study.
The technology used in this study for full-length cDNA enrichment is robust and requires less than 1 μg of starting total RNA. With MMLV reverse transcriptase, only the 5'-end-tagged cDNAs, which are not prematurely terminated, can be amplified into full-length products by an RNA oligo-specific primer [35, 37]. The size fractionation process was modified in this study to generate large directional full-length cDNA inserts, which enriched the library for full-length cDNA clones with insert sizes of up to 9 kb. The enrichment of the full-length cDNA was achieved by PCR amplification following cDNA synthesis. Because selection bias could favor the smaller cDNAs, we used fewer PCR cycles to minimize such bias, as previously suggested. Conventionally constructed cDNA libraries rarely carry cDNA inserts over 2 kb, because longer transcripts are easily truncated during cDNA synthesis, which causes a size bias against larger cDNA fragments in the cloning process. In our study, up to 22 PCR amplification cycles were used to generate an adequate amount of cDNA for cloning. The evaluation of cDNA insert size and its distribution showed a low level of insert size bias in the final cDNA library. Most of the cDNA inserts ranged from 500 bp to 1,500 bp, and a high number of cDNA clones harbored inserts over 3,000 bp. These results indicate that size fractionation is an effective selection approach for ensuring the full-length cDNA content of the library. The high quality of the initial total RNA and the optimal LD PCR conditions also contributed to the low size bias in the insert size distribution of this library. A high-quality and adequate amount of initial mRNA is the key to yielding a sufficient amount of first-strand full-length cDNA by reverse transcription. To reduce redundancy and to avoid underrepresentation of different transcript species, cDNA fragments of different fractionated sizes were balanced and subjected to library construction.
A considerable number of clones with inserts over 3 kb were found in our cDNA library; inserts of this size are rarely found in conventional cDNA libraries.
The sequences of the 5'-ends of transcripts are important for finding the signals for initiation of transcription. Irrespective of the length of the cDNA, identification of the specific 5'-end nucleotide sequences in the cDNA is commonly used to determine full-length cDNA content and quality. In many cases, the 5'-end nucleotide sequences are referred to as a 5' cap structure [3, 15, 20, 27]. We found that nearly 95% of the cDNA clones contained the known 5'-end sequence 5'-CGGCCGGG-3' (DB Clontech, USA), whereas the (G)3 at its 3'-end binds to the intact reverse transcripts, which have the nucleotide priming site CCC at their 5'-end. Complete ORFs were identified in cDNA sequences having the 5'-end sequence structure (5'-CGGCCGGG-3'). The presence of the ATG initiation codon, aligned with the amino acid methionine, was also used as an indicator of the quality of the full-length cDNA.
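The 5'-end check described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function names and the 60-base search window are hypothetical choices made for illustration, and only the tag sequence 5'-CGGCCGGG-3' comes from the text.

```python
# Tag sequence reported in the text; everything else here is an assumption.
FIVE_PRIME_TAG = "CGGCCGGG"


def has_five_prime_tag(read: str, window: int = 60) -> bool:
    """Return True if the tag occurs within the first `window` bases of a read.

    `window` is a hypothetical parameter: it limits the search to the start
    of the insert so that chance internal matches are not counted.
    """
    return FIVE_PRIME_TAG in read.upper()[:window]


def tagged_fraction(reads) -> float:
    """Fraction of reads that carry the 5'-end tag (0.0 for an empty input)."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(has_five_prime_tag(r) for r in reads) / len(reads)
```

Applied to the 5'-end reads of a set of clones, `tagged_fraction` would give a figure comparable in kind to the "nearly 95%" reported above, although the exact value depends on the window and on read quality.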
Blastx was used to search the entire NCBI GenBank with an e-value cutoff of 1E-05, which revealed that 37% of the cDNA clones had high homology to genes with known functions in the database. The relatively low match rate to homologous genes in the blastx search might be due to the lack of gene information for fungi in the database. During the search process, the longest ORF in each cDNA sequence was also evaluated with amino acid alignments. The results showed that 86% of the cDNA clones contain ORFs with both the translation initiation codon and a stop codon. In addition, the existence of multi-exonic structure within some ORFs is further evidence supporting the biological reality of these genes or transcripts. The Kozak rules were found not to be fully applicable for determining ORFs in this study. Perhaps the Kozak rules are more suitable for the analysis of mammalian genomes.
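The idea of locating the longest ORF that has both an initiation codon and a stop codon can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis pipeline used in this study: it scans only the three forward reading frames (reasonable for directionally cloned inserts), uses the standard stop codons, and ignores introns and Kozak context. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_complete_orf(seq: str):
    """Return (start, end, orf) for the longest ATG...stop ORF, or None.

    Only the three forward frames are scanned. `start` and `end` are
    0-based, end-exclusive coordinates; `orf` includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, seq[start:end])
                start = None  # look for the next ATG after this stop
    return best
```

A clone for which this function returns `None` has no ORF with both an ATG and an in-frame stop codon on the forward strand; such a clone would fall outside the 86% of clones described above under this simplified criterion.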
So far, there have been no other reports on the genome of P. striiformis in relation to the function and biology of this important pathogen. In this study, we have identified genes encoding 51 different protein products involved in eleven aspects of the pathogen's cell biology and plant infection. These genes are the first group of genes reported for the stripe rust pathogen. The genes identified for virulence/infection can be used in transient expression to confirm their function in pathogenicity. Although we sequenced only a small portion of the cDNA library, the study demonstrated the high efficiency of this procedure for identifying putative genes of known function. As more and more genes with identified functions from other organisms are deposited into the databases, genes with important functions in P. striiformis should be identified more efficiently using our cDNA library. Even though sequences of only 196 clones were characterized in this study, we identified 19 cDNA clones encoding ribosomal RNA subunits, seven clones encoding deacetylase, and two clones encoding the glucose-repressible protein. These results may indicate the mRNA abundance of these genes. In this study, 10 cDNA clones had one of the two partial sequences with high homology (e-values ranging from 3E-06 to 5E-77) to genes identified in other fungi, while the other partial sequence produced no hit. These results may indicate that these genes have very long sequences, and may also reflect that similar gene sequences in other fungi are mainly short EST sequences. When the blastx search was conducted using other fungal genomic databases, seven cDNA clones that produced no hit against the NCBI database were found to have some homology to genes of unknown function in various fungal species. In this study, 37.2% of the clones matched known genes, 18.4% encoded hypothetical proteins, and 25.5% produced no hit.
These numbers are quite different from the 11%, 23%, and 66% found for these categories, respectively, in the urediniospore EST library of P. graminis f. sp. tritici, the wheat stem rust pathogen (L. Szabo, personal communication). The differences could be due to the different clone sample sizes of the studies and the different types of libraries (a full-length cDNA library for P. striiformis f. sp. tritici and a conventional EST library for P. graminis f. sp. tritici). As more genes or ESTs from other Puccinia species infecting cereal crops become available, it will become more feasible to identify genes common to this group of rust pathogens and genes unique to particular species.