Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags

Background: Dermatophytes are the primary causative agent of dermatophytoses, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum.
Results: We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes.
Conclusion: The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms.


Background
Dermatophytes, consisting of organisms in the Trichophyton, Epidermophyton, and Microsporum genera, are the primary etiologic pathogens of various dermatophytoses, such as tinea capitis, tinea corporis, tinea inguinalis, tinea manus, tinea unguium and tinea pedis. These infections are widespread and increasing in prevalence on a global scale. Indeed, in some geographic regions, dermatophyte infection is now considered a major public health concern.
Unlike other fungi, dermatophytes can cause infections in healthy, immune-competent individuals. Estimates suggest that 30 to 70% of adults are asymptomatic carriers of these fungi [1]. T. rubrum is the most common superficial fungus, accounting for at least 60% of all superficial fungal infections in humans. This organism may remain viable in the environment for over six months, thus accounting for widespread infections. Transmission occurs most often from person to person [2], for example, by shedding of infected skin cells and hair and by direct body contact [3].
Diagnosis of dermatophyte infections relies on clinical presentation and requires successful isolation, culture and microscopy. Culture takes 2 to 4 weeks, and pleomorphic growth can lead to misidentification. In addition, the morphological phenotype is sometimes unstable. Rapid diagnostic tests using current molecular methodologies have been slow to develop for the dermatophytes [4].
Furthermore, despite the availability of new systemic antifungal therapies, nail infections are particularly difficult to eradicate, presenting a 25 to 40% recurrence rate [5]. Another increasing problem in clinical treatment is growing resistance to antifungal drugs. In the past decade, more and more cases of azole- and terbinafine-resistant fungal infections have been reported [6][7][8]. Ryder et al [9,10] characterized the cause of the resistant phenotype of some clinical isolates at the molecular level: terbinafine resistance in some T. rubrum clinical isolates appears to be due to a single amino acid substitution in squalene epoxidase.
T. rubrum is anthropophilic, specialized to infect humans, although rare animal infections have been reported. It is urease negative and vitamin independent. On Bromcresol purple (BCP)-milk solids-glucose medium, it shows restricted growth with no indicator color change for 10 to 14 days, then profuse growth with an indicator change to purple (alkaline). Its teleomorph has not yet been found. Arthroconidia and mycelial fragments are the infectious agents. Excreted extracellular proteinases, such as alkaline serine proteinase, play a role in dermatophyte growth and multiplication as well as in the inflammatory reaction [2][11][12][13]. Thus, they are considered to be critical virulence factors for dermatophytes. There are many reports of the isolation and characterization of one or two proteases from an individual species of dermatophyte [14][15][16][17][18]. In T. rubrum, some keratinases have been isolated and a subtilisin gene family was identified at the genetic level [19][20][21][22]. Jousson et al isolated a five-member secreted metalloprotease (MEP) family from genomic libraries of T. rubrum, T. mentagrophytes and M. canis. Further phylogenetic analysis revealed that the metalloproteases secreted by the three species in vitro are encoded by orthologous genes, strongly suggesting that the multiplication of an ancestral metalloprotease gene occurred prior to the divergence of the dermatophyte species [23]. However, only a small number of proteases from dermatophytes have been characterized at the genetic level.
Taken together, the lack of effective diagnostic and treatment strategies, the large number of individuals who experience dermatophyte infections, and the economic consequences highlight deficiencies in the research efforts aimed at understanding dermatophytes. T. rubrum is the most common superficial fungus, inducing dermatophytoses in various parts of the human skin, and can also cause deeper infections such as kerions, abscesses and granulomas. Research examining the pathogenesis of T. rubrum in numerous skin infections is needed in order to develop novel therapeutic agents and identify potential drug targets.
In the present study, we constructed various T. rubrum cDNA libraries and obtained 11,085 unique expressed sequence tags (ESTs). By analyzing these ESTs, we were able to further elucidate the physiological activities in metabolism, signal transduction, sexual life cycle, pathogenesis and cell wall composition of T. rubrum. These sequences provide an unparalleled resource for the future understanding of this remarkable fungus. Furthermore T. rubrum represents an attractive model species with which to study dermatophytes and other pathogenic filamentous fungi.

Results and discussion
Expressed sequence tags sequence determination
EST sequences were produced from 10 different cDNA libraries. After trimming for the vector and poor quality segments, a total of 34,670 sequences were identified and 11,085 unique ESTs were isolated. Isolated ESTs included 3,816 contigs and 7,269 singletons. All the constructed libraries, culture conditions and sequencing statistics employed in this study are summarized in table 1. The average sequence length of these unique ESTs was 586 nucleotides (nt). Approximately 8,639 (78%) of the unique ESTs were longer than 400 nt.
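The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names and the `redundancy` measure (the fraction of trimmed reads that collapse into an already observed sequence) are illustrative assumptions and are not quantities defined in the study.

```python
# Counts taken from the text above.
total_reads = 34670      # sequences remaining after vector/quality trimming
contigs = 3816
singletons = 7269
unique_ests = 11085
long_ests = 8639         # unique ESTs longer than 400 nt


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100.0 * part / whole)


# Contigs and singletons should add up to the number of unique ESTs.
counts_consistent = (contigs + singletons == unique_ests)

# Share of unique ESTs longer than 400 nt (reported as 78%).
long_share = percent(long_ests, unique_ests)

# Assumed redundancy measure: fraction of reads not contributing a new unique EST.
redundancy = 1 - unique_ests / total_reads

print(counts_consistent, long_share, round(redundancy, 2))
```

Under these assumptions, roughly two thirds of the trimmed reads are redundant with respect to the unique set, which is one rough way to gauge how deeply the libraries were sampled.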
During first-strand cDNA synthesis, incomplete synthesis and/or 5' truncation of mRNA transcripts could lead to an overestimation of the number of genes identified in our analysis. The accuracy of the estimate therefore depends on the quality of the libraries. Among the 7,764 matched ESTs, 1,439 aligned to their reference genes starting at the first coding amino acid of the subject sequence, and 2,325 did so when the alignment was allowed to start at any of the first 10 amino acids. These data suggest that the quality of the libraries is reliable.
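The start-site criterion described above can be illustrated as a simple count over alignment start positions. This is only a sketch: the function name `count_near_start` and the example start positions are hypothetical, and the study does not state how the counts were actually computed.

```python
def count_near_start(subject_starts, max_start):
    """Count hits whose alignment begins at or before `max_start`
    (1-based position) on the subject protein."""
    return sum(1 for start in subject_starts if 1 <= start <= max_start)


# Hypothetical 1-based subject start positions for six best hits.
example_starts = [1, 1, 4, 10, 37, 212]

at_first_residue = count_near_start(example_starts, 1)
within_first_ten = count_near_start(example_starts, 10)

# Fractions implied by the counts reported in the text (7,764 matched ESTs).
matched = 7764
share_first_residue = round(100 * 1439 / matched, 1)
share_first_ten = round(100 * 2325 / matched, 1)

print(at_first_residue, within_first_ten)
print(share_first_residue, share_first_ten)
```

Relaxing the window from the first residue to the first 10 residues raises the share of matched ESTs that reach the start of the reference protein from about 18.5% to about 29.9%.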

Comparison to Non-redundant and Clusters of Orthologous Groups databases
Identified ESTs were compared with the GenBank nonredundant (NR) protein database. This comparison enabled us to assign putative functions to or find homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) ESTs were only weakly similar or not similar to known sequences (E ≥ 1E-05), suggesting that they represent novel genes.
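The partition described above amounts to applying an E-value cutoff to the best database hit of each EST. The sketch below assumes that each EST has at most one best-hit E-value (or none at all); the identifiers `EST_A` to `EST_D` and the function names are hypothetical and do not come from the study's pipeline.

```python
E_VALUE_CUTOFF = 1e-5  # hits with E >= 1E-05 are treated as weak or absent


def classify_est(best_evalue):
    """Return 'matched' when the best hit is below the cutoff,
    otherwise 'novel_candidate' (weak hit or no hit at all)."""
    if best_evalue is not None and best_evalue < E_VALUE_CUTOFF:
        return "matched"
    return "novel_candidate"


def summarize(best_hits):
    """Tally classifications for a mapping of EST id -> best E-value (or None)."""
    counts = {"matched": 0, "novel_candidate": 0}
    for evalue in best_hits.values():
        counts[classify_est(evalue)] += 1
    return counts


# Hypothetical best-hit E-values; None means no hit was reported.
example_hits = {"EST_A": 5e-52, "EST_B": 8e-5, "EST_C": None, "EST_D": 1e-5}
print(summarize(example_hits))

# Consistency check on the reported totals.
matched, novel, unique_ests = 7764, 3321, 11085
print(matched + novel == unique_ests, round(100 * matched / unique_ests))
```

An E-value exactly equal to the cutoff falls on the "novel candidate" side here, following the E ≥ 1E-05 wording in the text.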
The comparative results for the ESTs with respect to the Clusters of Orthologous Groups (COGs) database, and its eukaryotic counterpart termed the KOGs database, are summarized in table 2. The ESTs identified herein contained a broad range of genes, predominantly encoding putative proteins involved in primary metabolism, gene expression, post-translation processes and cell structure. A significant proportion of the identified ESTs were matched to genes involved in transcription and signal transduction, suggesting that T. rubrum has an elaborate regulation system.
The COG collection currently consists of 138,458 proteins, which form 4,873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms, including 3 Eukaryota: Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens), one plant (Arabidopsis thaliana), two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4,852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products [24]. Because RNA processing in eukaryotic cells is more elaborate, and eukaryotic cells have structures that prokaryotic cells lack, for example nuclei and mitochondria, more genes involved in RNA processing and modification [A], chromatin structure and dynamics [B], cell cycle control, cell division, chromosome partitioning [D], nuclear structure [Y], cytoskeleton [Z] and intracellular trafficking, secretion, and vesicular transport [U] were identified in KOGs than in COGs. The difference between the numbers of genes in the class energy production and conversion [C] was mainly due to mitochondria-related genes. Conversely, more genes involved in cell motility [N], mainly encoding secretory pathway and cell motility related proteins, were identified in COGs than in KOGs. The COG/KOG results were also affected by subtle differences in the functional classification of functionally similar proteins, especially transporters.
IV, leucine aminopeptidase 1, leucine aminopeptidase 2 and a putative secreted metalloprotease 4), suggesting that the conditions used in their research may be more suitable for inducing expression of these proteins. The proteinases subtilisin-like protease SUB1 and subtilisin-like protease SUB5 were expressed not only in the conditions they mimicked but also in YPD medium (see Additional file 1). Compared with previously released ESTs, our data supply information about gene expression under different conditions, advancing current knowledge of the T. rubrum transcriptome.

Metabolism and secondary metabolism
Metabolic overview
A large percentage of T. rubrum genes were annotated in glycolysis and oxidative phosphorylation systems, as expected from what is known from other aerobic filamentous fungi. Genes corresponding to the citrate cycle enzymes and to components of complexes I (NADH-CoQ reductase), II (Succinate-CoQ reductase), III (CoQ-Cytochrome C reductase), and IV (Cytochrome C oxidase) were identified. The presence of these genes reflects the fungus' ability to perform complete aerobic pyruvate degradation and oxidative phosphorylation.
The pathways involved in metabolizing mono- and disaccharides, such as glucose, fructose, mannose, and sucrose, as well as the polysaccharide starch, were best represented in the identified ESTs. There were also a large number of membrane transporters for saccharides such as xylose and fucose. Among the identified ESTs, all aminoacyl-tRNA synthetases were represented with the exception of cysteine-tRNA synthetase, for which only one contig was found to exhibit weak homology (C3495-Contig1, E value 0.00008). Comparison of our EST data to KEGG revealed that these proteins were probably involved in many amino acid metabolism pathways, including glutamate metabolism, alanine and aspartate metabolism, glycine, serine and threonine metabolism, methionine metabolism, cysteine metabolism, valine, leucine and isoleucine biosynthesis, lysine biosynthesis, arginine and proline metabolism, histidine metabolism, and phenylalanine, tyrosine and tryptophan biosynthesis. Analysis of these pathways indicated that T. rubrum can itself synthesize lysine. It can synthesize some amino acids via amino acid synthetases, including glutamate, glutamine, asparagine, cysteine, tryptophan and threonine, and it can also synthesize some other amino acids via amino transfer reactions. The primary lipids in T. rubrum are sterol and phospholipid. Our analysis of the metabolic pathways in T. rubrum revealed that it can also synthesize co-enzymes such as riboflavin, nicotinate, nicotinamide, coenzyme A, ubiquinone, and folate. The genes probably involved in the synthesis of thiamine included NMT1 protein, THI4 protein, ThiC protein and a putative ThiG protein.
Because T. rubrum's vitamin independence serves to separate it from species such as Trichophyton violaceum that have vitamin requirements, identification of the ESTs involved in vitamin synthesis is of great significance for molecular diagnosis. Some ESTs probably involved in metabolism are included in additional files (see Additional file 2).

Secondary metabolism
Secondary metabolites are a remarkably diverse class of cellular products that often exhibit taxonomic specificity. Secondary metabolites are generally considered "nonessential" for organismal growth in culture. In addition to the metabolic pathways mentioned above, there are also several secondary metabolic pathways present in T. rubrum. It has long been known that dermatophytes can survive in toxic environments, presumably through the production of biological tools to resist these toxins. Indeed, dermatophytes have even been isolated from sewer water and polluted soil containing organic compounds such as aromatic compounds. The pathways implicated by the EST analysis are consistent with T. rubrum's capability to enzymatically degrade various toxins such as phenanthrene, dibenzofuran, ethylbenzene, styrene, fluorene, and 1,1,1-Trichloro-2,2-Bis-(4'-Chlorophenyl)Ethane (DDT) (see Additional file 3).
Similar to other pathogenic filamentous fungi, such as Magnaporthe grisea [25] and Aspergillus fumigatus [26], several genes important for secondary metabolism were also found in T. rubrum; these included non-ribosomal peptide synthases, polyketide synthases, two putative dimethylallyl tryptophan synthases, a putative arsenate reductase and a hydrophobin. These gene products may be related to T. rubrum growth and pathogenicity. Polyketides (derived from polyketones) are a class of secondary metabolites produced by most organisms, but they have been most extensively examined in bacteria and fungi. In fungi, numerous functions have been proposed for polyketides, including the production of toxins [27,28] and spore pigments [29,30]. Although it is well known that T. rubrum can produce pigments, the relationship between secondary metabolic pathways and pigment production remains unresolved. The various identified ESTs involved in secondary metabolism are listed in additional files (see Additional file 3).
Interestingly, T. rubrum harbors a putative sterigmatocystin biosynthesis monooxygenase StcW (C1113-Contig1), and a probable sterigmatocystin biosynthesis P450 monooxygenase STCL (Cytochrome P450 60B, EST000637, [GenBank:DW406216]), suggesting that it may produce sterigmatocystin. However, confirmation of this possibility will require further investigation. Sterigmatocystin is of particular significance in evaluating the toxicity of T. rubrum products because prior evidence indicates that it is probably toxic to the human liver.
Although it is suspected that Trichophyton mentagrophytes can produce a penicillin-like substance [31], we did not find any evidence suggesting that T. rubrum generates antibiotic-like by-products.

Signal transduction
We also identified a variety of signal transduction systems in T. rubrum, such as MAPK, cAMP-dependent pathways, G-protein pathways, Ras pathways, and a large number of serine/threonine protein kinases/phosphatases and signal histidine kinases. Although the first four systems are generally conserved among fungal and mammalian species, the numbers and functions of signal-transducing histidine kinases vary between fungi. For example, S. cerevisiae has only one histidine kinase, whereas Neurospora crassa [32] has 11. Indeed, there are many differences in the numbers and types of histidine kinases between filamentous fungi and yeast. Some experimental results [33][34][35][36] obtained with Candida albicans and Aspergillus fumigatus suggest that histidine kinases are related to fungal pathogenesis. In T. rubrum many ESTs are homologous to histidine kinases, some of which may be related to osmotic and nutrient responses. Several identified histidine kinases contain PAS/PAC domains, suggesting that they are involved in oxygen and light responses.
In addition to histidine kinase-related ESTs, we also identified calcium-calmodulin homologues and related proteins, suggesting that Ca-CaM pathways may be utilized by T. rubrum, similar to the model organism Neurospora crassa. Commonly, Ca2+ release from internal stores is mediated by the second messengers inositol-1,4,5-trisphosphate (InsP3) and cADP ribose, or by Ca2+-induced Ca2+ release [37]. Although InsP3 is present within Neurospora hyphae [38] and T. rubrum, Neurospora lacks recognizable InsP3 receptors, ADP ribosyl cyclase and ryanodine receptor proteins, which are principal components of Ca2+ release mechanisms in plant and animal cells. These observations raise the question of whether other second messenger systems responsible for Ca2+ release from internal stores remain to be discovered in filamentous fungi. InsP3 receptors, ADP ribosyl cyclase and ryanodine receptor proteins were also not found in our EST data (the existence of these genes can only be determined after the whole genome is sequenced). Thus the mechanism of T. rubrum Ca-CaM pathways remains to be determined. The various identified ESTs involved in signal transduction are listed in additional files (see Additional file 4).

Sexual and asexual development
Although the sexual life cycle has been described for Trichophyton mentagrophytes, Trichophyton ajelloi and Microsporum canis [2,39], which are all heterothallic species, the sexual cycle of T. rubrum remains to be elucidated. We compared sexual-cycle related genes with those in other fungi, and our findings suggest that T. rubrum may be capable of sexual reproduction. , and a large number of meiosis-related genes were identified in the ESTs, many of which were also found in Aspergillus nidulans and/or A. fumigatus. An analysis of 215 genes implicated in the fungal mating process, pheromone response, meiosis and fruiting body development revealed that many genes present in A. nidulans [26,40,41] are also present in T. rubrum, including several genes for which the only known function is related to sexual reproduction (see Additional file 5). These results suggest that T. rubrum may also possess a sexual cycle. However, more in-depth exploration will be required to determine whether it indeed does.
The current research on the fungal asexual development cycle primarily involves studies of Neurospora crassa and Aspergillus nidulans, and the literature contains two distinct models. Our comparison of the identified T. rubrum ESTs with the NR protein database revealed a key enzyme (FlbD, EST023258, [GenBank:DW699777], E value 5E-52) that is present in A. nidulans [41], but no key enzymes present in Neurospora crassa. Because four proteins besides FlbD (FlbC, BrlA, AbaA and WetA) are also required in the A. nidulans macroconidiation pathway, it is still unclear whether T. rubrum is more similar to A. nidulans or N. crassa with regard to asexual development.

Extracellular proteinases
The most evident dermatophyte feature is the ability to digest keratin. Dermatophytes can degrade and utilize keratin from humans and other animals. This is the pathogenic feature that differentiates dermatophytes from other fungi. The secreted proteinases of dermatophytes play an important role in the process of infection and are thus considered the primary virulence factors. However, only a few proteinase sequences have been identified thus far in T. rubrum [19][20][21][22][42][43][44][45]. In our data, putative secreted proteins were first identified from the NR comparison results, and all of the corresponding ESTs were then subjected to SignalP prediction analysis. Of the ESTs listed in Additional file 6, 48 were predicted to contain a signal peptide. Among the T. rubrum ESTs, the identified secretory systems include types I, II, III and V, with types II and III being the most heavily represented. We also identified a large number of putative secretion-related pathways and putative secreted proteinases, including some serine proteinases, aspartic protease, alkaline proteinase, peptidases and metalloprotease (see Additional file 6). Among the serine proteinases, 10 unique ESTs were homologous to the known dermatophyte subtilisin-like serine protease family members SUB1, SUB5, and SUB6 [22], whereas no identified ESTs were homologous to SUB2, SUB3, SUB4, or SUB7. The identity of the 6 serine protease ESTs with the highest homology to T. rubrum serine proteinases varied from 80% to 93%, suggesting that the genes may be selectively spliced, that the family may have other members, and/or that these genes are present in many copies in the genome.
Secreted metalloproteases are thought to be associated with lesion extension [15]. Recent reports [46] have identified a nitrogen regulating factor response region located upstream of the Microsporum canis MEP1 gene, consistent with the finding that, at the beginning of infection, only a few urea molecules, amino acids and glucose molecules are available as nutrient resources in sweat. Aspartic protease is an acid proteinase, whereas secreted metalloproteinase and subfamily members are neutral or basic proteinases. The presence of these proteinases is consistent with the finding that T. rubrum can initially grow in either acidic or basic pH environments.
The identified proteinases and peptidases may also be related to amino acid transport. During T. rubrum infection, extracellular proteins are hydrolyzed into peptides by secreted proteinases, further degraded into amino acids or dipeptides by peptidases, and finally transported into cells. The identification of these proteinases and peptidases increases our understanding of the pathogenic mechanism underlying T. rubrum infection.
These secreted proteinases can not only degrade proteins such as keratin, elastin and collagen to supply nutrients to the fungi, but can also induce delayed-type hypersensitivity (as with the SUB family and Tri r4) [1,22]. However, although the T. rubrum allergen Tri r4 EST shares 98% homology with that in Trichophyton mentagrophytes, it is clear that the inflammation mediated by T. rubrum is not as severe as that induced by T. mentagrophytes. This is probably due, at least in part, to the presence of multiple inflammatory factors in T. mentagrophytes. When comparing the putative virulence factors present in T. rubrum with those in A. fumigatus [26], we also observed that many known virulence factors in A. fumigatus were not present in the identified T. rubrum ESTs. Likewise, many putative T. rubrum virulence factors were not found or were divergent in A. fumigatus. For example, the mep1 gene exists in both A. fumigatus (Afu8g07080) and dermatophytes, but the two sequences were highly divergent (E value 5e-26).

Cell wall
The cell wall is a structure that fungi have but humans lack. Therefore, the cell wall represents an ideal target for novel anti-fungal drugs. Many cell wall-related proteins were found among the presently identified ESTs, including proteins involved in chitin synthesis, chitinase, β1,3-glucan synthase, β1,6-glucan synthase, and 1,4-alpha-glucan branching enzyme. By analyzing the pathways involved in cell wall synthesis, it can be inferred that T. rubrum is probably capable of synthesizing peptidoglycan, the primary cell wall component of Gram-positive bacteria. The principal enzymes involved in the peptidoglycan synthesis pathway, UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase (EST020893, [GenBank:DW697412], GO:0008766) and phospho-N-acetylmuramoyl-pentapeptide-transferase (EST009048, [GenBank:DW685567], EC:2.7.8.13), were identified in our EST data. However, this inference still awaits experimental confirmation. We also identified some putative proteins that are likely to be involved in sterol synthesis, which is a primary target for clinically available drugs. The components of the identified ESTs that are involved in cell wall synthesis are listed in table 3.

Conclusion
The estimated T. rubrum genome size is at least 22.05 Mb [47], and we estimate that a significant proportion of the gene content of T. rubrum is represented in this collection of sequences. Although ESTs can only represent genes that are actively expressed in particular phases, they complement the value of genomic sequencing through the functional identification of novel genes and provide information about gene structure and expression patterns [48][49][50][51][52][53]. Thirty percent of the 11,085 contigs and singletons identified in our analysis may represent novel genes. Our BLAST searches revealed that 33% of the 11,085 unique sequences have matches in the yeast genome (E < 1E-05), suggesting that there are clear distinctions (~67%) between yeast and filamentous fungi. These findings highlight the need for additional research on filamentous fungi.
In summary, although superficial fungi are the primary human pathogenic fungi, our knowledge about these organisms remains limited. T. rubrum represents an ideal model for the study of superficial fungi; our work in identifying ESTs in T. rubrum cDNA libraries will facilitate a greater understanding of the molecular mechanisms underlying its growth, metabolism, pathogenesis and drug resistance. In addition, our work may aid in the identification of novel effective drug targets and anti-fungal agents.

Preparation of spore cultures
A few hyphae of T. rubrum were inoculated on potato glucose agar and incubated at 28°C for 2 to 3 weeks. The spores were then washed with liquid YPG medium and passed through a cell filter to remove the hyphae. A portion of the spores obtained was incubated in YPG medium at 28°C for 12 hours. This yielded two samples: spores and incubated spores.

Isolation and purification of total RNA and mRNA
Total RNA and mRNA were isolated and purified as described previously [54].

Construction of the cDNA library
Construction of the unnormalized cDNA library
The 9 cDNA libraries (7 representing mycelium and 2 representing spores) were constructed following the protocols of the SUPERSCRIPT™ Plasmid System with GATEWAY™ Technology for cDNA Synthesis and Cloning (Invitrogen).

Construction of the subtracted cDNA library
The subtracted library was constructed following the protocols of the Clontech PCR-Select cDNA Subtraction Kit. According to the kit instructions, driver cDNA is subtracted from the tester cDNA, so that a normalized library is obtained.

Isolation and purification of plasmids, and sequencing of clones
The cDNA plasmids were isolated as described previously [54]. Sequencing was performed with a generic T7 primer located 5' upstream of the inserted segments, following the protocol of the PRISM Big Dye Terminator Kit on an ABI3700 or MEGABACE automated sequencer.

Bioinformatic analysis
Sequence data processing, EST clustering and re-assembly
The processing method has been described previously [54].

Authors' contributions
LW performed construction of the cDNA libraries, clone isolation, plasmid sequencing and data analysis, and drafted the manuscript. LM and WL performed construction of the cDNA libraries, clone isolation, and plasmid sequencing. TL and LY participated in the construction of the cDNA libraries and the isolation of clones. JY carried out the bioinformatics analysis and constructed the T. rubrum database. LY, WZ, QZ, JD, YX, YZ and XX participated in the library construction, clone isolation and plasmid sequencing. ZW and RL were responsible for strain identification and for the design of culture and growth conditions. GD, FY, KT and YL participated in bioinformatics analysis. QJ designed the project, supervised the research and revised the manuscript. SY cooperated with QJ, implemented the project and supervised the research at the Chinese National Human Genome Center. All authors read and approved the final manuscript.

Additional material