Transcriptome analysis of the honey bee fungal pathogen, Ascosphaera apis: implications for host pathogenesis

Background We present a comprehensive transcriptome analysis of the fungus Ascosphaera apis, an economically important pathogen of the Western honey bee (Apis mellifera) that causes chalkbrood disease. Our goals were to further annotate the A. apis reference genome and to identify genes that are candidates for being differentially expressed during host infection versus axenic culture. Results We compared A. apis transcriptome sequence from mycelia grown on liquid or solid media with that dissected from host-infected tissue. 454 pyrosequencing provided 252 Mb of filtered sequence reads from both culture types that were assembled into 10,087 contigs. Transcript contigs, protein sequences from multiple fungal species, and ab initio gene predictions were included as evidence sources in the Maker gene prediction pipeline, resulting in 6,992 consensus gene models. A phylogeny based on 12 of these protein-coding loci further supported the taxonomic placement of Ascosphaera as sister to the core Onygenales. Several common protein domains were less abundant in A. apis compared with related ascomycete genomes, particularly cytochrome p450 and protein kinase domains. A novel gene family was identified that has expanded in some ascomycete lineages, but not others. We manually annotated genes with homologs in other fungal genomes that have known relevance to fungal virulence and life history. Functional categories of interest included genes involved in mating-type specification, intracellular signal transduction, and stress response. Computational and manual annotations have been made publicly available on the Bee Pests and Pathogens website. Conclusions This comprehensive transcriptome analysis substantially enhances our understanding of the A. apis genome and its expression during infection of honey bee larvae. It also provides resources for future molecular studies of chalkbrood disease and ultimately improved disease management.


Background
Chalkbrood disease of the honey bee is caused by Ascosphaera apis (Maassen ex Claussen), [1] a filamentous fungus that exclusively infects bee larvae [2]. Adult honey bees are not susceptible to infection but can disperse spores, particularly via food sharing [3]. After ingestion, fungal spores germinate in the larval gut, mycelia cross the gut lining and proliferate through the body cavity, invading all internal organs until the larva dies of systemic mycosis. The fungus then continues development and nutrient acquisition inside the cadaver. Mycelia eventually emerge from the body cavity and form masses of ascomata on the outer surface. If environmental conditions allow, cadavers desiccate to form chalkbrood 'mummies' that can contain millions of ascospores [4]. In accordance with this life history, A. apis spores germinate best in nearly-anaerobic conditions, while mycelial growth and sporulation is aerobic and can be maintained in axenic culture [2].
Chalkbrood can lead to heavy losses in affected honey bee colonies, but disease severity depends on multiple factors including: age of larvae, dose of inoculums, temperature, and humidity. Chalkbrood outbreaks have most frequently been recorded in cool and humid climates, although they may also occur in hot and dry climates [5]. Currently, management of chalkbrood (reviewed by [5]) focuses on sanitation methods that target the long-lived spores, improved management of symptomatic disease, and use of disease-resistant honey bees. A wide spectrum of fungicides has been tested against A. apis, but none of them are approved for use in beehives [6]. Natural compounds with anti-fungal properties are also being investigated. Therefore, there is a continued need for cost-effective and widely applicable treatments that do not leave chemical residues in bee products. A better understanding of the mechanisms of pathogenesis would help advance this goal.
Genome-scale sequence analysis is a powerful and efficient strategy to identify genes involved in the complex interactions between host and pathogen. In this study, we have performed high-throughput sequencing to compare the A. apis transcriptome as it is expressed during axenic culture versus controlled infection of larvae. This strategy can help identify genes that may be responsive to the host environment or are otherwise important for pathogenicity, although subsequent replication is needed to validate these candidates. Our data are used to support revised gene models, to evaluate the completeness of the current genome assembly [7], and to quantify differential read counts between growth conditions. We also identify components of key molecular pathways such as signal transduction and mating type. Our results extend previous genomic analyses of A. apis [7,8], providing additional resources for investigating pathogenesis and, ultimately, identifying effective treatment strategies.

Results and discussions
Transcriptome assembly A total of 992,370 454 reads were obtained for the two barcoded samples, with a mean length of 341.7 bp. The combined assembly produced 21,744 transcript contigs grouped into 9,984 isogroups (clusters of contigs that partially overlap, but which cannot be further collapsed under the assembly parameters) as well as 103 ungrouped contigs. Of these 10,087 transcript contigs, 9,534 (94.5%) had a Mega BLAST match to the reference genome assembly (E-value < 1.0e-10). Of the remaining 553 transcript contigs, 315 (57%) had a BLASTX match to a RefSeq fungal protein (E < 1.0e-5). Few transcript contigs were unalignable due to lowcomplexity sequence: only ten consisted of more than 25% low-complexity sequence as classified by the mdust package (http://compbio.dfci.harvard.edu/tgi/software/) under default parameters. Thus, most unaligned transcript contigs appear to be from valid A. apis genes that are not represented in the current genome assembly. These transcripts were included in the read-mapping analysis below but not in the gene-prediction pipeline.

Gene prediction
Using Maker [9] to integrate ab initio predictions, transcript alignments, and protein homologs, we obtained 6,992 gene models, which is within the range of other sequenced ascomycetes [10,11]. A GBrowse [12] viewer of these models and other genome features are publicly available on the Bee Pests and Pathogens section of the Hymenoptera Genome Database [13] (http://hymenopteragenome.org/beebase/?q=bee_pathogens). We found some evidence of alternative splicing from mapped transcript contigs and expect that deeper sequence coverage or transcript-specific analyses would confirm its occurrence given results in other fungi (e.g. [14]). However, the small number of Maker-predicted alternative splice sites did not appear to be robust as a group (for example, concatenation of exons from distinct genes was a more plausible explanation than alternative splicing in some cases), such that we have conservatively removed alternative transcripts from this analysis.
To further evaluate the completeness of the A. apis genome assembly, we used the Core Eukaryotic Genes Mapping Approach (CEGMA) tool [15], with the CEGMA subset of 248 widely conserved eukaryotic core genes that are considered to have low frequencies of gene family expansion (http://korflab.ucdavis.edu/Datasets/genome_completeness/). We estimate that 203 of 248 CEGMA genes are complete in our gene list, and an additional ten genes are represented as fragments, for a total representation of 86% of the eukaryotic core gene set. A larger set of 441 core genes is also available from the CEGMA website for Neurospoa crassa, for which we identified 379 probable orthologs (86%) in A. apis. Thus, both eukaryotic and fungal core gene sets are equally represented in our annotations. These results are concordant with the fraction of non-matching transcript contigs, and suggest that less than 15% of the true gene content remains unassembled.

Phylogeny
Geiser and colleagues [16] used ribosomal sequence to estimate the phylogeny of the Eurotiomycetes, the ascomycete class to which Ascosphaera has been assigned. Their results indicated a position within order Onygenales, although the authors noted that support for the placement of Ascosphaera was not strong. To further clarify the evolutionary position of A. apis within the Eurotiomycetes, we used 14 representative species with sequenced genomes to create a phylogeny from the concatenated alignment of 12 conserved proteins (see Methods). The resulting maximum-likelihood tree ( Figure 1) placed A. apis in a position congruent with that estimated by [16], i.e. within the Onygenales and not the sister clade Eurotiales. However, A. apis is the earliest branching of the Onygenales species included in this phylogeny. Furthermore, while each of the 12 loci produced tree topologies for the other 14 species that were concordant with the concatenated data set, the position of A. apis within the Eurotiomycetidae was variable for each individual locus (results not shown). This phylogenetic instability appears to be due to a higher level of sequence divergence along the A. apis lineage, as indicated by the long branch leading to A. apis in Figure 1.

Domain analysis
We used HMMER [17,18] to identify Pfam domains in our gene models with an expectation threshold of E-value< 0.1 and compared the distribution to other sequenced and annotated ascomycetes. Given the smaller gene number in A. apis, it is not surprising that this species had fewer domain types (Table 1). Of greater interest are changes in domain number that suggest gene-family expansion and contraction within a phylogenetic lineage. To investigate this, we identified Pfam domains present in A. apis for which there were more than five additional genes with that domain in all other ascomycete genomes examined ( Table 1). The A. apis lineage appears to have experienced reductions in several common domains, most strikingly the cytochrome p450 and protein kinase domains. Others that appear reduced include domains characteristic of transcription factors (zinc finger and fungal-specific transcription factor) and several categories of the Rossmann fold superfamily of nucleotide-binding domains (e. g., methyltransferase and FAD/NAD-binding domains). In contrast, no Pfam domain was identified for which there were more than five additional copies in A. apis compared with the other six genomes. Thus, there have not been major expansions of gene families with described domains since the divergence of the A. apis lineage from these other species.
We searched for novel gene families within A. apis by using BLASTClust to group all proteins without a significant Pfam domain. One family of homologous sequences was identified that included seven A. apis genes predictions. However, two genes, AAPI10243 and AAPI10242, may be N-and C-terminal fragments of a single protein, and AAPI10243 also shared a long stretch of identical nucleotide sequence with AAPI11927, Figure 1 Phylogeny of A. apis and 14 other Eurotiomycete fungi. Twelve conserved protein-coding loci were used to clarify the phylogenetic position of A. apis. Amino-acid sequences were aligned with MUSCLE and the best fitting evolutionary model (JTT + G + I) was selected with MEGA5. The unrooted maximum likelihood tree is based on this model with five rate categories; bootstrap support from 1,000 replicates is shown. suggesting a possible assembly error. Thus, the family may contain only five distinct members. BLASTP against the NCBI nr database identified homologs of this gene family in other ascomycete genomes, but with an unusual distribution. Single homologs were found in Neurospora crassa, Ajellomyces dermatitidis, Uncinocarpus reesii, and the ascomycete Trichoderma virens, and multiple matches were found in Coccidioides posadasii and the ascomycete Mycosphaerella graminicola, some of which appeared to be fragments as annotated. In contrast, the gene family is greatly expanded in Chaetomium globosum and Coccidioides immitis, with 18 and 19 matching protein predictions, respectively. Most of these homologs are shown in a neighbor-joining phylogeny in Figure 2  (some partially aligning predictions were excluded due to potential annotation errors). This phylogeny is based on an alignment of BLASTP-matching peptides in GenBank, which was then trimmed to include only the most conserved region, illustrated in Figure 3. Distances were based on the JTT matrix and nodes with less than 70% support from 1,000 replicates were collapsed in the consensus phylogeny. The clustering of paralogs within species and the overall concordance of the gene and species phylogenies ( Figures 1 and 2, see also the phylogeny of [16]) suggests independent expansions of the gene family within Ascosphaera, Chaetomium, and Coccidioides. Indels were common in the conserved region shown in Figure 3, suggesting that these genes may be rapidly evolving or not under strong purifying selection. There was no indication that these proteins are secreted or have transmembrane domains, however, based on SignalP [19] and TMHMM [20] domain searches. A TBLASTN search using this region identified highly significant matches in other ascomycete species as well, several of which were apparent pseudogenes based on the presence of nonsense mutations. One TBLASTN match was in the subtelomeric region of N. crassa chromosome IVL, suggesting that members of this gene family may occur in subtelomeric repeats. If so, this may explain the idiosyncratic pattern of expansions observed, as subtelomeric regions frequently harbor gene amplifications and are often correlated with fungal niche adaptation [21]. Interestingly, no cDNA reads mapped to this gene family. However, four of the homologs in C. immitis were among Figure 3 Core conserved region characterizing a novel gene family identified in A. apis that also occurs in other ascomycete fungi. Representative A. apis predicted proteins and GenBank accessions were aligned with MUSCLE and then manually trimmed to the most conserved region. Two-letter codes after each accession signify the species of origin: AD, Ajellomyces dermatitidis; CH, Chaemtomium globosum; CI, Coccidioides immitis; CP, Coccidioides posadasii; MG, Mycosphaerella graminicola, TV, Trichoderma virens; UR, Uncinocarpus reesii. the 51 genes found to be under positive selection in that species by [22].

Pathway annotation and analysis Virulence factors
Filamentous fungi produce a variety of secondary metabolites, many of which are involved in virulence [23,24] . The ability of an entomopathogenic fungus to infect its insect host is thought to depend on the coordinated activity of these virulence molecules together with mechanical pressure of hyphae on the exoskeleton and/or gut peritrophic membrane. In support of this model, we detected several transcripts involved in the production of A. apis hydrolytic enzymes, listed in Additional file 1: Table S1. For example, we identified genes encoding chitinases, proteases and esterases, degrading enzymes that could be involved in both host invasion and escape processes. We have also identified several genes encoding homologs of a cutinase transcription factor, Ctf1, indicating the ability of this fungus to utilize different nutrient substrates, including plant cutin, a variety of lipids as well as synthetic triacylglycerols and esters [25].
Furthermore, our updated genome annotation reveals a number of genes encoding homologs of well-known toxins and a diverse family of enzymes, some of them involved in the Aflotoxin (AF)-Sterigmatocystin (ST) biosynthesis pathway (AflR, StcU, Nor-1, SteW, OmtA, OrdA), HC-toxin biosynthesis (Tox A, ToxG, ToxD, ToxF and Hts1), and a super killer protein 3 (Ski3) (Additional file 1: Table S1). In addition to genes involved in toxin biosynthesis, our gene set includes a large variety of genes involved in transcription, translation and biosynthesis of enzymes that have also been implicated in virulence. The list includes an extracellular glucoamylase, 3 chitinases, 16 amidases, 30 esterases, 42 proteases, 24 lipases and others (Additional file 1: Table S1). A large number of these enzymes were found to be expressed in the infected host, which may implicate their role in host pathogenesis. Among them, a member of the metallopeptidase family M28 containing a transferrin receptorlike dimerisation domain (IPR007365) and a proteaseassociated PA domain (IPR003137) similar to a metallopeptidase found in C. immitis as well as vacuolar endopeptidase Pep2, a broad specificity secreted hydrolase that has been implicated in Aspergillus fumigatus virulence [24]. Another interesting example is that of the phospholipases (plcA, plcB, plcC and plcD) previously identified in Mycobacterium tuberculosis [26]. The A. apis genome encodes multiple genes homologous to those found in M. tuberculosis, Neosartorya ficheri and Phytophtora infestans (Additional file 1: Table S1).
Many virulence factors produced by A. apis have been previously identified using enzymatic activity assays and electrophoresis. The list includes galactosidases, glucosidases, catalases, phoshphatases, DNAses, and RNAses [27,28]. Among those, 12 enzymes and 285 isozymes were found in A. apis and closely related fungi, Ascosphaera aggregata and Ascosphaera proliperda [27,29]. Enzymatic activity of 11 enzymes (i.e. protease, β-N-acetylglucosaminidase, alkaline phosphatase, esterase, esterase lipase, leucine arylamidase, valine arylamidase, acid phosphatase, naphthol-AS-BI-phosphohydrolase, βglucosidase and α-mannosidase) has been characterized by [28]. However, it is very difficult to compare our findings to those previously reported since earlier investigations mostly focused on the enzymatic activity, and therefore lacked gene and/or protein related data [27,28]. Furthermore, some of the previously published data is not supported by our findings. For example, the lack of chitinase activity in A. apis has been determined experimentally by many investigators, that led to a conclusion that A. apis does not digest insect chitin [27,[30][31][32]. In the absence of chitinase activity, the N-acetyl-β-glucosaminidase in conjunction with hyphal pressure has been suspected in aiming penetration of the peritrophic matrices lining the gut epithelium [27]. However, our gene predictions identified at least three different glycosyl hydrolases with the GH18 Pfam domain characteristic of class III and class V chitinases (www.cazy.org) (Additional file 1: Table S1). GH family 18 chitinases are frequently implicated in the mycoparasitic activity by the degradation of exogenous chitin [33] and thus their presence in A. apis revives the possibility that these enzymes are involved in host pathogenesis. We detected four and two sequence reads from this gene in axenic culture and infection, respectively, indicating that it is not exclusively expressed during infection. However, additional work is needed to definitively determine whether any of these chitinases are involved in penetrating the peritrophic matrix of the honey bee larva. It is noteworthy that GH18 chitinases are highly sensitive to allosamidin (β-D-allopyranoside), a potent chitinase inhibitor (http://www.reference.md) and therefore may be a potential target for fungal control.

Signal transduction
Fungi have sophisticated and conserved signalling cascades to sense and respond to a rapidly changing environment by regulating essential functions such as melanin formation, mating, virulence, and morphogenesis. Signal transduction pathways described in fungi include the Mitogen-Activated Protein Kinase (MAPK) pathway, cAMP-dependent protein kinase pathway (PKA), and calcineurin-Ca 2+ -calmodulin-activated phosphatase signalling. Among those, the MAPK cascade is a key pathway involved in stress responses and fungal pathogenicity in a wide range of pathogenic fungi. Here we identified several components of this pathway, including several MAP kinase homologs (Hog1, MapK, Spm1), MAP kinase kinase (Mkk1) and stress activated MAP kinase kinase kinase (Win1) (Additional file 1: Table S1).

RNA interference pathway
Post-transcriptional gene silencing is a part of a broad host defence response against nucleic acid invaders in most eukaryotic organisms, including filamentous fungi [34][35][36]. Although eukaryotic genomes vary in the number of genes involved in the RNAi machinery, generally, filamentous ascomycete fungi encode two homologs each of the core RNAi proteins Dicer (Dcl) and Argonaute (Ago). However, some fungi, such as Aspergillus nidulans, appear to encode only a single copy of the Dcl and Ago proteins.
Our genome analysis showed that A. apis contains most of the core RNAi components, including two Dicer genes (Dcl-1 and Dcl-2) and one Argonaute (Ago1), but not Ago2, similar to A. nidulans (Additional file 1: Table S1). The sequence for this gene is present in neither the genome assembly nor in the non-aligned transcript contig sequences. It is possible that some components of the RNAi pathway might have been lost in A. apis during its divergence; with other family members joining the RNAinduced silencing (RISC) complex to assume the function of the missing family members [35]. Presence of the core RNAi components in the A. apis genome and RdRP homolog can be exploited for development of an RNAibased control strategy of this bee pathogen [36].

Sexual reproduction
Sexual reproduction in ascomycetous fungi is governed by a single mating type (MAT) locus with two idiomorphs. Although flanking regions are frequently conserved, idiomorphic alleles can differ greatly in structure and gene content, but minimally encode a transcription factor with either an α-box domain or a high-mobility-group (HMG) domain [37][38][39][40]. These transcription factors activate alternative pathways for mating-type development.
The A. apis MAT-2 allele, which contains the HMGdomain gene, was previously identified [8]. However, those authors were unable to identify the MAT-1 allele either through sequence analysis of the A. apis genome assembly or degenerate PCR, suggesting that the allele was not adequately covered by genome sequencing or was too divergent to be annotated or amplified. In this sequencing effort we have identified a transcript contig that is a BLASTX match to an A. capsulatus α-box protein (Gen-Bank accession ABO87596), but does not align to the genome assembly. To determine whether this transcript is part of the A. apis MAT-1 allele, we designed primers to amplify this sequence from genomic DNA, under the assumption that it is flanked by the genes Sla2 and Apn2 as in the MAT-2 idiomorph [8]. Figure 4 summarizes our PCR analysis of the MAT locus structure for MAT-1 and MAT-2 strains of A. apis.
A forward primer annealing upstream of the MAT-2 HMG-domain gene (5′MAT2) and a reverse primer annealing to the Apn2 gene (primer DNALR1) [8] produced a 2.8 kb product in both strains (lanes 1 and 2 of Figure 4) as expected. However, no PCR product could be amplified from the MAT-1 strain using the Sla2/ Apn2 primer combination, suggesting that the Sla2 gene is not within the idiomorph-specific region and is divergent in sequence, position, or orientation. We were able   Table S1.
to amplify products specific to the putative Mat1-1 αbox transcript contig and the Mat1-2 HMG-domain gene only from the respective strains, confirming that the α-box transcript derives from the MAT-1 allele of the MAT locus. A diagram of the inferred structure is presented in Figure 5.

Comparative gene expression
The present study was primarily designed to improve the annotation of A. apis. The data can in principle be used to estimate differences in transcript abundance between axenic culture and host infection, thereby identifying candidate host-responsive genes for further study. However, without additional replication, it is impossible to calculate a variance for individual transcripts and thus distinguish between environmental and stochastic effects. An implementation of Fisher's Exact Test can be used to estimate the significance of differential read count abundance [41], but for unreplicated data sets we expect the resulting P-values to be strongly influenced by the high stochasticity typical of genome-scale expression patterns. This leads to a somewhat arbitrary, post hoc choice of P-value threshold for identifying differential expression. Given these considerations, we adopted an ad hoc strategy to identify genes that are strong candidates for having host-responsive expression, but nonetheless requiring further validation. In addition to calculating the P-value of Fisher's Exact Test (with multiple-test correction) with the edgeR package [41], we calculate a variance in log 2 (ΔRPKM) as a function of transcript length using a slidingwindow approach (see Materials and Methods). Genes that were both two standard deviations from the local mean and significant by Fisher's Exact Test were considered to be candidate differentially expressed genes. We limited our sample to genes with at least 50 mapped reads in at least one growth condition (1,928 gene models and unmapped transcript contigs, or 25.5% of the total). Figure 6 shows two views of the distribution of differential abundance estimates: the log 2 difference in RPKM plotted as a function of transcript length (panel A) and with respect to P-value of Fisher's Exact Test (panel B). Genes two standard deviations or more from the local mean are shaded in Figure 6A, which include 15 genes up-regulated with host infection and 28 that were downregulated. All of these genes (Additional file 2: Table S2) had highly significant P-values by Fisher's exact test and their annotation reveals a number of genes involved in the transport and metabolism of nutrients, particularly ions and sugars, which is consistent with a regulated response to changes in nutrient availability. In fact, the up-regulated gene with the highest P-value by Fisher's exact test, AAPI14316, is homologous to the yeast glycerol/H + symporter STL1 [42], and the most significantly down-regulated gene, AAPI11676, is homologous to D-arabinitol dehydrogenase [43], an enzyme closely linked to glycerol metabolism. This concordant change Figure 6 Differential gene expression. Differential counts of reads mapping to A. apis gene models between culture types, with positive values indicating higher abundance in host infection relative to axenic culture. A) Differential counts of reads, normalized by both total reads mapped and transcript length (RPKM), between host infection and axenic culture is plotted on a log 2 scale on the vertical axis. Values are plotted as a function of gene length in nucleotides because the variance in RPKM is greater for shorter genes. Black points are greater than two standard deviations from the mean RPKM for a sliding window of 500 genes, ranked by length. B) Differential counts of reads normalized for total mapped reads in each library but not for length of predicted transcript. Log 2 differential is on the horizontal axis and significance by Fisher's Exact Test with multiple test correction is on the vertical axis. Colors indicate progressive ranges of probability of the null hypothesis (equal abundance in both samples), plotted as the -log10 (P-value) for clarity. in gene expression strongly suggests a shift in glycerol metabolism related to changing carbohydrate availability. Two additional genes with increased expression may be indicative of an increased metabolic rate during host infection: superoxide dismutase (AAPI10195) and SBDS (AAPI14490), which encodes a protein involved in ribosomal biogenesis and rRNA metabolism [44].
Three genes were up-regulated that are likely involved in the regulation of the plasma membrane and fungal cell wall, and thus could contribute to virulence. These genes were AAPI15600, a homolog of SUR7, a regulator of plasma membrane organization; AAPI12340, encoding a GTPase with a closest homolog in S. cerevisiae that is implicated in 1,3-alpha-glucan biosynthesis; and AAPI13585, encoding a non-secreted phospholipase A2. Another up-regulated gene (AAPI20003) is an 811-bp spliced transcript that is well supported by cDNA data but which is only predicted to encode a 62 amino acid peptide with no detected homology to GenBank proteins. It is possible that this is in fact a noncoding, regulatory transcript. All of these genes are useful candidates for further study by molecular methods such as in situ hybridization, quantitative PCR, or RNAi.

Conclusions
From more than 1.5 million species that comprise the fungal kingdom, genomes of only about 82 species have been completed to date, but many more are currently in the pipeline. Among those sequenced, about 66 belong to Ascomycetes (http://www.yeastgenome.org/cgi-bin/ blast-fungal.pl; http://www.broadinstitute.org). None of them are close relatives to the honey bee pathogen described in this study [45]. Indeed, while A. apis has been studied since the early 1950's [1], genetic data for this pathogen were limited until very recently. Following the sequencing of the A. apis genome in 2006 [7], our comprehensive, transcriptome-informed gene annotations constitute a major step forward in delineating the molecular processes underlying its life history, including complex interactions with its host.
In this study we have provided a revised annotation of A. apis that incorporates extensive transcriptomic data, resulting in~7,000 predicted protein-coding genes. We have annotated components of molecular pathways likely to contribute to virulence and reproduction. Following our previous description of the MAT-2 idiomorph [8], here we describe the second mating type allele (MAT-1) encoding α-box transcription factor, MAT1-1. We have also found other transcripts involved in the pheromone biosynthesis pathway, including pheromone receptors, binding proteins and efflux pumps, but failed to detect any sequences of the predicted fungal pheromones similar to those identified in genomes of heterothallic filamentous Ascomycetes. This could be due to the fact that these sequences are very short and highly diverged between Ascomycota. In support of this possibility, an extremely low degree of nucleotide conservation in these genes suggests a very rapid rate of evolution [38]. Alternatively, it is also possible that entirely different types of pheromones assumed a communication role in A. apis.
The differentially expressed genes identified here include candidate virulence factors that can be further investigated with molecular methods. Host pathogenesis is a series of events progressing from invasion and inhibition of host defenses, to massive proliferation of the fungus within the host. Every step of this process is well coordinated by activation of a wide range of virulence factors. Based on our findings, we propose an alternative strategy employed by A. apis during host invasion that differs from that suggested previously [27]. Our transcriptome analysis showed activation of a wide range of the hydrolytic enzymes that may help fungus to overcome host protective barriers. Specifically, transcripts encoding fungal chitinases may assist in penetration of the larval gut peritrophic membrane (PM) during host invasion and the cuticle at the final stage of exiting from a cadaver. Furthermore, we found components of the Aflotoxin-Sterigmatocystin (ST) biosynthesis pathway. Among those, AflR (AAPI16014) is a regulatory gene that controls transcription of the AF/ST pathway genes and StcU/Ver1 (AAPI15087) was implicated in a critical step of toxin biosynthesis in Aspergillus spp., the conversion of Versicolorin A to ST [46][47][48].
Collectively, our results provide new hypotheses and significant resources for investigating A. apis pathogenicity. Data generated in this study has been submitted to the public databases, the Bee Pests and Pathogens website and NCBI. It will become a platform shared by many scientists to decipher critical biochemical events occurring during host pathogenesis and expected to contribute ultimately to the control of the pathogen, improve honey bee health, and increase security for the worldwide food supply.

Production of fungal tissue
Mixed cultures of A. apis strains ARSEF 7405 and 7406 [7] were grown at 35°C in liquid or solid YGPS growth media [2] containing ampicillin (100 μg/ml) and streptomycin (12 μg/ml). Plate cultures were grown under 6% CO 2 for 4 d, and liquid cultures were grown under normal atmosphere with shaking for 6 d. Plate cultures were visually inspected for contamination and liquid cultures were plated to check for contamination. Only uncontaminated cultures were further processed. Mycelia and spores were harvested and washed three times with icecold 0.9% NaCl, blotted dry, and stored at −80°C until used for total RNA isolation.

Infection of honey bee larvae
Mixed cultures of ARSEF 7405 and 7406 [7] were grown at 35°C 6% CO 2 on YGPS plates for two weeks, and spores were recovered from plate surfaces by rinsing with 0.001% Triton X100. The spore suspension was washed three times and resuspended in ice-cold dH 2 O to a final concentration that yielded a spore stock with an OD 600 of 16. Plating confirmed that the suspension was free of contaminants. The spore stock was diluted 20-fold into 33°C mixed diet [49]. The infection of honey bee larvae was performed basically as described by [50]. Briefly, three-day-old larvae in groups of ten were fed with 500 μl of the spore suspension in mixed diet (~10 5 spores per larva), and kept at 33°C under 97% RH in 6-well culture dishes. Subsequent feedings were done as needed using mixed diet without spores. Upon death of a larva, it was removed to a sterile Petri dish and incubation was continued under the same temperature and humidity conditions for 18-24 h. Larval remains were macerated and combined with an equal volume of ice cold 3% sodium citrate (pH 8.4) containing 5 μg/ml actinomycin D, and stored at −80°C. A small amount of the material from each sample was viewed under a microscope to confirm the presence of A. apis ascomata, and any sample lacking such spore cysts was discarded.

RNA isolation
Frozen aliquots were rapidly ground in Trizol reagent (Invitrogen) using Teflon pestles in microcentrifuge tubes. RNA was subsequently isolated following the manufacturer's instructions. DNA was removed from the preparations with DNAseI (Applied Biosystems, Austin TX) followed by the removal of rRNA with Ambion's Poly (A) Purist kit (Applied Biosystems, Austin TX). The absence of A. apis and honey-bee DNA following DNA-seI treatment was confirmed by PCR using actin speciesspecific primers ( Table 2).

RNA quality and integrity control
Total RNA concentration was determined with a Nano-Drop ND-1000 spectrophotometer (ThermoFisher Scientific) and brought to a 600 ng/μl concentration. RNA integrity was determined using an RNA 6000 Nano Chip and run on an Agilent 2100 Bioanalyzer (Agilent Technology Inc.). Next, an RNA Pico Chip was used to determine how well rRNA was removed and to visualize the size of the mRNA transcripts. Samples were diluted to 5 ng/μl using RNAse-free H 2 O, denatured for 2 min at 70°C then immediately placed on ice. One microliter of each sample was run on the RNA Pico Chip and the remainder stored at −80°C.

cDNA library construction and whole transcriptome sequencing
Double-stranded cDNA was amplified with the cDNA Synthesis System (Roche Applied Science) using random hexamer priming, according to a manufacturer's protocol. For each sample, 200 ng of cDNA was used to make a sequencing library using the GS FLX Titanium Rapid Library Preparation Kit Roche 454 Life Sciences). Each library was bar-coded with a unique 10-bp sequence. Each library was diluted to 1.0E7 molecules/μl in a 500 μl total volume and stored at −20°C until sequencing. Samples were sequenced on a Roche 454 Titanium 200 Sequencer at the Purdue University Sequencing Center (Purdue University, Lafayette, IN). The output sequence was assembled by the Sequencing Center using GS Assembler (Roche 454 Life Sciences). Reads from axenic culture and host infection were assembled separately for transcript prediction with Maker (see below) as well as jointly for estimating the completeness of the genome.

Gene prediction
The Maker annotation pipeline [9] was used to annotate the A. apis genome [7]. Maker masked repetitive sequences with RepeatMasker (www.repeatmasker.org) using a repeat library generated with RepeatModeler (version 1.0.3, http://www.repeatmasker.org/RepeatModeler. html). Maker used WU-BLAST [51] and Exonerate [52] to align EST and protein evidence to the genome. The assembled 454 isotig sequences from both the axenic culture and host infection samples were used as EST evidence, of which MAKER aligned 12,296 out of 16,191 isotigs (75.9%) to the genome scaffold assembly. Protein evidence included protein sequences from A. apis and four other fungal species (A. capsulatus, A. fumigatus, C. immitis and Emericella nidulans) downloaded from GenBank [53] in addition to all fungal proteins downloaded from UniProtKB/SwissProt [54]. Two ab initio gene predictors were trained for use with Maker: SNAP [55] and Augustus [56]. SNAP was trained in a bootstrap process as described below. Augustus was trained using the training pipeline packaged with the software (autoAug.pl) and the assembled 454 isotigs from both samples as evidence. Maker was run a total of four times. All of the EST and protein evidence was aligned to the scaffold sequences in the first run. These alignments were re-formatted as an initial training set for SNAP. Maker was then run a second and third time with SNAP as the gene predictor, re-training the SNAP prediction algorithm between each run using the gene annotations in order to improve the algorithm. Both SNAP and Augustus were used as gene predictors in the fourth run of MAKER to produce a final set of gene annotations, which included 6,718 genes encoding 6,771 transcripts.
The gene annotations produced by Maker were scanned with InterProScan [57] to identify protein domain signatures in 5,149 of the transcripts (76%) and assign GO identifiers. A set of 1,352 ab initio predictions that do not overlap these evidence-based annotations were also scanned with InterProScan. The 299 predictions (22.1%) that contained protein domain signatures were promoted to the gene set. Genes with repeat or viral associated protein domains were then removed from the gene set, resulting in a final official computational gene set containing 6,992 genes encoding 7,045 transcripts. Of these, 5,423 transcripts have at least one protein domain recognized by InterProScan. Putative functions were assigned to the annotations by using BLASTP [58] to identify homologs in the UniProt/Swis-sProt database (downloaded 07-22-2010).
All gene annotations and supporting evidence alignments produced using Maker along with protein domain signatures from InterProScan were loaded into a GBrowse genome browser available on the Bee Pests and Pathogens website (http://hymenopteragenome.org/ beebase/?q=bee_pathogens). BLAST databases containing the scaffold assembly, the official and ab initio gene sets, NCBI GenBank predictions and manual annotations are also available.

Manual annotation methods
Transcripts were compared to sequences in GenBank. TBLASTX and BLASTX searches were done against translated nucleotide databases and the non-redundant protein databases respectively. The deduced amino acid sequences were analyzed using the NCBI Conserved Domain Database (CCD) search service (version v2.14) and NCBI Conserved Domain Architecture Retrieval (CDART) tool. Manually annotated transcripts and the Bee Pests and Pathogens database accession (AAPI) numbers are provided in the Additional file 1: Table S1.

Relative gene expression
Individual sequence reads were mapped with Mega BLAST [63] at a 97% identity threshold (read mapping with SSAHA2 [64] produced highly concordant results). Because the current genome assembly [7] includes regions of high similarity, the Maker-predicted transcripts were clustered with BLAST (97% identity and 70% overlap) prior to mapping to reduce redundancy in the reference. Gene expression estimates were considered both as raw counts and normalized to transcript length (RPKM). We considered genes to be strong candidates for having differential expression between axenic culture and host infection if 1) the log 2 difference in RPKM was greater than two standard deviations from the mean, and 2) if the read counts were significantly different by Fisher's Exact Test with Benjamini-Hochberg correction for multiple tests, calculated with the edgeR package [41]. Because the variance in read counts is in part a function of transcript length [65], we calculated the standard deviation in differential expression as a sliding window of the 500 closest genes in terms of transcript length. The variance in expression is also a function of expression level, but we did not attempt to correct for this because expression is expected to be highly context dependent for biological reasons whereas transcript length is not (disregarding alternative splicing). This ad hoc method is intentionally conservative because no replication of the sequence library is available, and thus a variance estimate for individual transcripts is not