Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

Background During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated.

. Algorithm for modelling UTRs. Figure S3. Algorithm for improving exon-intron structures based on RNA-seq data. Figure S4. Algorithm for counting reads that map to predicted features (e.g. mRNAs) for implementation in Perl. Figure S5. Analysis of genome-wide coverage of different genomic regions (exons of protein-coding genes, introns of protein-coding genes, intergenic regions). Noncoding RNAs and repeats are not included in this analysis. Percent of reads that map to the corresponding regions are shown (only reads are counted where both ends map to the same type of region). At the right end of the graph (separated by a dashed line), the relative distribution of these regions across the genome is indicated. Figure S6. Venn diagram of numbers of genes without any read counts. Read counts were summed up from the two replicates of each condition (veg, sex, wt proto, pro1 proto). In total, there are 764 genes that have no counts in at least one condition. Figure S7. Transcript analysis of selected genes by quantitative real time PCR and RNA-seq. Hierarchical clustering of log 2 of fold ratios for wild type protoperithecia/sexual mycelium (A) and vegetative/sexual mycelium (B). Letters C and L indicate classic (C) and LOX (L) analysis of gene expression from RNA-seq data. Quantitative real time PCR experiments were done at least twice for each gene with independent biological replicates. Quantitative real time PCR experiments in (A) were performed in this study, data in (B) were from this study for SMAC_01397, SMAC_06025, SMAC_07008, SMAC_09142, ppg1, ppg2, and pro1, and from previous experiments for the other genes [9,41,53,73]. Note that growth conditions for total vegetative and sexual mycelia for real time PCR experiments were in defined medium, whereas for RNA-seq analysis, RNA from mycelia grown in defined medium and cornmeal medium were pooled. However, the genes investigated here are mostly involved in development or strongly differentially regulated depending on the developmental stage, and therefore not expected to be greatly dependent on the growth medium. Hierarchical clustering and heatmap generation were done in R. Gene names are given for genes that were previously shown to be involved in or differentially regulated during development. Figure S8. Expression of known developmental genes. Hierarchical clustering of log 2 of fold ratios as determined by classic (C) and LOX (L) analysis. Log 2 ratios < -10 or > 10 were set to -10 and 10, respectively, to make for better scaling visibility. Hierarchical clustering and heatmap generation were done in R. Figure S9. Phylogenetic tree of all DUF3328 proteins from Sordaria macrospora (red, genome version 02), Neurospora crassa (light blue, data from http://www.broadinstitute.org/annotation/genome/neurospora/MultiHome.html) and Neurospora tetrasperma (dark blue, data from http://genome.jgipsf.org/Neute_matA2/Neute_matA2.home.html). The tree was calculated with neighbor joining with 10.000 bootstrap replications, bootstrap support in % is given at the branches. The three S. macrospora genes that are physically clustered within the genome are shown in grey boxes. Interestingly, the three clustered DUF3328containing genes are not part of a closely related paralogous group, but are present on distant braches of the phylogenetic tree. Furthermore, the DUF3328-family seems to be expanded in S. macrospora with 10 genes in contrast to six and seven genes in N. crassa and N. tetrasperma, respectively. Figure S10. Distribution of gene expression levels. Histograms of log 2 of coverage (normalized to coverage per kilobase per million counted bases) for each locus tag for each sample (left, midle), and estimated frequency distribution functions for the mean for each condition (right). In case of zero coverage, log 2 coverage was set to -13 (otherwise all log 2 values > -13) and was not used here. The distribution function (red) for each condition could be dissected into components. The components (blue, green and yellow lines) are normal distributions with varying means and variances that make up different proportions of the observed distribution. Estimation of mixtures was done with the mclust package from R [66] and manual curve adjustments. Tables   Table S1. Transcription factors among the genes with top 500 read counts and their homologs in Neurospora crassa and Fusarium graminearum. Locus tag numbers are given for the S. macrospora (S.m.) genes and their homologs in N. crassa (N.c.), and F. graminearum (F.g.) with e-values for the best BlastP hit. Described phenotypes of knockout strains for the corresponding genes are from the Neurospora knockout project [55] (http://www.broadinstitute.org/annotation/genome/neurospora/MultiHome.html) for N. crassa and from the Fusarium graminearum Transcription Factor Phenotype Database [56] (http://ftfd.snu.ac.kr/FgTRPD) for F. graminearum, and from [54] and [9] for S. macrospora mcm1 and pro44, respectively; n.d., no knockout phenotype described.

Supplemental
A. transcription factors among top500 genes occuring in wt proto and pro1 proto (therefore most likely not or not completely dependent on pro1)  Table S2. Comparison of results from RNA-seq and microarray analysis. Differential gene expression between sexual and vegetative mycelium was compared for genes with expression ratios in the RNA-seq analysis and published microarray data (9). Note that mycelia for RNA extraction for microarray analysis were grown in synthetic medium, while for RNA-seq analysis, RNA from mycelia grown in synthetic medium and complete medium was combined (for both sexual and vegetative mycelium) to maximize the number of expressed genes. This might lead to a lower number of differentially expressed genes, because signals for genes that are only differentially expressed when grown in a certain medium might be quenched in this type of analysis. However, this might indicate that genes that are differentially regulated in both array and RNA-seq experiments are more likely to be affected by sexual development than by the growth medium. Genes downregulated only in the RNA-seq exp.: 51 370 * 1 genes with significantly differential expression according to the thresholds that were used in the respective analyses. Thresholds for significantly differential expression were higher for the RNA-seq data than those used in previous microarray analyses. *
UTRs were predicted according to the principle shown in Figure S1 with custommade Perl scripts. The algorithms used to predict 5' and 3' UTRs were implemented as subroutines (sub) and can be described as follows: a) sub search_for_slope sub to search for sloping read counts (indicative of end of transcript) in given RNAseq data. It takes an array with count data (must be sorted previously in correct order) and returns position of putative UTR (minimum) with respect to length of input array. Conditions to search for in windows of 20 nt (overlapping sliding window analysis): slope of < -1.3, and counts at beginning of slope more than 5x the counts at end of slope, and slope of next window again larger and previous window same or larger (i.e. find the steepest slope), and the number of reads at the beginning of one window has at least once reached the number of average counts for this CDS. Also, search is stopped if reads at the beginning of one window have once reached the number of average counts and afterwards drop to lower than 0.2x the average counts (without finding a UTR) or dropped to <=1 counts (even without reaching average counts previously). These conditions are specifically set for reads with a 3' bias where the peak of reads can be after the end of the CDS.
b) sub search_for_slope_full_coverage sub to search for sloping read counts (indicative of end of transcript) in given RNAseq data. It takes an array with count data (must be sorted previously in correct order) and returns position of putative UTR (minimum) with respect to length of input array. Conditions to search for in windows of 20 nt (overlapping sliding window analysis): slope of < -1.3, and counts at beginning of slope more than 5x the counts at end of slope, and slope of next window again larger and previous window same or larger (i.e. find the steepest slope). The difference to the subroutine search_for_slope is that here search is stopped if the number of counts at the beginning of a window is outside 0.2-5 x the average number of counts for the CDS. This is useful if the coverage of each transcript is even (no 3' or 5' bias in the reads), because then, an upward slope might indicate the beginning/end of another transcript. In data with end bias, this subroutine is not useful, because here the reads might cluster in a bellshaped curve after the end of the CDS.
c) sub search_for_slope_gradient_to_zero sub to search for sloping read counts (indicative of end of transcript) in given RNAseq data. It takes an array with count data (must be sorted previously in correct order) and returns position of putative UTR (minimum) with respect to length of input array. Conditions to search for in windows of 100 nt (overlapping sliding window analysis): slope of < -1, and counts at beginning of slope more than 3x the counts at end of slope, and the reads at the end of the slope < 10 (i.e. no more coverage beyond the transcript save background), and slope of next window again larger and previous window same or larger (i.e. find the steepest slope), and the number of reads at the beginning of one window has at least once reached the number of average counts for this CDS. Also, search is stopped if reads at the beginning of one window have once reached the number of average counts and afterwards drop to lower than 0.1x the average counts (without finding a UTR) or dropped to <=10 counts (even without reaching average counts previously). This condition is specifically set for reads with a 3' bias where the peak of reads can be after the end of the CDS. This sub is for finding gentler, longer slopes when the search for the more steep slopes has failed. Method S2. Annotation of novel gene models based on RNA-seq data.
For the genome version 02, we annotated 125 novel genes based on RNA-seq patterns of spliced reads that are evidence of processed transcripts, and improved the exon-intron structure of 869 predicted genes ( Figure S2). Altogether, ~1,000 genes were improved or newly annotated which corresponds to ~10 % of all genes in the S. macrospora genome (publicly available in genome version 2, CABT02000001-CABT02001583, http://c4-1-8.serverhosting.rub.de/public/). There could be additional not yet annotated genes that are not spliced, because for the annotation in genome version 2, new genes were derived from splice sites only (not from regions that showed some read coverage, but no evidence for spliced reads). This approach was chosen, because it seems to be a "safer" way to ensure that the annotated genes are "real" genes and not just spurious transcripts. Also, a threshold of at least 50 spliced reads to support a splice site was required, and this could probably be lowered; however, for the present annotation, a rather conservative approach was chosen to avoid calling false-positive "genes".