The influence of trematode parasite burden on gene expression in a mammalian host

Background Parasites can profoundly impact their hosts and are responsible for a plethora of debilitating diseases. To identify global changes in host gene expression related to parasite infection, we sequenced, assembled, and annotated the liver transcriptomes of Balb/cj mice infected with the trematode parasite Schistosoma mansoni and compared the results to uninfected mice. We used two different methodologies (i.e. de novo and reference guided) to evaluate the influence of parasite sequences on host transcriptome assembly. Results Our results demonstrate that the choice of assembly methodology significantly impacted the proportion of parasitic reads detected from the host library, yet the presence of non-target (xenobiotic) sequences did not create significant structural errors in the assembly. After removing parasite sequences from the mouse transcriptomes, we analyzed host gene expression under different parasite infection levels and observed significant differences in the associated immunologic and metabolic responses based on infection level. In particular, genes associated with T–helper type 1 (Th–1) and T–helper type 2 (Th–2) were up-regulated in infected mice whereas genes related to amino acid and carbohydrate metabolism were down-regulated in infected mice. These changes in gene expression scale with infection status and likely impact the evolutionary fitness of hosts. Conclusions Overall, our data indicate that a) infected mice reduce the expression of key metabolic genes in direct proportion to their infection level; b) infected mice similarly increase the expression of key immune genes in response to infection; c) patterns of gene expression correspond to the pathological symptoms of schistosomiasis; and d) identifying and filtering out non-target sequences (xenobiotics) improves differential expression prediction. Our findings identify parasite targets for RNAi or other therapies and provide a better understanding of the pathology and host immune repertoire involved in response to S. mansoni infections. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2950-5) contains supplementary material, which is available to authorized users.

Transcriptome assembly 2 Table S2 Primer sequences developed for the analysis 3 Table S3 RNA quality 5 Table S4 Descriptive statistics of raw reads 6 Table S5 Descriptive statistics of transcriptome assemblies 7 Table S6 Structural errors in transcriptomes 10 Table S7 Average Ortholog hit ratio (OHR) of transcriptomes 11 Table S8 Assembly annotation statistics 12 Table S9 Comparing assemblies against mouse and S. mansoni transcriptomes 13 Table S10 Comparison of predicted ORFs against mouse and S. mansoni transcriptomes 14 Table S11 Annotation statistics of S. mansoni-guided Trinity assemblies 15 Table S12 BLASTN matches of S. mansoni-guided Trinity assemblies 15 Table S13 GO terms and KEGG IDs before and after filtering 16 Table S14 Representation of GO term categories in significantly expressed genes 16 Table S15 DE (upregulated in L compared to U) genes in GO term category immune system process. 17 Table S16 DE (upregulated in H compared to U) genes in GO term category immune system process. 21 Table S17 DE (upregulated H compared to L) genes in GO term category antigen binding 27 Table S18 DE (down-regulated in L compared to U) genes in GO term cytoplasmic part 27 Table S19 DE (down-regulated in H compared to U) genes in GO term category organic and metabolic processes 28 Table S20 KEGG pathway IDs over-expressed in high-infected livers compared to un-infected livers (U vs H). 30 Table S21 KEGG pathway IDs under-expressed in high-infected livers compared to un-infected livers (U vs H) 32 Table S22 KEGG pathway IDs over-expressed in low-infected livers compared to un-infected livers (U vs L). 34 Table S23 KEGG pathway IDs under-expressed in low-infected livers compared to un-infected livers (U vs L). Table S24 KEGG pathway IDs over-expressed in high-infected livers compared to low-infected livers (L vs H). 38 Table S25 KEGG pathway IDs under-expressed in high-infected livers compared to low-infected livers (L vs H) 39 Table S26 Fold change (Log2 fold change) of immune response related genes 40 Figure S1 The number of DE genes predicted 41 Figure  Results: Assembly evaluation The Trinity de novo, Trinity reference guided and SOAPdenovo-Trans assemblies were more complete than Cufflinks in terms of capturing CEGs (Core Eukaryotic Genes; Additional file 1: Table S5). However, all four approaches produced assemblies that contained >89% of CEG genes (Table S5). Other than SOAPdenovo-Trans H1H2 assembly containing many inversions compared to the other assemblies, there was no significant differences in structural variants between de novo and reference guided assemblies (Table S6).
The orthologue hit ratio (OHR) produces an estimate of transcript abundance represented by each unigene. We carried out BLASTX searches against both Mus muculus and S. mansoni protein databases to calculate the OHR. Most unigenes had high OHRs (≥0.8), indicating that many of the transcripts in our assemblies represent complete full length copies (Table S7). An OHR of >1, indicative of insertions in unigenes, was common in the Trinity genome-guided assemblies. The presence of incomplete transcripts reduces the OHR, and our H1H2 assemblies contain more unigenes with a low OHR, despite having a relatively high number of protein hits. We observed a similar trend when the OHR was calculated using S. mansoni proteins (Table S7).
We used the Swissprot database for transcript annotation as it is well curated with little redundancy and is highly integrated with other databases. Not surprisingly, the taxonomic distribution of BLASTX hits indicates more non-chordate hits for the H1H2 assembly than U1U2 assembly (Table S8). This trend was most notable in de novo assemblies, which produced twice as many exclusively non-chordate hits and more sequences matching only to S. mansoni (Table S9). A similar pattern was observed in ORFs, where de novo assemblies contained many reads that were exclusively from the parasite and the number of parasite reads increased with increasing parasite load (Table S10). These findings suggest that de novo approaches may more effectively identify non-target transcripts, but the de novo assemblies also included more contigs that matched both mouse and S. mansoni transcripts (i.e., potential synthetic chimeras; Table S8; Figure 1(b)).

Cufflink
Mus musculus reference genome, bowtie-Tophat-cufflink pipeline Supplementary Table S2. Primer sequences developed for the analysis. PCR profile used was 94 ̊ C 5 minutes, 40 cycles of 94 ̊ C for 45s, 30s at annealing temperature, 30s at 68 ̊ C followed by 5 minutes at 72 ̊ C Results analyzed using Nanodrop 8000, 1% Agarose gel. *: non-target pcr primers ᶲ: qPCR primers (differential gene expression analysis). Swissprot gene name and accession numbers are given in target and description columns.  Table S5: Descriptive statistics of transcriptome assemblies. All four assemblers produced the greatest number of contigs when all samples (L1, L2, H1, H2, U1, U2) were combined. SOAPdenovo-Trans and Cufflinks produced the largest contigs, but SOAPdenovo-Trans contigs contained many gaps in the scaffolds compared to Cufflinks. Reference guided assemblies yielded higher N50 and N50 ratios, suggesting the presence of longer/complete transcripts compared to de novo assemblies. In both de novo and reference guided assemblies, a slight decrease in N50 is apparent in highly infected liver transcriptomes, possibly due to the presence of more short contigs. All four approaches produced assemblies that contained >89% of CEG genes De novo assemblies: Reference guided assemblies: * N50 ratio represents: N50 of assembly/ N50 of Ensemble Mus musculus mRNA library All statistics are based on sequences of size ≥ 500 bp, unless otherwise noted

Supplementary Figures
Supplementary Figure S1. The number of DE genes predicted by DESeq2, edgeR and EBSeq. These include differentially predicted non-target hits. Reference guided Trinity assemblies produced the most DE genes after Benjamini-Hochberg correction. As expected, programs DESeq2 and edgeRE were more conservative in identifying DE genes and EBSeq predicted the most.