Gene co-expression network analysis identifies porcine genes associated with variation in Salmonella shedding
- Arun Kommadath†1,
- Hua Bao†1,
- Adriano S Arantes1,
- Graham S Plastow1,
- Christopher K Tuggle2,
- Shawn MD Bearson3,
- Le Luo Guan1Email author and
- Paul Stothard1Email author
© Kommadath et al.; licensee BioMed Central Ltd. 2014
Received: 13 February 2014
Accepted: 27 May 2014
Published: 9 June 2014
Salmonella enterica serovar Typhimurium is a gram-negative bacterium that can colonise the gut of humans and several species of food producing farm animals to cause enteric or septicaemic salmonellosis. While many studies have looked into the host genetic response to Salmonella infection, relatively few have used correlation of shedding traits with gene expression patterns to identify genes whose variable expression among different individuals may be associated with differences in Salmonella clearance and resistance. Here, we aimed to identify porcine genes and gene co-expression networks that differentiate distinct responses to Salmonella challenge with respect to faecal Salmonella shedding.
Peripheral blood transcriptome profiles from 16 pigs belonging to extremes of the trait of faecal Salmonella shedding counts recorded up to 20 days post-inoculation (low shedders (LS), n = 8; persistent shedders (PS), n = 8) were generated using RNA-sequencing from samples collected just before (day 0) and two days after (day 2) Salmonella inoculation. Weighted gene co-expression network analysis (WGCNA) of day 0 samples identified four modules of co-expressed genes significantly correlated with Salmonella shedding counts upon future challenge. Two of those modules consisted largely of innate immunity related genes, many of which were significantly up-regulated at day 2 post-inoculation. The connectivity at both days and the mean gene-wise expression levels at day 0 of the genes within these modules were higher in networks constructed using LS samples alone than those using PS alone. Genes within these modules include those previously reported to be involved in Salmonella resistance such as SLC11A1 (formerly NRAMP1), TLR4, CD14 and CCR1 and those for which an association with Salmonella is novel, for example, SIGLEC5, IGSF6 and TNFSF13B.
Our analysis integrates gene co-expression network analysis, gene-trait correlations and differential expression to provide new candidate regulators of Salmonella shedding in pigs. The comparatively higher expression (also confirmed in an independent dataset) and the significantly higher connectivity of genes within the Salmonella shedding associated modules in LS compared to PS even before Salmonella challenge may be factors that contribute to the decreased faecal Salmonella shedding observed in LS following challenge.
Salmonella enterica serovar Typhimurium is a gram-negative zoonotic bacterium that can colonise the gut of humans and many species of food producing farm animals and cause enteric or septicaemic salmonellosis . In pigs, infections by S. Typhimurium mostly lead to a localised enterocolitis, which is responsible for significant economic losses to the pig industry . An unknown percentage of Salmonella infected pigs continue to be asymptomatic carriers even after acute response, thereby posing long-term zoonotic threats through contaminating the pork production chain. Prevention and control of salmonellosis in pigs thus assumes great importance not only for animal welfare, reduced antibiotic use and improved profitability of pig industry but also for minimizing risks to public health .
Host genetic response to Salmonella infection has been well studied in several species. A review by Roy and Malo  has reported several genes to be involved in regulation of responses to Salmonella infection in mice, for example, SLC11A1 (formerly NRAMP1), TLR4, BTK, LBP, CD14, CYBB, NOS2, TNF, IL12, IFNG, IL12B, TLR5 and others. Clinical manifestations associated with Salmonella infection are dependent on several factors such as the serotype of Salmonella involved, host species affected and age of the host. While infection with S. Typhimurium induces a systemic disease similar to human typhoid fever in mice, the infection is mostly of the enteric form in pigs, except in the case of very young piglets . Thus a different set of genes may contribute to resistance against Salmonella infection depending on the host species and the Salmonella serotype involved. Indeed, studies of Salmonella resistance in chicken report different genes depending on whether the infection is systemic or enteric . For example, the SLC11A1 gene and the SAL1 locus confer resistance to systemic Salmonellosis [6, 7], whereas several members of the gallinacin gene family confer resistance to enteric Salmonellosis .
Studies on several species have shown that host genetic variations in natural populations contribute to varying responses to different pathogens in terms of resistance or increased susceptibility [9–11]. Distinct responses to Salmonella infection have been observed in pigs, some recovering faster and shedding lower levels of Salmonella in faeces than others (low shedders, LS versus persistent shedders, PS) . This trait variation could indicate the existence of a genetic component to Salmonella shedding and resistance that may be exploited in animal breeding and disease diagnostics. Uthe et al.  reported SNPs in ten genes, including GNG3, NCF2 and CCR1, that were associated with Salmonella shedding in pigs. While many studies have looked into the host genetic response to Salmonella infection, relatively few have used trait-gene expression correlation to identify genes whose variable expression among uninfected individuals may be associated with differences in Salmonella clearance and resistance. For example, several genes involved in innate immunity (IL8, IL18, IFNG, TLR4, iNOS, GAL1, GAL2) have been shown to be either constitutively or inductively more highly expressed in caecal tonsils of Salmonella resistant strains of chicken compared to susceptible strains . A study on Mycobacterium tuberculosis infection in humans identified a SNP that appears to control susceptibility to tuberculosis through its effects on the expression of the DUSP14 gene in dendritic cells prior to infection . Differences in gene expression prior to Porcine Reproductive and Respiratory Syndrome (PRRS) virus infection have been observed among pigs belonging to different phenotypic groups as defined by the viral load and weight gain of individual pigs observed during a defined number of days following infection . Such differences in gene expression among different phenotypic groups have been attributed to the differences in the genetic background of individual pigs . Taken together, the findings reported above may indicate that genetic resistance to infections is mediated in part through the presence of a more activated defense system in resistant individuals compared to susceptible individuals so that, in the event of an infection, resistant individuals can mount a faster and more effective immune response.
In this study we assess the whole blood transcriptome prior to and following infection. Whole blood consists of cells that form integral parts of the immune system and whole blood can be easily and repeatedly sampled. The whole blood transcriptome provides a ‘snap shot’ of the complex immune networks that operate throughout the body  and several studies in humans and animals have analysed the blood transcriptome to provide new insights into host immune responses against a wide variety of pathogens and to identify potential biomarkers [16, 19, 20].
A recent microarray based study  identified several hundreds of differentially expressed (DE) genes including SLC11A1 and TLR4 when comparing the whole blood transcriptome of pigs before (day 0) and two days after Salmonella inoculation (day 2) and also reported significant differences in the number and expression profiles of DE genes post inoculation in LS compared to PS. However, the DE analysis performed in those studies failed to identify significant differences in expression of genes between LS and PS before Salmonella challenge that could potentially explain or predict the varying responses between the two groups upon infection. This absence of DE could be due to the fact that the differences in expression of host resistance genes against Salmonella between LS and PS may be subtle, unlike the gene expression differences between non-infected and infected states, and thus requires a broader or more sensitive approach to identify those genes.
DE analysis typically focuses on identifying genes with the most significant differences between contrasting conditions. In addition, the requirement for multiple testing corrections may further impede the discovery of genes with subtle differences in expression. A powerful alternative approach for gene expression analysis is co-expression analysis, which considers the relationships between measured transcripts in an unsupervised way. The weighted gene co-expression network analysis (WGCNA) approach , one of the popular methods developed for gene co-expression analysis, effectively overcomes the multiple testing problems inherent in high throughput gene expression data analysis. This methodology begins with the identification of modules of highly correlated genes assessed by pair-wise correlations between gene expression profiles. Next the relationships between only a few tens of modules and the phenotypic trait of interest are considered. Finally candidate genes associated with the trait are prioritised based on network statistics like module membership and gene significance. WGCNA has been used to identify genes and gene networks associated with specific tissues, distinct biological states or diseases, and qualitative as well as quantitative phenotypes [22–24].
Here, we aimed to identify porcine genes and gene co-expression networks that differentiate distinct responses to Salmonella challenge with respect to faecal Salmonella shedding, using WGCNA and RNA-Seq data from whole blood.
Faecal Salmonella shedding counts
Shedding status of pigs determined based on AULC of faecal Salmonella shedding counts
RNA-Seq profiling of porcine whole blood expressed genes and their functional classification
RNA extracted from porcine peripheral blood samples collected at day 0 and day 2 post inoculation (p.i.) of Salmonella from 16 selected pigs identified as LS (n = 8) and PS (n = 8) were depleted for globin transcripts and subjected to Illumina sequencing. The sequencing depths achieved for these samples ranged from 23 to 52 million reads (mean = 37 ± 6), of which approximately 90% (mean = 33 ± 6) mapped to the pig genome (Additional file 1), identifying a total of 21,638 expressed genes. The efficiency of the globin depletion protocol varied across samples with the total globin reads (HBA + HBB) ranging from 0.2-60% of the total mapped reads post-depletion (Additional file 2). Based on data from a different set of samples (n = 12) that were used to develop the globin depletion protocol  (manuscript under preparation), it is known that the total globin reads constitute 26 to 54% (average 46.1 ± 8.6%) of the total mapped reads in blood of pigs between 3–7 weeks of age. Using this information on normal globin levels in pig blood, we estimate the globin depletion protocol to have worked well for 16 of the 32 samples used here where the total globin reads constituted below 15% of the total mapped reads. For 7 of the 32 samples, the protocol seemed to have worked to some extent, limiting total globin reads to between 20-30% of the total mapped reads. For the remaining 9 samples, however, the protocol either worked poorly (2 samples with total globins between 35-40%) or did not work at all (7 samples with >40% total globins). The differing efficiencies of globin depletion would have affected our ability to detect genes expressed at very low levels via improved coverage for non-globin genes. However, this would not affect the analysis we do here as we filter out the genes expressed at very low levels across all samples. The globin genes were also removed from the gene expression matrix prior to normalisation and further analysis. Removal of genes expressed at very low levels (keeping only genes with counts per million (CPM) per sample > 1 in at least 8 samples) resulted in a set of 10,872 genes expressed in porcine peripheral blood. A multi-dimensional scaling in two dimensions (see Methods) performed using this set of expressed genes revealed a clear separation of the samples by day (day 0 vs. day 2 p.i.) but not by shedding status (LS vs. PS) on either day (Additional file 3).
For a broad overview of the functions attributed to the genes expressed in porcine blood, we performed a functional classification based on gene ontology (GO) terms from GO Slim using the PANTHER gene list analysis tool (see Methods). In the biological processes category (Additional file 4), 26.3% of the blood-expressed genes were annotated to metabolic process, 16% to cellular process, 10.2% to cell communication and 7.7% to transport. In the molecular functions category, 33.8% of the genes were annotated to binding and 30.8% to catalytic activity whereas in the cellular composition category, the dominant term was intracellular (59.9%).
Gene co-expression network constructed using gene expression data from uninfected pigs
Modules associated with differences in faecal Salmonella shedding
To determine if any of the 30 modules of co-expressed genes identified in day 0 samples were associated with the observed differences in faecal Salmonella shedding counts upon future Salmonella challenge, we tested the correlations of the module eigengenes with the trait, AULC of faecal Salmonella shedding counts. Four modules were found significant at the defined cut-offs (absolute correlation > 0.3 and p-value < 0.1). Of these, the pink, grey60 and darkorange modules showed negative correlations with the trait while the magenta module showed a positive correlation. Figure 1B depicts the module sizes and correlations of the module eigengenes to the trait. The module membership (MM) versus gene significance (GS) plots for these modules (Figure 1C) showed that MM and GS are highly correlated, indicating that genes most significantly associated with the trait are often also the most important (central) elements of the respective modules. Here, MM is a measure of the strength of a particular gene’s membership in a module obtained by correlating the gene’s expression profile with the module eigengene of that module. Highly connected intramodular hub genes tend to have high MM values to the respective module. GS is a measure of the biological relevance of a gene with respect to the trait of interest obtained by correlating the gene’s expression profile with the trait. The gene expression profiles of genes within the four modules associated with Salmonella shedding, across all samples ordered by AULC of Salmonella shedding counts (Figure 1D), indicate tight co-regulation with an overall higher expression in LS than PS for the pink, grey60 and darkorange modules and a lower expression in LS than PS for the magenta module. Gene ontology enrichment tests (see Methods) revealed that the grey60 and pink modules were related to immune functions whereas the darkorange module was related to signalling processes (Benjamini-Hochberg corrected p-value < 0.05). The positively correlated magenta module was not significantly enriched for any GO term. The Ensembl IDs of the genes within these four modules are provided in Additional file 6 and the complete lists of GO terms enriched in these modules are provided in Additional file 7.
Salmonella shedding associated modules identified at day 0 are largely preserved at day 2
Differences in expression levels of Salmonella shedding associated genes between low and persistent shedders and between days
Differences in connectivity of Salmonella shedding associated genes between low and persistent shedders and between days
Prioritisation of candidate genes for Salmonella shedding
For prioritising candidate genes within the Salmonella shedding associated modules that could possibly determine better immune responses to Salmonella challenge, we restricted further analysis to the pink and grey60 modules. These modules were found to be immune-related, had a majority of genes significantly up-regulated at day 2, were well preserved with the corresponding modules from the network constructed using day 2 samples, and also showed overall higher gene connectivity measures in LS compared to PS in networks from both days. A total of 213 genes in the pink module and 100 in the grey60 module were up-regulated at day 2 p.i. Network properties of this refined set of candidate genes within these two modules are provided in Additional file 9. This candidate gene list includes genes previously reported to be involved in the regulation of host responses to Salmonella infection (SLC11A1, TLR4, CD14) or as having SNPs associated with Salmonella faecal shedding (CCR1), as well as those for which an association with Salmonella is novel (SIGLEC5, IGSF6 and TNFSF13B).
Top candidate genes associated with Salmonella shedding
Further, as an independent validation of our results, we compared the expression patterns of a subset of the candidate genes reported here with the corresponding expression patterns from a previous microarray based Salmonella challenge study involving a different set of LS and PS animals . Probes on the microarray represented 19 (FOLR1, SLC26A6, CDA, ANO10, ARG2, TLR2, TNFRSF1A, ALOX5AP, LREAP1, BMX, ZCCHC6, PGK1, CCR1, NUMB, SDCBP, SRGN, GNG10, LTB4R, RNF149) of the top 27 candidate genes reported in Table 2. Except for SLC26A6 and NUMB, all other genes showed expression patterns similar to those observed in our study: higher in LS than PS at day 0 and up-regulated in both LS and PS at day 2 p.i. In addition, the genes SLC11A1, TLR4 and CD14, which are known to play important roles during Salmonella infection, showed similar expression patterns as described above. Figures comparing the expression patterns of these 22 genes between the two studies are provided in Additional file 10.
Significant gains have been made in our understanding of host–pathogen interactions during Salmonella infection . Several studies, involving a variety of different species of farm and model animals, have investigated the host response to Salmonella infection and have successfully identified genes differentially expressed upon infection or gene variants and chromosomal loci associated with immune response traits during infection with Salmonella [7, 20, 27, 28]. A genetic basis for differences in resistance to Salmonella has also been shown with SLC11A1 (NRAMP1), the seminal example of a gene with genetic variants dramatically affecting resistance to bacterial infection [29, 30]. While many studies have looked into the host response to Salmonella infection [4, 20, 28], relatively few have focused on identifying the genes whose variable expression among different individuals may be associated with differences in Salmonella clearance and resistance.
Initial characterisation of the pigs used in an earlier Salmonella challenge study  revealed a significant positive correlation between serum interferon-γ (IFN-γ) levels at day 2 p.i. and faecal Salmonella shedding levels at day 2 and day 7 p.i. The same study also demonstrated that the peak of both clinical symptoms (fever, diarrhea, decreased appetite) and Salmonella shedding occurs at day 2 p.i. and that substantial whole blood transcriptome changes occur at day 2 p.i. compared to day 0 in pigs belonging to both LS and PS groups. Therefore, here we chose to profile whole blood transcriptomes at the same time-points, day 0 and day 2, but belonging to a different set of pigs, using RNA-Seq instead of microarrays, for a different purpose: to identify genes whose expression prior to inoculation is correlated with Salmonella shedding levels observed p.i. The genes identified may serve as blood-based candidate biomarkers that could potentially be used to develop quick screening tests for determining the host’s resistance/susceptibility to Salmonella infection and predicting their shedding characteristics early into or even before infection.
The extent of DE and degree of change in expression between day 0 and day 2 were, in general, higher in PS than LS. This finding, as previously speculated , may indicate that LS animals respond faster and more effectively against infection than PS so that by day 2, while the LS are returning back to normal, the PS are still actively fighting the infection. With this in mind we believe that the comparison between LS and PS at day 2 likely identifies DE due to differing levels of infection. A comparison at day 0, on the other hand, stands to better highlight genes responsible for differences in the efficacy of the initial response to the bacteria, assuming that some of these genes exhibit expression differences prior to infection. However, the DE analysis between LS and PS at day 0 did not yield any significant DE genes here as well as in an earlier Salmonella challenge study . This failure to find DE genes could be due to a combination of the subtlety of the expression differences, the relatively small sample size, and the strict multiple testing corrections. Hence we used an alternative approach, WGCNA, to find genes associated with Salmonella shedding.
Remarkably, the genes whose pre-inoculation expression profiles were found associated with post-inoculation Salmonella shedding levels included major genes already reported in literature as DE during Salmonella infection and involved with host resistance against Salmonella such as SLC11A1, TLR4, CD14 and CCR1 [4, 20]. Moreover, the majority of genes within the modules significantly associated with Salmonella shedding, following further refinement based on up-regulation and DE at day 2 (Additional file 9), were found to have an established or possible role in innate defense against bacterial/Salmonella infections. These include mainly the early innate immune response genes responsible for cytokine-cytokine receptor interactions (CXCL16, CCR1, CCR3, CNTF, CSF2RA, IFNGR1, TNFSF9, TNFSF12, TNFSF13, TNFSF13B, TNFRSF1A, TNFRSF1B, IL1A, IL15, IL18, IL1RAP, IL1R2, IL18RAP), genes involved in toll-like receptor pathway (TLR4, TLR2, MYD88, CD14, LY96), NF-Kappa B signalling pathway (NFKBIA, TNFRSF1A, TNFSF13B), NOD-like receptor signalling pathway (CASP1, PSTPIP1, IL18, NFKBIA) or otherwise linked to response to bacterial infections (SLC11A1, SERPINB1, S100A8, S100A9, S100A12, ARG2, CEBPB). Prioritisation of these genes based on gene significance, module membership and gene connectivity in LS at day 0 (Table 2) highlights some genes which have not been previously associated with Salmonella infection. For example, little is known about the role of SIGLEC5, a member of the Siglec family of sialic acid-binding lectins in host response to bacterial infection. However, it has been reported in humans that the absence of a functional SIGLEC14 (with which human SIGLEC5 shows extensive sequence similarity) results in attenuated cytokine response to some Gram-negative bacteria in null individuals .
The mean expression levels of the Salmonella shedding associated genes at day 0 were generally higher in LS than PS and mostly up-regulated in both at day 2 compared to day 0. In most instances, the expression was higher in PS than LS at day 2. We showed, at least for the top candidate genes reported here, that the pattern of expression is consistent with that from a previous microarray based Salmonella challenge study involving a different set of LS and PS animals. Examining the connectivities of genes within the Salmonella shedding associated modules in LS and PS, it became apparent that the genes in general showed higher connectivity in LS than PS, indicative of higher correlation/connection strengths with other network genes. The differences in connectivity measures for a set of genes between different conditions may signify differences in the co-ordination or strength of transcriptional regulation of that set of genes. Highly connected genes (hub genes) have been shown to play central roles in the biological processes that are represented by the module , and strong positive correlations have been reported between gene connectivity within the whole network and gene essentiality . Here, the significantly higher connectivity despite the lack of significant DE between LS and PS may be considered analogous to the results in a study on breast cancers of different histological grades . The authors of that study concluded that the differential connectivity patterns were not due to primary alterations of hub gene expression, but rather due to more subtle changes in expression of numerous genes interacting with those hubs. Further, they reported that complex epistatic interactions that underlie cellular functions might also be responsible for differences in network connectivity patterns as a function of a phenotypic trait. A study on aging in mice  reported a decreasing correlation of gene expression within genetic modules and attributed this to changes in expression of certain transcription factors (TF) as well as deterioration of chromatin structure with age. It is possible that genetic differences at mutiple levels as discussed above may contribute to the differences in strengths of coexpression and connectivity between LS and PS. Exploring these contributions may be a direction for future research.
One of the limitations of our study is the absence of samples from time points post-inoculation but before day 2 p.i., which are crucial to capture the early immune response during which the LS pigs have effectively managed the Salmonella challenge. Secondly, this study would benefit from a larger sample size, which would provide more power to detect the subtle changes in expression expected between LS and PS animals. Further experiments are required to rank the relative functional importance of our suggested candidate genes as contributors to distinct responses to Salmonella challenge with respect to faecal Salmonella shedding. However, the use of multiple criteria and strict cut-offs to refine the set of candidate genes, the agreement with existing literature regarding the immune related functions of many candidate genes and the concordance of the expression patterns of top candidate genes reported here with the corresponding expression patterns from an independent dataset, all lend further support to our predictions.
Our analysis integrates gene co-expression network analysis, gene-trait correlations and differential expression to provide new candidate regulators of Salmonella shedding in pigs with implications for future use as biomarkers for selection of animals with reduced susceptibility to, or carriage of, Salmonella or for predicting response to infection. The comparatively higher expression (also confirmed in an independent dataset) and the significantly higher connectivity of genes within the Salmonella shedding associated modules in LS compared to PS even before Salmonella challenge may be factors that contribute to the decreased faecal Salmonella shedding observed in LS following challenge.
Sample collection, Salmonella shedding status determination and selection of animals for RNA-Seq analysis
Samples used in this study were selected from Salmonella negative piglets (crossbred or Yorkshire sows bred to boars from different breeds) belonging to two populations of 40 and 77 individuals and treated as described in an earlier study from our group . In brief, the pigs were raised in climate-controlled, fully enclosed isolation facilities at the USDA-ARS-National Animal Disease Center (NADC) in Ames, IA under identical management conditions. At 7 weeks of age, these Salmonella negative pigs received intranasal inoculation of 109 colony-forming units (cfu) of nalidixic acid resistant Salmonella enterica serovar Typhimurium, ST χ4232. Approximately 2.3 ml of whole blood was collected from the jugular vein into PAXgene Blood RNA tubes (processed according to manufacturer’s instructions) from each individual just prior to Salmonella inoculation (day 0) and at days 2, 7, 14, and 20 p.i. On the same days, faecal samples were also collected and the amount of Salmonella bacteria shed in faeces was quantified by direct counting using bacteriological methods as described by Uthe et al. . All procedures involving animals were approved by the USDA-ARS-NADC Animal Care and Use Committee.
Salmonella shedding status was determined based on the total amount of Salmonella shed in faeces calculated using the cumulative area under the plotted log curve (AULC) of the logarithmically normalised faecal counts obtained between day 0 to day 20 post-inoculation for each individual as described in earlier publications [12, 20]. Based on the AULC, 16 pigs were selected for RNA-Seq analysis, eight of which were identified as LS and eight as PS.
RNA preparation, library construction and Illumina sequencing
Blood samples collected at day 0 and day 2 p.i. from 16 selected pigs identified as LS (n = 8) and PS (n = 8) were used for total RNA extraction. Total RNA was prepared from 4.5–9.0 ml of solution from the PAXgene Blood RNA tubes and extracted using the PAXgene Blood miRNA Kit (Qiagen) according to the manufacturer’s instructions. The DNA was removed by in-solution DNase I digestion and RNeasy MinElute Cleanup Kit as recommended by Qiagen. A PCR assay without reverse transcription was used to confirm that the RNA samples were DNA-free. The quantity and quality of the RNA were determined using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA) and Nanodrop 2000 (Thermo Scientific, Wilmington, DE).
A globin reduction protocol (GR) developed in-house  using porcine specific oligonucleotides was followed. A 10X GR oligonucleotide mix was prepared by combining the four oligos (two each for HBA and HBB; see Additional file 11 for details) to the final concentrations of 7.5 uM, 7.5 uM, 30 uM and 30 uM, respectively. Next, 2 uL of the 10X GR oligonucleotide mix was then added to 3 ug (7 uL) of total RNA and 1 uL of 10X oligonucleotide hybridisation buffer (100 mM Tris–HCl, pH 7.6; 200 mM KCl) to constitute the 10 uL of hybridisation mix. This mix was incubated in a thermal cycler at 70°C for 5 minutes and then cooled to 4°C. The RNA-DNA hybrids were digested with 2 U RNase H (Ambion) in the reaction buffer (100 mM Tris–HCl, pH 7.6, 20 mM MgCl2, 0.1 mM DTT, SUPERase-in) at 37°C for 10 minutes and cooled to 4°C. The reaction was stopped by addition of 0.5 M EDTA. The globin depleted RNA was immediately purified with the RNeasy MinElute Cleanup Kit (Qiagen, Toronto, Canada, Cat. No.: 74204) according to manufacturer’s instructions. RNA quality of the globin depleted samples was assessed using an Agilent Bioanalyzer 2100 (Agilent Technologies, Inc., Santa Clara, USA).
The mRNA library was constructed using the TruSeq RNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, USA) according to manufacturer’s instructions. One ug of total RNA was purified using poly-T oligo-attached magnetic beads. The poly (A) RNA was primed and fragmented. The first strand and second strand cDNA were synthesised according to the Illumina sample preparation guide. The adapters with different indices were ligated to the cDNA fragments and the library was subsequently PCR-amplified. The qualities of the mRNA libraries were confirmed using the Agilent Bioanalyzer 2100 (Agilent Technologies, Inc.). Quantification was performed using the StepOne Real-Time PCR System (Applied Biosystems, Carlsbad, USA). The individual libraries were adjusted to 2 nM concentrations and pooled before denaturation and dilution according to Illumina’s instructions. The diluted libraries (8 – 10 pM) were loaded on a cBot (Illumina) for cluster generation using the TruSeq SR Cluster Kit v3 (Illumina). Sequencing was performed on the HiScanSQ system (Illumina) using the TruSeq SBS Kit v3 (50 cycles, single-read sequencing, from Illumina). Real-time analysis and base calling were performed using the HiSeq Control Software, version 1.4.8 (Illumina).
Identification and quantification of mRNAs
Sequence reads with base quality scores were produced by the Illumina sequencer. RNA-Seq reads flagged as low quality by the chastity filter in CASAVA 1.8 were removed. In addition, we removed reads with an average read quality score below 15 and reads in which over 5 of the last 10 bases had a PHRED quality score below 2. Pig reads were aligned to pig reference genome sequence assembly (Sscrofa10.2)  using Tophat 1.4.0  with default parameters which included: enable use of GTF file, set minimum anchor length of 8, accept zero mismatches in the anchor region, allow intron length between 50 and 500,000, and allow up to 20 alignments to the reference for a given read. For annotation of genes, we used the GTF file for Sscrofa10.2 from Ensembl version 67 . The number of reads uniquely mapped to each gene was determined using Htseq-count (v0.5.3.p3) . Reads that were assigned to more than one gene were not counted by Htseq-count. For further processing of the read counts, we used the Bioconductor (version 2.18.0)  package edgeR (version 3.0.8)  within the R (version 2.15.2) statistical programming language . The read counts per gene were normalised to counts per million (CPM). Genes expressed at very low levels were removed by keeping only those genes that achieve CPM above one in at least eight samples. Trimmed mean of M-values (TMM) normalisation was applied to this dataset to account for compositional differences between the libraries. Gene expression was also quantified as reads per kilobase of transcripts per million mapped reads (RPKM), calculated by dividing CPM by the respective gene lengths expressed in kilobase pairs. The gene lengths were taken as the length of the longest transcript for the respective genes obtained from the GTF file for Sscrofa10.2. CPM values were used for the DE analysis whereas RPKM values were used for the co-expression analysis. Clustering of samples by gene expression data (transformed to log2 CPM) was evaluated using multi-dimensional scaling in two dimensions using the plotMDS function in the limma (version 3.14.4)  package of Bioconductor. This function considers the Euclidian distance between each pair of samples for the top 500 genes with the largest standard deviations between samples i.e. the leading log2-fold-changes.
Differential expression and co-expression network analyses
DE analysis was performed using the Bioconductor package edgeR and gene co-expression network analysis was performed using the R package WGCNA (version 1.26) . The co-expression analysis starts by constructing a matrix of pairwise correlations between all pairs of genes across all selected samples. Next the matrix is raised to a soft-thresholding power (β = 8 in this study) to obtain an adjacency matrix. In order to identify modules of co-expressed genes, we construct the topological overlap-based dissimilarity [45, 46], which is then used as input to average linkage hierarchical clustering. This step results in a clustering tree (dendrogram) whose branches are identified for cutting depending on their shape using the dynamic tree-cutting algorithm  which has several advantages over the constant height cut-off method including the ability to identify nested clusters. Modules whose eigengenes are highly correlated are merged. The above steps were performed using the automatic network construction and module detection function (blockwiseModules in WGCNA), with the following major parameters: maxBlockSize of 11000, minModuleSize of 30, reassignThreshold of 0 and mergeCutHeight of 0.15. The robustness of the network modules was evaluated using a bootstrap approach implemented with functions available in WGCNA. Using the same parameters as described above, we performed 50 full module constructions and module detection runs on resampled datasets where in each run, 16 samples were selected randomly (with replacement) from the 16 day 0 samples. The module assignments in these runs were compared with the modules in the original network. Next, the modules were tested for their associations with the trait by correlating module eigengenes with trait measurements, AULC of faecal Salmonella shedding counts in this study. We used the softconnectivity function within the WGCNA package to calculate the connectivities of all genes in a network, which were then scaled to the maximum connectivity within that network. Differences between the scaled mean connectivities of selected genes of interest between different networks (constructed using subsets of the gene expression dataset) were tested using the Wilcoxon Rank Sum test. Detailed tutorials with examples for the use of the WGCNA method can be found at http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials.
The gene co-expression networks for selected modules of genes were visualised using Cytoscape (version 3.1.0) [48–50]. The following Bioconductor packages were used for gene annotations: biomaRt version (2.14.0) , AnnotationDbi (version 1.20.7) and org.Hs.eg.db (version 2.8.0). Gene ontology (GO)  term enrichment tests were performed for individual gene co-expression modules compared to a background set of all genes expressed in blood using the Bioconductor packages GOstats (version 2.26.0) and GSEABase (version 1.22.0). The human orthologs of the corresponding porcine genes were used in the GO enrichment tests to take advantage of the more complete GO annotation available for human genes. Functional classification of porcine blood expressed genes based on GO terms from GO Slim (a subset of GO terms that gives a broad overview of the GO ontology) was performed using the PANTHER gene list analysis tool (version 8.1, June 2013) [53, 54].
Availability of supporting data
This work is supported by the Applied Livestock Genomics Program (ALGP13) funded by Genome Alberta and the Alberta Livestock and Meat Agency, and the Large-Scale Applied Research Project, Application of Genomics to Improve Swine Health and Welfare, funded by Genome Canada, Genome Alberta and the Alberta Livestock and Meat Agency. Blood samples were provided from Salmonella challenge studies conducted at USDA-ARS-NADC in a project funded by USDA-NIFA-2009-35205-05192. Graham Plastow, Le Luo Guan, and Paul Stothard are grateful for the financial support of the Alberta Livestock and Meat Agency (project number 2010R097S). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Xu Sun is gratefully acknowledged for his role in the lab work and for editing the Methods section. We also thank the editor and the anonymous reviewers for their constructive comments and help in improving the manuscript.
- The Merck veterinary manual online. [http://www.merckmanuals.com/vet/index.html]
- Boyen F, Haesebrouck F, Maes D, Van Immerseel F, Ducatelle R, Pasmans F: Non-typhoidal Salmonella infections in pigs: a closer look at epidemiology, pathogenesis and control. Vet Microbiol. 2008, 130: 1-19. 10.1016/j.vetmic.2007.12.017.PubMedView Article
- Gopinath S, Carden S, Monack D: Shedding light on Salmonella carriers. Trends Microbiol. 2012, 20: 320-327. 10.1016/j.tim.2012.04.004.PubMedView Article
- Roy M-F, Malo D: Genetic regulation of host responses to Salmonella infection in mice. Genes Immun. 2002, 3: 381-393. 10.1038/sj.gene.6363924.PubMedView Article
- Calenge F, Kaiser P, Vignal A, Beaumont C: Genetic control of resistance to salmonellosis and to Salmonella carrier-state in fowl: a review. Genet Sel Evol. 2010, 42: 11-10.1186/1297-9686-42-11.PubMed CentralPubMedView Article
- Wigley P, Hulme SD, Bumstead N, Barrow PA: In vivo and in vitro studies of genetic resistance to systemic salmonellosis in the chicken encoded by the SAL1 locus. Microbes Infect. 2002, 4: 1111-1120. 10.1016/S1286-4579(02)01635-0.PubMedView Article
- Wigley P: Genetic resistance to Salmonella infection in domestic animals. Res Vet Sci. 2004, 76: 165-169. 10.1016/S0034-5288(03)00117-6.PubMedView Article
- Hasenstein JR, Zhang G, Lamont SJ: Analyses of Five gallinacin genes and the Salmonella enterica serovar Enteritidis response in poultry. Infect Immun. 2006, 74: 3375-3380. 10.1128/IAI.00027-06.PubMed CentralPubMedView Article
- Turner AK, Begon M, Jackson JA, Bradley JE, Paterson S: Genetic diversity in cytokines associated with immune variation and resistance to multiple pathogens in a natural rodent population. PLoS Genet. 2011, 7: e1002343-10.1371/journal.pgen.1002343.PubMed CentralPubMedView Article
- Lazzaro BP, Sceurman BK, Clark AG: Genetic basis of natural variation in D. melanogaster antibacterial immunity. Science. 2004, 303: 1873-1876. 10.1126/science.1092447.PubMedView Article
- Tinsley MC, Blanford S, Jiggins FM: Genetic variation in Drosophila melanogaster pathogen susceptibility. Parasitology. 2006, 132 (Pt 6): 767-773.PubMed CentralPubMedView Article
- Uthe JJ, Wang Y, Qu L, Nettleton D, Tuggle CK, Bearson SMD: Correlating blood immune parameters and a CCT7 genetic variant with the shedding of Salmonella enterica serovar Typhimurium in swine. Vet Microbiol. 2009, 135: 384-388. 10.1016/j.vetmic.2008.09.074.PubMedView Article
- Uthe JJ, Qu L, Couture O, Bearson SMD, O’Connor AM, McKean JD, Torres YR, Dekkers JCM, Nettleton D, Tuggle CK: Use of bioinformatic SNP predictions in differentially expressed genes to find SNPs associated with Salmonella colonization in swine. J Anim Breed Genet. 2011, 128: 354-365. 10.1111/j.1439-0388.2011.00935.x.PubMedView Article
- Sadeyen J-R, Trotereau J, Protais J, Beaumont C, Sellier N, Salvat G, Velge P, Lalmanach A-C: Salmonella carrier-state in hens: study of host resistance by a gene expression approach. Microbes Infect. 2006, 8: 1308-1314. 10.1016/j.micinf.2005.12.014.PubMedView Article
- Barreiro LB, Tailleux L, Pai AA, Gicquel B, Marioni JC, Gilad Y: Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc Natl Acad Sci U S A. 2012, 109: 1204-1209. 10.1073/pnas.1115761109.PubMed CentralPubMedView Article
- Arceo ME, Ernst CW, Lunney JK, Choi I, Raney NE, Huang T, Tuggle CK, Rowland RRR, Steibel JP: Characterizing differential individual response to porcine reproductive and respiratory syndrome virus infection through statistical and functional analysis of gene expression. Front Genet. 2012, 3: 321-PubMed CentralPubMed
- Lunney JK, Chen H: Genetic control of host resistance to porcine reproductive and respiratory syndrome virus (PRRSV) infection. Virus Res. 2010, 154: 161-169. 10.1016/j.virusres.2010.08.004.PubMedView Article
- Chaussabel D, Pascual V, Banchereau J: Assessing the human immune system through blood transcriptomics. BMC Biol. 2010, 8: 84-10.1186/1741-7007-8-84.PubMed CentralPubMedView Article
- Mach N, Gao Y, Lemonnier G, Lecardonnel J, Oswald IP, Estellé J, Rogel-Gaillard C: The peripheral blood transcriptome reflects variations in immunity traits in swine: towards the identification of biomarkers. BMC Genomics. 2013, 14: 894-10.1186/1471-2164-14-894.PubMed CentralPubMedView Article
- Huang T-H, Uthe JJ, Bearson SMD, Demirkale CY, Nettleton D, Knetter S, Christian C, Ramer-Tait AE, Wannemuehler MJ, Tuggle CK: Distinct peripheral blood RNA responses to Salmonella in pigs differing in Salmonella shedding levels: intersection of IFNG, TLR and miRNA pathways. PLoS One. 2011, 6: e28768-10.1371/journal.pone.0028768.PubMed CentralPubMedView Article
- Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008, 9: 559-10.1186/1471-2105-9-559.PubMed CentralPubMedView Article
- Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S: Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome. 2007, 18: 463-472. 10.1007/s00335-007-9043-3.PubMed CentralPubMedView Article
- Kommadath A, Te Pas MFW, Smits MA: Gene coexpression network analysis identifies genes and biological processes shared among anterior pituitary and brain areas that affect estrous behavior in dairy cows. J Dairy Sci. 2013, 96: 2583-2595. 10.3168/jds.2012-5814.PubMedView Article
- Park CC, Gale GD, De Jong S, Ghazalpour A, Bennett BJ, Farber CR, Langfelder P, Lin A, Khan AH, Eskin E, Horvath S, Lusis AJ, Ophoff RA, Smith DJ: Gene networks associated with conditional fear in mice identified using a systems genetics approach. BMC Syst Biol. 2011, 5: 43-10.1186/1752-0509-5-43.PubMed CentralPubMedView Article
- Choi I, Hosseini A, Hua B, Kommadath A, Sun X, Meng Y, Stothard P, Plastow GS, Tuggle CK, Reecy JM, Fritz E, Abrams SM, Lunney JK, Guan LL: Globin Reduction in Porcine Whole Blood for Improving Sensitivity and Accuracy of Transcriptome Analysis [Abstract]. Proceedings of the International Plant and Animal Genome XXI: 12-16 January 2013. 2013, San Diego, CA USA, P0595-[https://pag.confex.com/pag/xxi/webprogram/Paper7997.html]
- Langfelder P, Luo R, Oldham MC, Horvath S: Is my network module preserved and reproducible?. PLoS Comput Biol. 2011, 7: e1001057-10.1371/journal.pcbi.1001057.PubMed CentralPubMedView Article
- Uthe JJ, Royaee A, Lunney JK, Stabel TJ, Zhao S-H, Tuggle CK, Bearson SMD: Porcine differential gene expression in response to Salmonella enterica serovars Choleraesuis and Typhimurium. Mol Immunol. 2007, 44: 2900-2914. 10.1016/j.molimm.2007.01.016.PubMedView Article
- Galina-Pantoja L, Siggens K, Van Schriek MGM, Heuven HCM: Mapping markers linked to porcine salmonellosis susceptibility. Anim Genet. 2009, 40: 795-803. 10.1111/j.1365-2052.2009.01916.x.PubMedView Article
- Govoni G, Gros P: Macrophage NRAMP1 and its role in resistance to microbial infections. Inflamm Res. 1998, 47: 277-284. 10.1007/s000110050330.PubMedView Article
- Wick MJ: Innate immune control of Salmonella enterica serovar Typhimurium: mechanisms contributing to combating systemic Salmonella infection. J Innate Immun. 2011, 3: 543-549. 10.1159/000330771.PubMedView Article
- Yamanaka M, Kato Y, Angata T, Narimatsu H: Deletion polymorphism of SIGLEC14 and its functional implications. Glycobiology. 2009, 19: 841-846. 10.1093/glycob/cwp052.PubMedView Article
- Langfelder P, Mischel PS, Horvath S: When is hub gene selection better than standard meta-analysis?. PLoS One. 2013, 8: e61505-10.1371/journal.pone.0061505.PubMed CentralPubMedView Article
- Carlson MRJ, Zhang B, Fang Z, Mischel PS, Horvath S, Nelson SF: Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genomics. 2006, 7: 40-10.1186/1471-2164-7-40.PubMed CentralPubMedView Article
- Chu J, Lazarus R, Carey VJ, Raby BA: Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes. BMC Syst Biol. 2011, 5: 89-10.1186/1752-0509-5-89.PubMed CentralPubMedView Article
- Southworth LK, Owen AB, Kim SK: Aging mice show a decreasing correlation of gene expression within genetic modules. PLoS Genet. 2009, 5: e1000776-10.1371/journal.pgen.1000776.PubMed CentralPubMedView Article
- Bao H, Kommadath A, Plastow GS, Tuggle CK, Guan LL, Stothard P: MicroRNA buffering and altered variance of gene expression in response to Salmonella infection. PLoS One. 2014, 9: e94352-10.1371/journal.pone.0094352.PubMed CentralPubMedView Article
- Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LA, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, et al: Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012, 491: 393-398. 10.1038/nature11622.PubMed CentralPubMedView Article
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.PubMed CentralPubMedView Article
- Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt S, Johnson N, Juettemann T, Kähäri AK, Keenan S, Kulesha E, Martin FJ, Maurel T, McLaren WM, Murphy DN, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, et al: Ensembl 2014. Nucleic Acids Res. 2014, 42 (Database issue): D749-D755.PubMed CentralPubMedView Article
- Anders S, Pyl PT, Huber W: HTSeq - A Python framework to work with high-throughput sequencing data. bioRxiv. 2014, http://dx.doi.org/10.1101/002824,
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMed CentralPubMedView Article
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 10.1093/bioinformatics/btp616.PubMed CentralPubMedView Article
- R Core Team: R: A Language and Environment for Statistical Computing. 2013, Vienna, Austria: R Foundation for Statistical Computing, URL http://www.R-project.org/
- Smyth GK: Limma: Linear Models for Microarray Data. Bioinforma Comput Biol Solut Using R Bioconductor. Edited by: Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W. 2005, New York: Springer, 397-420.View Article
- Zhang B, Horvath S: A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005, 4: 17-
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL: Hierarchical organization of modularity in metabolic networks. Science. 2002, 297: 1551-1555. 10.1126/science.1073374.PubMedView Article
- Langfelder P, Horvath S: Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol. 2007, 1: 54-10.1186/1752-0509-1-54.PubMed CentralPubMedView Article
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.PubMed CentralPubMedView Article
- Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang P-L, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, et al: Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007, 2: 2366-2382. 10.1038/nprot.2007.324.PubMed CentralPubMedView Article
- Smoot ME, Ono K, Ruscheinski J, Wang P-L, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27: 431-432. 10.1093/bioinformatics/btq675.PubMed CentralPubMedView Article
- Durinck S, Spellman PT, Birney E, Huber W: Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009, 4: 1184-1191. 10.1038/nprot.2009.97.PubMed CentralPubMedView Article
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. the Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.PubMed CentralPubMedView Article
- Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD: The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005, 33 (Database issue): D284-D288.PubMed CentralPubMedView Article
- Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A: PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003, 13: 2129-2141. 10.1101/gr.772403.PubMed CentralPubMedView Article
- Rustici G, Kolesnikov N, Brandizi M, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Ison J, Keays M, Kurbatova N, Malone J, Mani R, Mupo A, Pedro Pereira R, Pilicheva E, Rung J, Sharma A, Tang YA, Ternent T, Tikhonov A, Welter D, Williams E, Brazma A, Parkinson H, Sarkans U: ArrayExpress update–trends in database growth and links to data analysis tools. Nucleic Acids Res. 2013, 41 (Database issue): D987-D990.PubMed CentralPubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.