Integrative genomics of the mammalian alveolar macrophage response to intracellular mycobacteria

Background Bovine TB (bTB), caused by infection with Mycobacterium bovis, is a major endemic disease affecting global cattle production. The key innate immune cell that first encounters the pathogen is the alveolar macrophage, previously shown to be substantially reprogrammed during intracellular infection by the pathogen. Here we use differential expression, and correlation- and interaction-based network approaches to analyse the host response to infection with M. bovis at the transcriptome level to identify core infection response pathways and gene modules. These outputs were then integrated with genome-wide association study (GWAS) data sets to enhance detection of genomic variants for susceptibility/resistance to M. bovis infection. Results The host gene expression data consisted of RNA-seq data from bovine alveolar macrophages (bAM) infected with M. bovis at 24 and 48 h post-infection (hpi) compared to non-infected control bAM. These RNA-seq data were analysed using three distinct computational pipelines to produce six separate gene sets: 1) DE genes filtered using stringent fold-change and P-value thresholds (DEG-24: 378 genes, DEG-48: 390 genes); 2) genes obtained from expression correlation networks (CON-24: 460 genes, CON-48: 416 genes); and 3) genes obtained from differential expression networks (DEN-24: 339 genes, DEN-48: 495 genes). These six gene sets were integrated with three bTB breed GWAS data sets by employing a new genomics data integration tool—gwinteR. Using GWAS summary statistics, this methodology enabled detection of 36, 102 and 921 prioritised SNPs for Charolais, Limousin and Holstein-Friesian, respectively. Conclusions The results from the three parallel analyses showed that the three computational approaches could identify genes significantly enriched for SNPs associated with susceptibility/resistance to M. bovis infection. Results indicate distinct and significant overlap in SNP discovery, demonstrating that network-based integration of biologically relevant transcriptomics data can leverage substantial additional information from GWAS data sets. These analyses also demonstrated significant differences among breeds, with the Holstein-Friesian breed GWAS proving most useful for prioritising SNPS through data integration. Because the functional genomics data were generated using bAM from this population, this suggests that the genomic architecture of bTB resilience traits may be more breed-specific than previously assumed. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07643-w.

and exploit innate immune responses, thereby facilitating colonisation, persistence and replication within host macrophages [29][30][31][32]. These mechanisms include: recruitment of cell surface receptors on the host macrophage through molecular mimicry; restricting phagosome maturation and autophagy; detoxification of reactive oxygen species and reactive nitrogen intermediates (ROSs and RNIs); modulation of type I interferon (IFN) signalling; suppression of antigen presentation; rewiring and short-circuiting of macrophage signal transduction pathways; manipulation of host macrophage metabolism; egress of bacilli into the macrophage cytosol; and inhibition of apoptosis with concomitant induction of necrosis leading to immunopathology and shedding by the host to complete the pathogenic life cycle [33][34][35][36][37][38][39]. Hence after infection, a two-way response is triggered between the pathogen and macrophage, the outcome of which ultimately leads to establishment of infection or clearance of the pathogen. The latter outcome of clearance may, or may not, require engagement of the adaptive immune system. As the detection of M. bovis infection in cattle generally relies on detecting an adaptive immune response to the pathogen, the outcome of which is slaughter of positive animals ('reactors'), identifying genes that underpin efficacious innate responses promises to reveal favourable genomic variants for incorporation into breeding programmes.
Since 2005, substantial efforts have been made to better understand host-pathogen interaction for bTB using transcriptomics technologies such as gene expression microarrays and RNA sequencing (RNA-seq) at the host cellular level-specifically the bovine alveolar macrophage (bAM) and initial innate immune responses to infection by M. bovis [40][41][42][43][44][45][46]. These studies have helped to define a "pathogenic signature" [31,47] of M. bovis infection in bAM, which reflects the tension between macrophage responses to contain and kill intracellular pathogens and evasion and avoidance mechanisms evolved by these mycobacteria. Using functional genomics data mining of transcriptomics data, it has also been shown that bAM responses to M. bovis infection can be clearly differentiated from infection with M. tuberculosis, the primary cause of human TB [45]. In addition, these studies have been expanded to encompass surveys of the bAM epigenome using methylome sequencing and chromatin immunoprecipitation sequencing (ChIP-seq). This work has demonstrated that the transcriptional reprogramming of bAM caused by M. bovis infection is profoundly shaped by chromatin remodelling at gene loci associated with critical components of host-pathogen interaction [46,48].
In parallel to functional genomics studies of bTB, genome-wide association studies (GWAS) have been performed in Irish and UK cattle populations using estimated breeding values (EBVs; estimate of genetic merit of an animal derived from a statistical model) for several M. bovis infection resistance traits with heritabilities ranging from 0.04 to 0.37, depending on the phenotype used [49][50][51][52][53][54][55]. These GWAS have used medium-and high-density single-nucleotide polymorphism (SNP) arrays and, more recently, imputed whole-genome sequence (WGS) data sets for a large multi-breed GWAS on 7346 bulls, which identified 64 quantitative trait loci (QTLs) associated with resistance to M. bovis infection [55]; the association study was based on phenotypic data from 781,270 individual animals.
We have recently shown that integration of bAM functional genomics data sets-RNA-seq, microRNA-seq and ChIP-seq-with a GWAS data set for resistance to M. bovis infection can be used to enhance detection of genomic regions associated with reduced incidence of bTB disease [46]. For the present study, we substantially expand this work by leveraging gene-focused networkand pathway-based methods under a statistical framework based on a new software tool, gwinteR, to integrate transcriptomics data from M. bovis-challenged bAM with WGS-based GWAS results for resistance to M. bovis infection [55]. Functional genomics data and downstream data mining (e.g., to generate lists of differentially expressed genes and outputs from network and pathway analyses) can be used to obtain prioritised subsets of genes that are likely to be important for a specific biological process or phenomenon [56,57]. The gwinteR tool can leverage these prioritised gene subsets and combine them with summary statistics from biologically relevant GWAS data sets. Biologically meaningful SNPphenotype associations can therefore be identified and enriched that would otherwise be filtered out because of stringent multiple test correction for the very large numbers of observations in a typical GWAS. The primary aim of this work was to evaluate whether this approach can systematically enhance detection of genomic sequence variants and genes underpinning bTB disease resistance in cattle populations.

Differential gene expression and pathway analyses of M. bovis-infected bovine AM
Quality filtering of RNA-seq read pairs yielded a mean of 22,681,828 ± 3,508,710 reads per individual library (n = 78 libraries). A mean of 19,582,959 ± 3,021,333 read pairs (86.17%) were uniquely mapped to locations in the ARS-UCD1.2 bovine genome assembly. Detailed filtering and mapping statistics are provided in a data file available from the Dryad Digital Repository (https://doi.org/ 10.5061/dryad.83bk3j9q6) and multivariate PCA analysis of the individual animal sample expression data using DESeq2 revealed separation of the control and M. bovisinfected bAM groups at the 24 and 48 h post infection (hpi) time points, but not at the 2 and 6 hpi time points (S1 Fig).
Using default criteria for differential expression (FDR P adj. < 0.05; |log 2 FC| > 1), and considering the M. bovisinfected bAM relative to the control non-infected macrophages, three DE genes were detected at 2 hpi (all three exhibited increased expression in the M. bovis-infected group); 97 DE genes were detected at 6 hpi (40 increased and 57 decreased); 1345 were detected at 24 hpi (764 increased and 581 decreased); and 2915 were detected at 48 hpi (1528 increased and 1387 decreased) ( Fig. 2a; Additional file 1). Figure 2b shows that 2982 genes were differentially expressed across the 24 and 48 hpi time points. Table 1 shows a breakdown of DE genes across the infection time course for a range of statistical thresholds and fold-change cut-offs, including the default criteria (FDR P adj. < 0.05; |log 2 FC| > 1). To ensure manageable computational loads, the DE gene sets that were used for GWAS integration with gwinteR were filtered with |log 2 FC| > 2, and P adj. < 0.01 and P adj. < 0.000001 for 24 and 48 hpi, respectively. With these criteria, there were 378 input genes for GWAS integration identified at 24 hpi and 390 input genes at 48 hpi. (24 hpi and 48 hpi DEG gene sets). In addition, 210 genes overlapped between the two time points. The two DEG gwinteR input gene sets (DEG-24 and DEG-48see Fig. 1) are also detailed in Additional file 1.
To produce gene sets for the IPA Core Analysis within the recommended range for the number of input entities [58,59]   Recognition of Bacteria and Viruses, Communication between Innate and Adaptive Immune Cells, TNFR2 Signalling, Role of PI3K/AKT Signalling in the Pathogenesis of Influenza, IL-10 Signalling, and Toll-like Receptor Signalling. The SIGORA software tool [60] has been previously used to identify biological pathways associated with a robust 'core' bAM response to infection with both M. bovis and M. tuberculosis [45]. It is therefore reassuring that many of these pathways-including PI3K-Akt Signalling Pathway, RIG-I-like Receptor Signalling Pathway, Toll-like Receptor Signalling and Protein Ubiquitination Pathway-were also enriched using the IPA methodology at 24 and 48 hpi.

Differential co-expression correlation networks and identification of functional gene modules
For the generation of bAM differential co-expression correlation networks, filtering of genes with low measure of central tendency, which reduces the number of potential spurious correlations [61], resulted in 11,354 and 11,170 genes at 24 and 48 hpi, respectively. Following this step, differential correlation analysis using DGCA with an empirical P adj. value threshold of 0.10 resulted in 3507 differentially correlated gene pairs out of 128,913,316 total pairwise correlations at 24 hpi; and 1135 from a total of 124,768,900 at 48 hpi (Additional file 3). The correlation networks generated at 24 hpi and 48 hpi (Fig. 3a) yielded a total of 22 and 14 functional gene modules, respectively ( Fig. 3b and c, and Additional file 3). After removal of duplicates, consolidated totals of 460 genes and 416 genes were contained in the functional modules at 24 hpi and 48 hpi, respectively. There were also 26 genes that overlapped between the functional modules for the two time points. The two correlation network (CON) gwinteR input gene sets (CON-24 and CON-48see Fig. 1) are also detailed in Additional file 3. GO term enrichment was also performed for each functional module at 24 hpi and 48 hpi, with the top three GO terms retained for each functional module (S2 Fig and S3 Fig)
The large GIN produced by InnateDB starting with the input list of 258 functionally prioritised genes was visualised using Cytoscape and consisted of 7001 nodes (individual genes) and 19,713 edges (gene interactions) ( The genes contained in the top five modules at 24 hpi and 48 hpi were filtered to remove duplicates and consolidated into two separate gene sets for GWAS integration with gwinteR. The consolidated gene sets for the top five modules at 24 hpi and 48 hpi contained 339 and 495 unique genes, respectively. There were 245 genes that overlapped between the two subnetwork gene sets for the two post-infection time points. The two differential expression network (DEN) gwinteR input gene sets (DEN-24 and DEN-48see Fig. 1) are also detailed in Additional file 4.

GWAS integration and identification of additional SNPtrait associations
The six gene sets generated from the three separate analyses of DE genes in bAM challenged with M. bovis at 24 hpi and 48 hpi are summarised in Table 2    48 hpi, respectively. In addition to these six putative functionally relevant gene sets, one hundred sets of 250 genes randomly sampled from the bovine genome were used for statistical context and comparison. These random gene sets (RAN) are detailed in Additional file 5 (see also Fig. 1). The results from the integrative analyses using gwinteR with the DEG-24, DEG-48, CON-24, CON-48, DEN-24, DEN-48, and RAN gene sets are summarised graphically in Fig. 5 and detailed in Additional files 6 and 7. We also used DE gene sets at 2 and 6 hpi with the default statistical threshold and foldchange cut-offs (FDR P adj. < 0.05; |log 2 FC| > 1) for GWAS integration using gwinteR (i.e., DEG-2 and DEG-6). However, no significant SNP enrichment were observed for these input gene sets, most likely because of the low numbers of input genes (3 and 97, respectively). Figure 5a shows circular Manhattan plots with GWAS results (P adj. values) for each of the three breeds prior to data integration using gwinteR. Figure 5b shows the gwinteR permuted P-values (P perm. ) for each of the 10 genomic intervals used and for each of the six input gene sets plus the RAN gene set with P adj. < 0.10. Figure 5c shows circular Manhattan plots with GWAS results post data integration using gwinteR. Inspection of Fig. 5b shows that, in terms of SNP enrichment (P perm. < 0.05), the integrative analyses using gwinteR were most effective for the HOF breed group where the CON-24, CON-48, and DEG-48 input gene sets produced enriched SNPs across all 10 genomic ranges. In addition, the DEG-24 and DEN-24 input gene sets were effective for the HOF breed across the ±20 to 100 kb and ± 30 to 50 kb genomic ranges, respectively. In the case of the LIM breed, the DEG-48 input gene set produced enriched SNPs across all 10 genomic ranges, the CON-48 between ±10 to 70 kb and the CON-24 at ±10 kb. For the CHA breed, SNP enrichment using gwinteR was only observed for the CON-24 input gene set for the genomic interval between ±10 to 40 kb. Figure 6 summarises the numbers of statistically significant SNPs pre-and post-data integration, again with SNP enrichment being most evident for the HOF breed with a 24-fold post-gwinteR SNP enrichment at P adj. < 0.10 from 40 to 961 SNPs. The SNP enrichments for the CHA and LIM breed were more modest, although there was a 2.3-fold enrichment at P adj. < 0.10 from 80 to 182 SNPs for the LIM breed. Inspection of Additional file 7 reveals notable gene loci associated with enriched GWAS SNPs for the HOF breed including the allograft inflammatory factor 1 gene (AIF1), which encodes a protein that promotes macrophage activation and proinflammatory activity [63]; the ciliogenesis associated kinase 1 gene (CILK1 aka ICK); IL17A and IL7F, which encode proinflammatory cytokines and contain polymorphisms in the human orthologs that have been associated with lower human TB disease incidence [64,65]; the integrin subunit beta 3 gene (ITGB3), which encodes a protein that has been shown to regulate matrix metalloproteinase secretion in pulmonary human TB [66]; the neuraminidase 1 gene (NEU1), which encodes a protein that regulates phagocytosis in macrophages [67]; and the TNF gene that encodes TNF (aka TNF-α), a key cytokine for generation and maintenance of the granuloma, and where gene polymorphisms have been linked to resistance to M. bovis infection [68]. Gene loci associated with enriched GWAS SNPs for the CHA breed group also included the CILK1 gene (aka ICK) and the Kelch repeat and BTB domain containing 3 gene (KCNJ15), which has been detected as an expression biomarker for human TB [69]; the T cell immune regulator 1, ATPase H+ transporting V0 subunit a3 gene (TCIRG1), a known antimycobacterial host defence gene that has been shown to be a key hub gene associated with IFN-γ stimulation of human macrophages [70]; and the Von Willebrand factor gene (VWF). Notable gene loci associated with enriched GWAS SNPs for the Limousin breed group included the cellular repressor of E1A stimulated genes 1 gene (CREG1), which encodes a regulator of core macrophage differentiation genes [71]; the desmoplakin gene (DSP), which increases in expression during M. tuberculosis-derived ESAT6-regulated transition of bone marrow-derived macrophages (BMDMs) into epithelioid macrophages [72]; and the SP110 nuclear body protein gene (SP110), which encodes a protein that modulates growth of MTBC pathogens in macrophages and has been successfully exploited for genome editing of cattle to enhance resistance to M. bovis infection [73].

Discussion
During the last decade, integrative genomics, multiomics analyses and network biology have come to the fore as powerful strategies for exploring, dissecting . For the second method, a correlation network approach was used to identify subnetworks (modules) of co-expressed functional gene clusters from the bAM transcriptomics data at the two post-infection time points (CON-24 and CON-48). The third method was also network-based but took advantage of the extensive scientific literature and curated biomolecular data for mycobacterial Integration of these six input gene sets with bTB GWAS data from three different cattle breeds (CHA, LIM and HOF) revealed substantial differences among the three methods in their capacity to detect additional QTLs for a M. bovis infection resistance trait in these particular GWAS data sets. For example, the correlation network approach was the only method that enriched SNPs for the CHA breed and that worked for at least one bAM post-infection time point across all three breeds (see Fig. 5). Surprisingly, perhaps, the functional modules obtained using the GIN differential network (DEN) approach-the most complex method to implement-produced the least effective input gene sets for prioritising additional SNPs from the GWAS data set. This method proved effective only for the DEN-24 input gene set with the HOF breed and then only for the ±30, ±40, and ± 50 kb genomic intervals. Conversely, the simplest method based on DE genes at 24 and 48 hpi enriched SNPs for both the LIM and HOF breed groups, with the DEG-48 input gene set being the most effective (Fig. 5). The relatively poor performance of the DEN approach for integration of functional genomics and GWAS data may be a consequence of the humanfocused GeneCards and InnateDB resources we used to generate the base GIN [62,78]. Therefore, it would be instructive to conduct similar integrative analyses for human TB using appropriate functional genomics and GWAS data sets. In summary, across the two different bAM infection time points and the three breeds, the CON method enriched 970 SNPs, the DEG method enriched 163 SNPs and the DEN method enriched only 11 SNPs (Additional file 7). Although other factors such as linkage disequilibrium need to be considered in interpreting these differences, it is reasonable to hypothesise that GWAS integration using the correlation network approach is more sensitive to regulatory genomic variants that alter expression of co-ordinately regulated protein components of the alveolar macrophage pathways and processes underpinning host-pathogen interaction for the early stages of intracellular MTBC infection [27,28,30]. Figure 7 illustrates this using the example of SNPs within and proximal to gene loci associated with PI3K/ AKT signalling, which based on IPA data mining for bAM DE genes was a highly activated pathway, particularly at 24 hpi. In this regard, previous work has shown that macrophage PI3K/AKT signalling is key to a range of cellular processes associated with host-pathogen interaction in MTBC infections, including modulation of cell death pathways, manipulation of signalling downstream of TLRs, and initiation of granuloma formation [79][80][81][82][83][84]. We have also recently demonstrated that genes encoding protein products embedded in the PI3K/AKT pathway are primary targets for chromatin modifications that substantially alter bAM gene expression in response to M. bovis infection [46].
There were also notable breed differences in the effectiveness of the three methods used for multi-omics data integration and enrichment of GWAS SNPs. The original GWAS data set for the HOF breed was the least powered of the three breeds in terms of both sample size (n = 1502 sires) and genetic markers (12,740,315 genome-wide SNPs). This is reflected in the relatively small number of GWAS SNPs that were detected pre-integration for the HOF breed: 40 SNPs at P adj. < 0.10 compared to 475 for the CHA breed and 80 for the LIM breed (see Fig. 6). However, the SNP enrichment post-integration was markedly more effective for the HOF breed with a 24-fold increase from 40 to 961 SNPs (P adj. < 0.10) compared to 1.08-fold (475 to 511 SNPs; P adj. < 0.10) and 2.28-fold (80 to 182 SNPs; P adj. < 0.10) for the CHA and LIM breeds, respectively (Fig. 6). In addition, a total of 48 genes were captured by the enriched GWAS SNPs for the HOF breed compared to 20 and 16 genes for the CHA and LIM breeds, respectively (Additional file 7).
Regarding the enhanced SNP detection observed for the HOF breed GWAS after multi-omic integration, it is noteworthy that the M. bovis infection transcriptomics data set was generated using bAM from this population [43]. Most of the SNPs (n = 829, 88%) and genes (n = 36, 67%) obtained post-integration in the HOF breed were detected using the correlation co-expression network method, which, again, is likely caused by enrichment of genomic regulatory variants that modulate expression of genes associated with alveolar macrophage pathways and processes critical to early host-pathogen interaction. This pattern of variation may also reflect the polygenic architecture of M. bovis infection resistance in domestic cattle [52,54,55,85].
It is well established that intracellular mycobacterial pathogens use a host of evolved mechanisms to survive within, and ultimately disseminate from host macrophage cells. Several of these mechanisms are ultimately linked to regulators of TNF and cognate downstream signalling pathways [29,30,39]. It is noteworthy, therefore, that our analysis prioritised SNPs at the TNF gene locus, a key nexus for host-pathogen interaction in mycobacterial infections. In addition, another important component of host-pathogen interaction and target for immunoevasion by intracellular mycobacterial infections is the PI3K/AKT pathway [33,79,83] and our integrative genomics approach was able to prioritise genomic variation at genes encoding ligands and receptors within this signalling network, specifically TNXB, GNG11, ITGB3, and VWF (see Fig. 7).

Conclusions
Elucidation of the mechanisms used by M. bovis to establish infection in cattle, and ultimately cause disease, requires an intimate knowledge of host-pathogen interactions, especially at the interface between the bAM and the invading bacilli. Using a systems biology approach, we identified and catalogued patterns of gene expression and gene-gene interactions that occur in bAM during the initial stages of M. bovis infection, highlighting key response pathways and hub genes. Additionally, a new R package, gwinteR, facilitated integration of these functional genomics outputs with three large breed based GWAS data sets for resistance/susceptibility to M. bovis infection. This revealed genomic variants associated with resistance in key innate immune genes and supported the hypothesis that the response to M. bovis infection at the level of the alveolar macrophage may be more breed-specific than previously assumed.
Integrative multi-omics approaches to data integration are now widely used to explore and dissect the genomic architecture and physiological basis of complex traits in domestic livestock, including networkbased methods to integrate functional genomics and GWAS data [86][87][88][89][90][91]. However, to the best of our knowledge, this study is the first that uses network biology to systematically combine transcriptomics data from M. bovis-infected macrophages with GWAS data for M. bovis infection resistance in cattle. Therefore, it provides a novel framework for integrative genomics studies of complex infectious disease resistance traits in livestock, particularly those involving other intracellular bacterial pathogens such as Brucella abortus, Mycobacterium avium subsp. paratuberculosis and Salmonella enterica. The work is also relevant to development of methods for integrative analyses of outputs from the Functional Annotation of Animal Genomics (FAANG) initiative [92] and for identification and prioritization of targets for genome editing to enhance resistance to infection in domestic livestock species [93]. The results from this study may also inform genomeenabled breeding programmes for resistance to M. bovis infection in production cattle populations [85,94]. Finally, this integrative multi-omics approach could also be used to combine relevant functional genomics and GWAS data sets to improve knowledge of innate immune responses and establishment of infection in human TB caused by M. tuberculosis.

Genomics data acquisition and computational and bioinformatics workflow
Genome-wide RNA-seq transcriptomics data from a 48-h bAM time course challenge experiment using the sequenced M. bovis AF2122/97 strain have been previously generated by our group (GEO accession: GSE62506). The complete laboratory methods used to isolate, culture and infect bAM with M. bovis AF2122/9 and generate strand-specific RNA-seq libraries using RNA harvested from these cells are described in detail elsewhere [42,43,45]. Briefly, these RNA-seq data were generated using bAM obtained by lung lavage of ten unrelated age-matched 7 to 12week-old male Holstein-Friesian calves. Bovine AM were either infected in vitro with M. bovis AF2122/97 or incubated with media only. Following total RNA extraction from M. bovis-infected and control noninfected alveolar macrophages, 78 strand-specific RNA-seq libraries were prepared (paired-end 2 × 90 nucleotide reads). These comprised M. bovis-and non-infected samples from each post-infection time point (2, 6, 24 and 48 hpi) across 10 animals with the exception of one animal that did not yield sufficient alveolar macrophages for in vitro infection at 48 hpi. GWAS data sets for the present study were obtained from intra-breed imputed WGS-based GWAS analyses that used estimated breeding values (EBVs) derived from a bTB infection phenotype, which were generated for 2039 Charolais, 1964 Limousin and 1502 Holstein-Friesian sires [55]. The bTB phenotype, the WGS-based imputed SNP data, and the quantitative genetics methods are described in detail elsewhere [55]; however, the following provides a brief summary. The bTB infection phenotype was defined for every animal present during each herd-level bTB breakdown when a bTB reactor or a slaughterhouse case was identified. Cattle that yielded a positive single intradermal comparative tuberculin test (SICTT), and/or post-mortem lymph node lesion, or laboratory culture result/s were coded as bTB = 1 and all other cattle present in the herd during the bTB-breakdown were coded as bTB = 0; potential exposure of cattle within the bTB breakdown was also considered in this study [55]. After phenotype data edits, bTB resistance EBVs were generated for 781,270 phenotyped cattle (plus their recorded ancestors). After within-breed SNP filtering using thresholds for minor allele frequency (MAF ≤ 0.002) and deviation from Hardy-Weinberg equilibrium (HWE; P < 1 × 10 − 6 ), there were 17,250,600, 17,267,260 and 15,017,692 autosomal SNPs for the 2039 Charolais (CHA), 1964 Limousin (LIM) and 1502 Holstein-Friesian (HOF) sire analyses. A single-SNP regression analyses was performed for each breed separately using weighted (i.e., by an effective record contribution) sire EBVs for M. bovis infection resistance/ susceptibility and the nominal P-values were used for downstream integrative genomics analyses.
All data-intensive computational procedures were performed on a 36-core/72-thread compute server (2× Intel ® Xeon ® CPU E5-2697 v4 processors, 2.30 GHz with 18 cores each), with 512 GB of RAM, 96 TB SAS storage (12 × 8 TB at 7200 rpm), 480 GB SSD storage, and with Ubuntu Linux OS (version 18.04 LTS). The complete computational and bioinformatics workflow is available with additional information as a public GitHub repository (github.com/ThomasHall1688/ Bovine_multi-omic_integration). The individual components of the experimental and computational workflows are shown in Fig. 1 and described in more detail below.

Differential gene expression analysis of RNA-seq data
A custom Perl script was used to deconvolute barcoded RNA-seq reads into individual libraries, filter out adapter sequence reads, and remove poor quality reads [43]. At each stage of the process, a quality check was performed on the FASTQ files with FastQC (version 0.11.8) [95]. Paired-end sequence reads were then aligned to the Bos taurus reference genome (ARS-UCD1.2, GenBank assembly accession: GCA_002263795.2) [96] using the STAR aligner (version 2.7) [97]. Read counts for each gene were calculated using featureCounts (version 1.6.4) [98], set to unambiguously assign uniquely aligned paired-end reads in a stranded manner to gene exon annotation. Using the R statistical programming language (version 3.5.3) [99], gene annotation was derived from the NCBI database via a GFF annotation file GCF_ 002263805.1 with additional descriptions and chromosomal locations annotated using GO.db (version 3.8.2) [100] and biomaRt packages (version 2.40.0) [101]. Differential gene expression analysis was performed using the DESeq2 package (version 1.24.0) [102] with a longitudinal time series design that accounted for time (hpi) and experimental treatment (M. bovis-infected versus control). Lowly expressed reads were removed using the mean of normalized counts as a filter statistic; individual genes with very low read counts would typically not exhibit significant differential expression due to high dispersion [102]. In addition, extreme count outliers were removed within DESeq2 using the Cook's distance [103] as previously described [102]. Multiple testing correction was performed on each time point using the Benjamini-Hochberg (B-H) false discovery rate (FDR) method [104]. The default criteria for differentially expressed (DE) genes were an FDR-adjusted P-value less than 0.05 (P adj. < 0.05).

Ingenuity ® pathway analysis (IPA) of differentially expressed genes
Ingenuity ® Pathway Analysis-IPA ® (version 1.1, summer 2020 release; Qiagen, Redwood City, CA, USA) was used to perform a statistical enrichment analysis of DE gene sets and expression data [58]. This enabled identification of canonical pathways and functional processes of biological importance in alveolar macrophages challenged with M. bovis across the longitudinal infection time course. The target species selected was Homo sapiens and the cell type used was Macrophage with the Experimentally Observed and High Predicted confidence settings. Following best practice, the default background gene set for pathway and functional process enrichment testing was the set of detectable genes across all RNAseq libraries for each time point contrast and not the complete bovine transcriptome [105].
Identification of functional gene modules using differential co-expression network analysis An integrated computational pipeline for differential co-expression network analysis was implemented using: 1) the Differential Gene Correlation Analysis (DGCA) R package (version 1.0.2) [61]; 2) the Cytoscape open source Java platform for visualisation and integration of biomolecular interaction networks (version 3.7.0) [106]; and 3) the Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) R package (version 1.3.7) [107].
Due to the low variance in gene expression observed among samples at 2 and 6 hpi ( Fig. 2a and b), only data from the 24 and 48 hpi time points were used for the differential correlation analysis. The DGCA R package was used to filter normalised gene counts such that genes in the lower 30th percentile of median expression values were removed. Pearson correlation coefficients were then calculated for each gene pair between the control non-infected and M. bovis-infected samples at 24 and 48 hpi. Following this, for each time point, the infected and control samples were randomly shuffled, and the analysis was repeated for a total of ten iterations. Additionally, using the permutation testing and reference pool distribution approaches implemented in DGCA, an empirical P-value was calculated for each observed correlation coefficient and q-values were calculated based on empirical P-values and the estimated proportion of null hypotheses; gene pairs were then considered to be differentially correlated with a q-value threshold of 0.10 [61].
The correlation networks and network parameters generated by DGCA were initially visualised, examined and evaluated using the Cytoscape platform. In correlation networks, based on gene co-expression, each gene acts as a node and each correlation acts as a weighted edge, depending on the strength of the correlation coefficient [108,109]. The DGCA network data for the 24 and 48 hpi time points were then imported into MEGE NA for identification of functional subnetworks (modules) using differentially correlated (q < 0.10) gene pairs ranked by q-value to a maximum of 3500 gene pairs; this gave 3500 and 1085 gene pairs for the 24 and 48 hpi correlation networks, respectively. Each gene pair was assigned to a class that described the change in correlation depending on infection status, and only those genes that exhibited a change in expression pattern were included in the MEGENA analysis to identify functional modules.
Functional modules in the 24 and 48 hpi correlation networks were detected as locally coherent subnetwork clusters with a minimum of 20 unique genes that MEGE NA classified as statistically significant (P-value < 0.05) based on analyses of shortest path indices, local path index, weight of the correlation and overall modularity. The resulting MEGENA functional modules were visualised using the Cytoscape platform and genes embedded in functional modules at 24 and 48 hpi were combined and annotated for downstream GWAS integration as described below. The genes contained in these functional modules were also subject to gene ontology (GO) term enrichment analyses within the MEGENA package [107].

Detection of active gene subnetworks using a tuberculosis and mycobacterial infection gene interaction network
The GeneCards ® gene compendium and knowledgebase (www.genecards.org; version 4.9), which integrates multiple sources of biological information on all annotated and predicted human genes [78], was used to identify a set of genes that are functionally associated with the host response to TB and other diseases caused by infection with mycobacteria. The GeneCards search query generated a total of 2291 gene hits (Additional file 4) using the search terms: tuberculosis OR mycobacterium OR mycobacteria OR mycobacterial. Genes were ranked by a GeneCards statistic-the Relevance Score-based on the Elasticsearch algorithm [110], which determines the strength of the relationships between genes and keyword terms. Gene IDs were converted to human Ensembl gene IDs [111] and retained for downstream analysis using the InnateDB knowledgebase and analysis platform for systems level analysis of the innate immune response (www.innatedb.com; version 5.4) [62].
A gene interaction network (GIN) was generated with the gene list output from GeneCards using Inna-teDB with default settings and this network was visualised using Cytoscape. The jActivesModules Cytoscape plugin (version 3.12.1) [112] was then used to superimpose the bAM RNA-seq gene expression data and detect, through a greedy search algorithm, differentially active subnetworks (modules) of genes at the 24 and 48 hpi time points. Locally coherent clusters that contain genes that are differentially expressed were identified using the log 2 FC and P adj. values of each differentially expressed gene; the overall connectivity of those genes with their immediate module co-members; and the comparison of that connectivity with a background comprised of randomly drawn networks using the same genes, but independent of the base network. Genes embedded in active modules that were detected as statistically significant at 24 and 48 hpi were combined and annotated for downstream GWAS integration as described below.

Integration of M. bovis-infected bovine AM gene expression data with bTB GWAS data
To facilitate integration of GWAS data with gene sets generated from functional genomics data analyses, an R software package was developed-gwinteR (github.com/ ThomasHall1688/gwinteR), which can be used to test the hypothesis that a specific set of genes is enriched for signal in a GWAS data set relative to the genomic background. This gene set, for example, could be an output from an active gene module network analysis of transcriptomics data from a cell type or tissue relevant to the GWAS phenotype. To formally test the primary hypothesis, the gwinteR tool was designed to determine if genomic regions containing GWAS SNPs that are proximal to genes within a gene set are enriched for statistical associations with the trait/s analysed in the GWAS.
The gwinteR tool works as follows: 1) a set of significant and non-significant SNPs (named the target SNP set) is collated across all genes in a specific gene set at increasing genomic intervals upstream and downstream from each gene inclusive of the coding sequence (e.g., ± 10 kb, ±20 kb, ±30 kb … … ±100 kb); 2) for each genomic region, a null distribution of 1000 SNP sets, each of which contains the same number of total significant and non-significant combined SNPs as the target SNP set, is generated by resampling with replacement from the search space of the total population of SNPs in the GWAS data set; 3) the nominal (uncorrected) GWAS Pvalues for the target SNP set and the null distribution SNP sets are converted to local FDR-adjusted P-values (P adj. ) using the fdrtool R package (current version 1.2.15) [113]; 4) a permuted P-value (P perm. ) to the test the primary hypothesis for each observed genomic interval target SNP set is generated based on the proportion of permuted random SNP sets where the same or a larger number of SNPs exhibiting significant q-values (e.g. q < 0.05 or q < 0.10) are observed; 5) gwinteR generates data to plot P perm. results by genomic interval class and obtain a graphical representation of the GWAS signal surrounding genes within the target gene set; and 6) a summary output file of all SNPs in the observed target SNP set with genomic locations and q-values is generated for subsequent investigation.
In the original bTB GWAS data set used for the present study [55], the WGS-imputed SNPs were mapped to the UMD3.1 bovine genome assembly [114]. Consequently, prior to GWAS data integration, the imputed and previously filtered SNPs for each of the three breed groups were mapped, using a custom R pipeline (github.com/ThomasHall1688/Bovine_multi-omic_ integration), to the most recent ARS-UCD1.2 cattle reference genome assembly [96]. After this step, there were 14,583,567, 14,586,972 and 12,740,315 autosomal SNPs with nominal GWAS P-values that could be used for integrative genomics analyses of the CHA, LIM and HOF breeds, respectively.
For the integrative analyses of bAM functional genomics outputs with the bTB GWAS data, three different subsets of genes were used: 1) basic DE gene sets that were filtered to ensure manageable computational loads using stringent expression threshold criteria of |log 2 FC| > 2 and P adj. < 0.01 and P adj. < 0.000001 for 24 and 48 hpi, respectively; 2) genes embedded in functional modules at 24 and 48 hpi that were detected using the MEGENA package in the differential co-expression network analyses; and 3) genes embedded in active modules at 24 and 48 hpi that were identified using jActiveModules within the tuberculosis and mycobacterial infection interaction network.