Identification of novel alternative splicing associated with mastitis disease in Holstein dairy cows using large gap read mapping

Background Mastitis is a very common disease in the dairy industry that producers encounter daily. Transcriptomics, using RNA-Sequencing (RNA - Seq) technology, can be used to study the functional aspect of mastitis resistance to identify animals that have a better immune response to mastitis. When the cow has mastitis, not only genes but also specific mRNA isoforms generated via alternative splicing (AS) could be differentially expressed (DE), leading to the phenotypic variation observed. Therefore, the objective of this study was to use large gap read mapping to identify mRNA isoforms DE between healthy and mastitic milk somatic cell samples (N = 12). These mRNA isoforms were then categorized based on being 1) annotated mRNA isoforms for gene name and length, 2) annotated mRNA isoforms with different transcript length and 3) novel mRNA isoforms of non - annotated genes. Results Analysis identified 333 DE transcripts (with at least 2 mRNA isoforms annotated, with at least one being DE) between healthy and mastitic samples corresponding to 303 unique genes. Of these 333 DE transcripts between healthy and mastitic samples, 68 mRNA isoforms are annotated in the bovine genome reference (ARS.UCD.1.2), 249 mRNA isoforms had novel transcript lengths of known genes and 16 were novel transcript lengths of non - annotated genes in the bovine genome reference (ARS.UCD.1.2). Functional analysis including gene ontology, gene network and metabolic pathway analysis was performed on the list of 288 annotated and unique DE mRNA isoforms. In total, 67 significant metabolic pathways were identified including positive regulation of cytokine secretion and immune response. Additionally, numerous DE novel mRNA isoforms showed potential involvement with the immune system or mastitis. Lastly, QTL annotation analysis was performed on coding regions of the DE mRNA isoforms, identifying overlapping QTLs associated with clinical mastitis and somatic cell score. Conclusion This study identified novel mRNA isoforms generated via AS that could lead to differences in the immune response of Holstein dairy cows and be potentially implemented in future breeding programs. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08430-x.

information at the host level, however, using a cost effective and minimally invasive technique to obtain this information is critical. Next generation sequencing technologies have advanced this area for high -throughput functional genomics [2,3].
One high -throughput technology, RNA -Sequencing (RNA -Seq), characterizes the transcriptome of the host and has been developed to identify and quantify tissue -specific genes and transcripts that are differentially expressed (DE [4]). RNA -Sequencing can also be used to identify DE alternative splicing (AS) variants and structural variation in the coding region such as SNPs and INDELs [5][6][7][8]. The phenomenon of AS plays an important role in regulating the mammalian proteome and disease processes [9], as AS generates different mature transcripts from the same primary RNA sequence [10,11] from a specific gene [12]. These mRNA isoforms can then be translated into functionally similar proteins that have similar but not identical amino acid sequences that contribute to observed phenotypic variation [12].
Previous research has investigated novel isoforms derived by AS and found that the isoform levels were DE in healthy and infected mammary tissues [11]. Other research has also shown that AS events in the mammary gland of dairy cows are implicated in the host immune response [9]. Therefore, by identifying the specific DE mRNA isoforms produced via AS, the complexity of the transcriptome data may be reduced and may allow for the identification of specific mRNA isoforms that contribute to observed phenotypic variation. These specific mRNA isoforms could then be targeted in future SNP discovery studies, which could be implemented in future breeding programs involving marker -assisted or genomic selection.
Therefore, the goals of this study were to: (1) identify DE annotated mRNA isoforms between milk somatic cells (SC) samples from healthy and mastitic mammary quarters; (2) identify novel transcripts length from annotated genes or novel transcripts associated with non -annotated genes DE between healthy and mastitic samples; (3) identify functional candidate transcripts involved in immune processes that could potentially be associated with differences in the host response to mastitis that may be genomic track regions to target for future SNP discovery studies; and (4) annotate QTLs located in the coding regions of the identified DE mRNA isoforms.

Global transcript expression
A total of 182 million single -end reads were generated from milk SC samples. RNA -Sequencing analysis revealed that 89.87% of these reads were mapped to the annotated bovine reference genome (ARS_UCD1.2; Table 1). The total number of transcripts expressed by milk SC from healthy and mastitic samples was 88,050 and 85,486, respectively (RPKM ≥ 0.2). To look at the functional relevance of these AS, we focused on the genes with two or more mRNA isoforms with at least one mRNA isoform being DE. In total, 333 DE transcripts were identified (FDR < 0.05 and |FC| > 2), each Table 1 Alignment statistics of the 12 milk somatic cells samples collected from 6 Holstein dairy cows a a Samples were aligned to the ARS_UCD1.2 bovine reference genome; b These samples were collected from the same cow (3rd lactation, 74 DIM); c These samples were collected from the same cow (2nd lactation, 44 DIM); d These samples were collected from the same cow (3rd lactation, 178 DIM); e These samples were collected from the same cow (2nd lactation, 133 DIM); f These samples were collected from the same cow (2nd lactation, 7 DIM); g These samples were collected from the same cow (1st lactation, 236 DIM)

Group
Sample ID Total reads mapped Uniquely mapped reads%  Table 2) and 16 were novel transcripts of non -annotated genes (Table 3), which were later annotated using NCBI blast (blastx; https:// blast. ncbi. nlm. nih. gov/ Blast. cgi). Of the 333 DE mRNA isoforms, 248 were under -expressed in the mastitic samples compared to the healthy samples (FC value < − 2); the most under -expressed mRNA isoform in the mastitic group compared to the healthy group was the bovine leukocyte antigen (BOLA) gene (FC = − 28,511.36). As discussed at length by Asselstine et al. [13], the BOLA genes are a complex group of genes and some of them are highly polymorphic. This in turn leads to variation in the animal's ability to recognize antigens and carry out antigen presentation, making some animals more susceptible to disease and infection than others [14]. One of the reasons of BoLA being under-expressed could be the presence of polymorphisms or other functional variations that directly affects its expression, for example any polymorphisms impacting the amino acid and thus the formation of the protein. Alternatively, 85 isoforms were over -expressed in the mastitic samples compared to the healthy samples (|FC| > 2); the most over -expressed mRNA isoform was solute carrier family 36 member 1 (SLC36A1; FC = 3688.84). The solute carrier family in general is involved in the movement of amino acids [15]; and research has shown that amino acid transporters, such as the solute carrier family, affect T -cell fate decision, which has a central role in immune response [16].

Functional analysis of known mRNA isoforms Gene ontology (GO)
To ensure the most complete list of AS mRNA isoforms was used for functional analysis, the two groups with the gene name annotated in the ARS -UCD1.2 bovine reference genome were combined. This left 288 unique mRNA isoforms with gene name annotated. The three main GO categories (biological processes, molecular function, cellular process) were analyzed using the list of 288 mRNA isoforms. There were 16 significantly enriched GO terms were associated with the biological process GO category, with the most mRNA isoforms being involved with cellular process (N = 121) and metabolic process (N = 81; Fig. 1). For the molecular function GO category, the most enriched GO terms were binding (N = 75) and catalytic activity (N = 54; Fig. 2). Lastly, the most enriched GO terms for cellular component category were cell and cell part (both N = 130; Fig. 3). These results are in concordance with those found in Yang et al. [17] and Asselstine et al. [13], which both looked at the GO terms associated with mastitis in dairy cattle.

Gene network analysis
To complete the functional gene network analysis, the list of 288 combined and unique annotated mRNA isoforms was used. These 288 annotated mRNA isoforms were involved in 67 significantly enriched pathways including: regulation of cytokine biosynthetic process, positive regulation of cytokine secretion, cell -cell adhesion and immune response (Table 4). These gene networks are all connected via a few key nodes (cyclic AMP response element -binding protein binding protein (CREBBP), hypoxia inducible factor 1 subunit alpha (HIF -1α), switch/sucrose non-fermentable related, matrix associated, actin dependent regulator of chromatin, subfamily A, member 4 (SMARCA4), ribosomal protein S27a (RPS27A); Fig. 4). Of these 4 key nodes, 3 of them are expressed in our list of DE mRNA isoforms (CREBBP, HIF -1α and SMARCA4).
The first node CREBBP is 240 × under -expressed in the mastitic samples compared to the healthy samples. This binding protein activates specific transcription factors that are involved in cellular activities such as DNA repair, cell growth, differentiation and apoptosis by binding to the cAMP response element -binding protein (CREB [18]). The CREB is part of the innate immune system and has many roles in immune response, such as mediating the NF -κB-dependent antiapoptotic response in macrophages, thus macrophage survival and enhancement of the immune response [19,20]. This is an important function of CREB as certain microbes or pathogens can induce apoptosis of macrophages as a mechanism to evade the host immune response [21].
One key aspect of determining the severity of the mastitis infection is how fast the host can adapt and clear the mastitis causing agents. Another function of CREB is that it induces IL -10 production, which is an inflammatory cytokine that has a key role in mediating the inflammatory loop to prevent unwanted tissue damage [22]. Tissue damage is another issue with mastitis as it reduces the number and activity of epithelial cells, contributing to a decreased milk production [23]. However, in our study, we did not find any DE mRNA isoforms associated with IL -10. In summary, if CREBBP is significantly under -expressed or has a polymorphism preventing it from binding CREB, this could severely impact the host's ability to kill the mastitis causing agents and reduce the inflammation and therefore, a case of mastitis that could have been minor becomes more severe.
The second node that explains the majority of the topology of the network is HIF -1α. Hypoxia occurs when there is an oxygen shortage and has been shown to regulate innate immunological functions including apoptosis, phagocytosis of pathogens, antigen presentation, cytokine production among others [24]. As discussed in Palazon et al. [25], HIF -1α is widely expressed and detected in virtually all innate and adaptive immune cell populations including macrophages [26], neutrophils [27], dendritic cells [28] and lymphocytes [29]. One study by Seagroves et al. [30] found that mice lacking HIF -1α had impaired mammary differentiation and lipid secretion, which caused drastic changes in milk composition. Thus, illustrating that HIF -1α plays a critical role in the function of mammary epithelium. To the best of our knowledge, this gene has not been directly linked with mastitis in ruminants but given its key role in the innate and adaptive immune function, as well as mammary epithelium, this may be a key gene in mammary gland health and function. In our study, this mRNA isoform was 1055 × under -expressed in our mastitic samples, therefore, further research should investigate if specific polymorphisms in this mRNA isoform that could impact its functionality. Lastly, the SMARCA4 gene proteins form one subunit of the switch/sucrose non -fermentable complex, which plays an important role in chromatin remodeling and is a known regulator for transcription and DNA repair [31,32]. The SMARCA4 gene is often mutated or silenced in tumors [33] and mutations in this gene have been associated with numerous human cancers such as small cell carcinoma of the ovary [34] and non -small cell lung cancer [32] among others. It has also demonstrated roles in T -cell development, T -cell lineage choice, T helper (Th) differentiation/function and macrophage function [34]. This is relevant because based on the type of mastitis pathogen in the udder, different T -cells are recruited [35]. However, to the best of our knowledge, no previous research has investigated the functions of SMARCA4 regarding bovine mastitis, but it was 912 × underexpressed in the mastitic samples in this study, which suggests that this gene may be a candidate gene for further mastitis studies. Therefore, further research is needed to determine if this mRNA isoform could impact the host's immune response or if it is just a key player in normal cell function.

Functional analysis of novel mRNA isoforms
As previously discussed, we were not only interested in looking at mRNA isoforms previously annotated, but also novel mRNA isoforms generated by AS. We found that of the 333 DE transcripts, 16 were novel transcript lengths of unknown genes; out of these 6 were under -expressed in the mastitic samples compared to the healthy samples, while 10 were over -expressed. Using NCBI blast, the predicted gene name was identified for these novel transcripts of non -annotated genes in the ARS.UCD1.2 bovine reference genome. As presented in Tables 3, 13 of the 16 novel mRNA isoforms had a predicted gene associated with them. These 13 novel mRNA isoforms have been split into 3 separate categories: immune response, normal cell function and unknown function.

Immune response
The first two novel mRNA isoforms (Gene_1149_1 and Gene_1149_8), are predicted to be Endoglin. Research has shown that the Endoglin gene product is associated with transforming growth factor -β (TGF -β) in humans and when there was a lack of expression of Endoglin  [37] and has biological processes associated with host -virus interaction. In general, mastitis is caused by bacterial and non -bacterial pathogens, but some research has shown that cows with clinical mastitis have other viral infections including infections bovine rhinotracheitis [38,39] and foot -and -mouth disease [40]. Both isoforms were over -expressed in the mastitic samples (FC = 36.05,  Novel mRNA isoform Gene_926_6 is predicted to be the gene Pellino and this gene has been found to be expressed in various studies looking at immune response in humans [41] and mastitis in the goat mammary gland [42]. However, in neither of these studies was it the main focus so limited information could be found on its functionality in relation to mastitis or the immune system. In our study, this mRNA isoform was over -expressed in the mastitic samples (FC = 419.48; FDR = 4.09E-02).
The two novel mRNA isoforms (Gene_2644_2 and Gene_657_5) were predicted to be associated with interleukin -1 (IL -1), which is a pro -inflammatory cytokine and one of the elements of enhancing antigen recognition [43]. Interleukin -1 helps to initiate the inflammatory response that can then be beneficial for initiating response to IMI, but it can also be damaging if its expression is excessive or prolonged. This is important for mastitis as the host needs to be able to recognize the antigen (or mastitis causing agent) quickly and efficiently, without causing more damage to the mammary tissue. In our study, both novel mRNA isoforms were over -expressed in the mastitic samples (FC = 22.49, FDR = 1.17E-04; FC = 792.39, FDR = 1.17E-04, respectively).
The WD repeat -containing protein 34 (WDR34) gene has been implicated in the immune response as a negative regulator of IL -1R/TLR3/TLR4 -induced NF -κB activation pathway [44,45]. This predicted gene is associated with the novel mRNA isoform Gene_1180_2, which was under -expressed (FC = − 3352.94, FDR = 2.95E-03) in the mastitic samples. As this AS variant is associated with a gene that is a regulator of the immune response, if it is not functioning properly, this could potentially impact the host's response to the IMI.
Next, the predicted gene uS9 associated with Gene_343_3 was over -expressed in the mastitic samples compared to the healthy samples (FC = 47.64, FDR = 4.28E-05). One role of uS9 is that it can block natural killer cell activation which play an important role in host defence. To the best of our knowledge, it has not been previously associated with mastitis so further research is needed.

Normal cell function
The novel mRNA isoform (Gene_1007_5) was predicted to be Prolactin, which is key in the maintenance of milk secretion [46]. It is not known if this specific mRNA isoform impacts mastitis but given the critical role of this gene in milk production it would be important to consider this isoform, which is 228x over -expressed in the mastitic samples (FC = 288.43, FDR = 3.24E-02).
Next, the predicted gene FosB was associated with the novel mRNA isoform (Gene_966_1) and was over -expressed in the mastitic samples (FC = 1401.95, FDR = 3.40E-02). The Fos family members are closely related with the Jun family members and both compose the AP -1 transcription factor which participates in the  control of cellular responses to regulate normal cell functions including cell death [47]. The novel mRNA isoform (Gene_726_6) associated with Spastin was under -expressed in the mastitic group (FC = − 152.57, FDR = 2.86E-02). Spastin is involved in microtubule dynamics for ATP -ase and therefore is important for normal cell function [48].
The predicted gene NADPH -cytochrome P450 reductase is critical for normal cell function and in this study was associated with Gene_1335_4, which was under -expressed in the mastitic group (FC = − 83.18, FDR = 3.99E-02). Although this gene was underexpressed, it is possible that the cause of this is due to the increased proportion of inflammatory cells in the mastitic milk SC [13].

Unknown function
Alternatively, three splice variants of the 16 novel transcripts of non -annotated genes did not have a predicted gene name and perhaps are extremely novel due to the inability for a match to be made in any species (Gene_943_4, Gene_1711_9 and Gene_1711_7). Two AS variants were under -expressed (Gene_943_4 and Gene_1711_9; FC = − 230.47, FDR = 6.67E-05 and FC = − 54.49, FDR = 1.07E-04, respectively) while one (Gene_1711_7) was over -expressed (FC = 73.31, FDR = 1.28E-04); due to not having a predicted gene name, no functional information could be identified for them. Thus, further research is required to determine the direct or indirect role they may have on mastitis or the immune system.

QTL annotation
Identifying QTL can make important connections between the phenotypic trait of interest and identify key differences in the host genome. The current cattle QTL database has 159,844 QTL relating to 653 different traits (release 42 [49]; https:// www. anima lgeno me. org/ cgibin/ QTLdb/ index). The QTL annotation was performed using the coordinates of the 333 AS DE mRNA isoforms (Supplementary Table 3). In total, 207 previously annotated QTL were located in the regions of DE mRNA AS variants (Supplementary Table 4). The QTL were annotated for milk (66%), reproduction (13%), exterior (8%), production (6%), health (4%) and meat and carcass (3%; Fig. 5).
As expected, due to the importance of milk production, the majority of QTL were associated with milk traits, with the largest amount of QTL being associated with milk protein percentage. Milk protein is a critical a FDR False discovery rate; b DE Differentially expressed  Table 4) were associated with previously annotated QTL regions. One of these mRNA isoforms, Casein Kappa (CSN3), has been identified in numerous studies. One study by Alim et al. [50] found CSN3 to be an important candidate that influences milk production traits (i.e., milk protein) and could be used for the genetic improvement of milk production traits in dairy cattle. In our analysis, there were three different CSN3 isoforms, CSN3_1, CSN3_2 and CSN3_5 that were 100x, 26x and 51x, under -expressed respectively, in the mastitic samples compared to the healthy samples, so close attention should be paid to these specific mRNA isoforms if implementing them into breeding practices.
When we look at the QTL associated with health traits, there are previously annotated QTL associated with somatic cell score (SCS), clinical mastitis (CM) and bovine respiratory disease (Supplementary Table 4). There are 9 previously annotated QTL corresponding to 5 different genes (solute carrier family 9 member A8 (SLC9A8), lactoferrin (LTF), ribosomal protein S6 kinase C1 (RPS6KC1), Sad1 and UNC84 domain containing 2 (SUN2) and nuclear factor I X (NFIX)). These genes each have only one DE mRNA isoform associated with them. Both SLC9A8_2 and RPS6KC1_5 were over -expressed in the mastitic group (FC = 157.68 and 734.76, respectively). The SLC9A8 gene is important in the protection of epithelial cells from bacterial adhesion [51,52] and this is associated with the QTL trait of CM. The mRNA isoform Fig. 4 Gene network analysis constructed with the 288 unique mRNA isoforms Ensembl IDs that are involved in host immune response using NetworkAnalyst (http:// www. netwo rkana lyst. ca). The grey circles represent mRNA isoforms that are involved in the gene network analysis. The darker the grey and the larger the size of the circle represents its functional relevance, due to it being connected to numerous other mRNA isoforms RPS6KC1_5 was annotated to be associated with SCS however, further research is needed to determine if it is a direct connection as literature is still scarce. Alternatively, 3 mRNA isoforms, LTF_10, SUN_12 and NFIX_8 were all under -expressed (FC = − 1768.47, − 85.03 and − 472.07, respectively) in the mastitic samples compared to the healthy samples. Lactoferrin_10 is associated with the QTL for CM and previous research has shown that LTF is a multifunctional protein with antimicrobial properties and plays an important role in innate immunity participating in the host first line defense [53,54]. Interestingly, the mRNA isoform associated to LFT, was one of the most under -expressed in the mastitic samples compared to the healthy samples, which suggests this mRNA may be an important candidate gene to better understand the mechanisms involved in the development of mastitis. The SUN2_12 and NFIX_8 mRNAs isoforms are both associated with the QTL for SCS, however more research is needed to determine how these mRNA isoforms are related to mastitis before implementing them into breeding practices.

Conclusion
The AS of known mRNA isoforms is significantly enriched in immune pathways such as cytokine secretion and cell -cell adhesion. Numerous novel mRNA isoforms were also identified that are involved with the immune system or mastitis. However, further research is needed to validate predicted genes and determine the exact impact they would have in relation to mastitis resistance. QTL annotation analysis revealed that the loci containing the identified DE mRNA isoforms overlap with QTL associated with CM and SCS, as well as milk traits including milk protein percentage and milk yield. In conclusion, LGRM identified novel mRNA isoforms that could lead to differences in the immune response of Holstein dairy cows. This research could aid in the implementation of breeding practices to aid in breeding healthier animals that are better able to adapt or prevent mastitis infections using either marker -assisted or genomic selection.

Animals and sample collection
This study was approved by the UC Davis Institutional Animal Care and Use Committee (IACUC). Sample collections and procedures were performed in accordance with the approved guidelines of UC Davis IACUC. The transcriptomes of 12 bovine milk somatic cell samples were characterized from 6 Holstein dairy cows using RNA -Sequencing to compare healthy and mastitic quarters within cows. Two different milk samples were collected from each cow, one sample from the mastitic quarter which was found using the California mastitis test and the other sample taken diagonally across from the mastitic quarter and classified as healthy (N = 12 [13]), based on having a SCC < 100,000 cells/mL. The cow's teat was cleaned with gauze and damped in 70% isopropanol, then 50 mL of milk sample was taken from each quarter using a 3 cm plastic cannula (Genesis Industries Inc., Elmwood, WI) to ensure no bacteria contaminated the sample. Milk samples were kept on ice and immediately processed for RNA extraction using a Trizol reagent to preserve the integrity of the RNA.

RNA extraction, library construction and sequencing
Transcriptomic analysis of 12 samples from bovine milk SC was performed using RNA -Sequencing technology as described by Cánovas et al. [55]. RNA sample preparation was also described in Cánovas et al. [55] and RNA quality was evaluated using the RNA integrity number (RIN) value from the Experion automated electrophoresis system (BioRad, Hercules, CA [56]). The RIN values ranged from 8.0 to 9.0 in all milk SC samples, indicating good RNA quality [56]. Library construction was performed using the TrueSeq RNA sample preparation kit (Illumina, San Diego, CA [57]). Sequencing was completed with an Illumina HiSeq 2000 analyzer that yielded 100 base pair (bp) single read sequences [13].

Transcriptome analysis Sequence assembly and quantification
Quality control, including the trimming of reads, was performed by CLC genomics workbench (CLC Bio Version 20.0.4, Aarhus, Denmark) using the quality trimming scores: limit = 0.05; maximum number of ambiguous bases = 2; discard reads below 100 bp. Trimming the reads allowed for single -end sequences to be included in this analysis, which improved the quality of the alignment sequences. After trimming, all samples passed the quality control analysis based on GC content, Phred score and over -represented sequence parameters as described by [55].
The trimmed sequences were aligned to the bovine reference genome (ARS_UCD1.2; ftp:// ftp. ensem bl. org/ pub/ relea se-100/) using CLC genomics workbench, with a Large Gap Read Mapping (LGRM) approach [58,59]. The LGRM tool can map sequence reads that span introns without requiring prior transcript annotation, thus allowing the best match for a given read to be identified. The mapping criteria that followed included mismatch, insertion and deletion costs of 2, 3 and 3, respectively and was performed as described by Cardoso et al. [58].
Transcript discovery was performed to identify transcripts in both healthy and mastitic samples using CLC genomics workbench. We first performed transcript discovery on the healthy group which using the bovine reference genome and the LGRM assembly for the healthy group. Parameters for filtering include gene merging distance = 50, minimum reads in gene = 10 and minimum predicted gene length = > 200 bp [59]. For the mastitic transcript discovery, we used the predicted RNA and gene tracks generated from the healthy group and annotated bovine reference genome. Thus, the predicted mRNA file contained predicted information from both sets of samples (healthy and mastitic), in addition to annotated genome information, which was used as a reference track to map the reads of each sample. Transcript levels were quantified in reads per kilo base per million mapped reads (RPKM [57]). By normalizing the data for RNA length and total reads in each sample, the RPKM measure facilitated comparisons of transcript levels between groups [4]. A threshold of RPKM ≥0.2 was used to select transcripts expressed in each sample, as this threshold has previously been used by Wickramasinghe et al. [5] to detect gene expression in milk SC. We considered mRNA isoforms to be DE between healthy and mastitic samples when it had a false discovery rate (FDR) < 0.05 and a fold -change (|FC|) > 2.

Transcript annotation and functional enrichment analysis
Transcript annotation of the bovine mRNA isoforms was retrieved from the BioMart Database (http:// useast. ensem bl. org/ bioma rt/ martv iew/). Gene ontology (GO) enrichment analysis was performed using Panther software [60]. The GO terms associated with the three main GO categories such as biological processes, molecular function and cellular component were analyzed [61].
Gene network analysis was performed using Network-Analyst (http:// www. netwo rkana lyst. ca) software. Net-workAnalyst software performs meta -analysis on gene expression data sets to determine important features, patterns, functions and connections among genes [62][63][64][65][66]. A list of the Ensembl gene Ids related to DE mRNA isoforms was uploaded and the program's default parameters were used.
For the novel mRNA AS variants, the predicted gene associated with each was identified. This was done using NCBI viewer to download the FASTA files based on the location (start and end position) of the novel transcripts in the bovine genome. These FASTA files were then uploaded to NCBI blast (blastx; https:// blast. ncbi. nlm. nih. gov/ Blast. cgi). The protein data bank protein feature was used as reference database to find similar sequences genome annotation in either bovine or in other species allowing inferences about a genes and functions to be made. From the predicted gene associated with each novel isoform further functional analysis was performed on a gene -by -gene manner.

QTL annotation analysis
Lastly, QTL annotation analysis was performed using the R package: Genomic functional Annotation in Livestock for positional candidate LOci (GALLO) (https:// CRAN.R-proje ct. org/ packa ge= GALLO; [67]). The genome coordinates of the DE transcripts were used, as well as the QTL .gff annotation file retrieved from the cattle QTL Database (https:// www. anima lgeno me. org/ cgi-bin/ QTLdb/ index [49,68]). We used windows of 1000 bp to account for 100 upstream and 100 downstream of each DE transcript [69].