We integrated a mouse muscle Bayesian Network and transcriptome data from the muscle of two inbred strains, LG/J and SM/J, with the results of QTL mapping of muscle weights in an advanced intercross  to nominate genes contributing to a 2-fold difference in muscle mass. The analyses based on three independent sources of information converged on a set of eight genes (Kdr, Plbd1, Mgp, Fah, Prss23, 2310014F06Rik, Grtp1, Stk10) as the most likely QTGs residing within five QTL regions. An additional phenotypic analysis confirmed the predictive power of the gene network analysis.
The present study identified 13,726 genes expressed in mouse skeletal muscle, which approximately doubles the number reported earlier in a microarray based study . An expansion of the muscle transcriptome was expected based on the recent comparison between the microarray and RNA-Seq methods in brain tissue , and illustrates the superior sensitivity of RNA-Seq. This set of data, therefore, provides a benchmark of expression levels of different genes within mouse muscle tissue, something that was not possible to obtain reliably with microarrays because of variation in sensitivity of hybridization among the probes .
The procedure of mapping of sequenced transcriptome fragments on to the reference sequence allows a defined number of mismatches . This provision is particularly important for identification of polymorphisms. However, a side effect of it might be a background noise resulting from the mapping of some of the fragments (likely those originating from homologues) to the genes which in fact are not expressed in the tissue. It is possible to assess the level of such noise empirically by examining expression of the Y chromosome genes in females. The highest level of expression of these genes found in females indicates the threshold below which a reliable assessment of transcript abundance is not possible. Application of this threshold reduced the estimated number of expressed genes by >30%.
A plethora of mouse models divergently selected for various phenotypes have been generated over the decades. These models captured increasing and decreasing alleles of relevant genes and provide a rich resource for studying of the mechanisms underlying variation in specific trait. However, as the genomic sequence of many of these strains is not available yet, utilization of these resources has been hampered. We demonstrated here that the high throughput transcriptome analysis by RNA-Seq provides an effective way for utilizing the potential of selected strains.
Validation of network analysis prediction
The expression network analysis and integration of the information from the independent datasets provides additional leverage for prioritization of the candidate genes for further interrogation. However, it is important to assess the reliability of prediction. The analyses of gene expression data in rodent muscles indicated that expression pattern of a number of genes in the TA muscle of the SM/J strain is indicative of a more oxidative phenotype compared that of the LG/J strain. Initially this appeared to conflict with our results obtained in the soleus muscle of these strains . The SM/J strain exhibited lower proportion of oxidative type 1 fibers compared to the LG/J strain (e.g., in females 41% vs 58%, respectively). The examination of the TA muscle however confirmed the predicted prevalence of the oxidative phenotype in the SM/J muscle supporting the predictive power of the network analysis (Figure 3). The reversal of the oxidative profile between the soleus and TA muscles could be explained by the distinct composition of muscle fibers constituting these two muscles in mice; soleus is mainly composed of the fibers expressing type 1 and type 2a myosin heavy chains, coded by Myh7 and Myh2 genes, respectively. The TA muscle, on the other hand contains very few Myh7 expressing fibers but is dominated instead by the fibers expressing type 2a, 2x (Myh1) and 2b (Myh4) myosin heavy chains . From this set of data it appears that the proportion of type 1 vs type 2a fibers in the soleus is determined by different mechanisms than the proportion of type 2a, 2x and 2b fibers in TA muscle.
Candidate genes in the QTL regions
It has been suggested that variation in gene expression is important contributor to the genetic architecture of complex traits . Integration of the gene expression profiling by microarrays and QTL screening in classical mapping populations (backcross or F2) has led to identification of QTGs underlying allergic asthma  and bone mineral density , and to nomination of the candidate genes underlying adipose tissue  just to mention some of the successful attempts to identify QTGs. An improvement in the mapping resolution afforded by an advanced intercross population and enhanced sensitivity of the transcriptome analysis by RNA-Seq provides further incentives for application of this strategy.
Integration of RNA-Seq data with results of QTL mapping from an advanced intercross reduced the number of positional candidates from 1099 genes residing throughout QTL regions to 49 candidate genes differentially expressed or with the coding polymorphisms (with likely functional consequences) between the two strains. These genes were spread across thirteen QTL. Three of those loci, Skmw25, Skmw26 and Skmw34, harboured only one candidate gene (Table 1). The candidacy of Htra2 gene (Skmw26) was supported by the mnd2 model demonstrating a severe muscle wasting phenotype and abnormality of motor neurons resulting from the mutation in the gene [41, 42]. The serine peptidase coded by Htra2 gene is located in the mitochondrial intermembrane space. It activates proapoptotic proteins upon release into the cytosol from damaged mitochondria . Interestingly, in addition to the T449D polymorphism, the transcript abundance of the gene tended to be higher in the SM/J strain (p=0.13; Additional file 1). There is no information on the possible effects of the two genes that are located in single-gene loci (Osbpl3 in Skmw25 and 2310014F06Rik in Skmw34). The latter gene is not translated into a protein (http://www.ensembl.org). However, it possesses the properties of the long intergenic non-coding RNA, lincRNA , which have been implicated in such biological processes as imprinting  and trans-gene regulation .
The remaining ten loci (ranging in size from 1.2 Mb to 5.0 Mb) harboured 2 or 3 candidates. From these results it appears that either the trait is truly polygenic, with more than one gene contributing to a QTL even when the latter had been refined to few Mb, or some of these genes are not causative. Validation studies will be required to address this question.
The ability of RNA-Seq to capture splice variants resulted in an interesting candidate gene for Skmw29 locus. Irak2 codes for Interleukin-1 receptor-associated kinase 2 which is involved in immune response and is important for activation of NFĸB pathway. Four splice variants of the gene with antagonistic effects were identified in mouse; overexpression of Irak2a and Irak2b potentiated activation of NFĸB pathway, whereas Irak2c and Irak2d variants inhibited it . The overexpression of Irak2 in LG/J strain muscles compared to SM/J strain (Additional file 1) was primarily due to Irak2c as the levels of three other variants were similar (Figure 1B). It has been demonstrated that persistent activation of NFĸB pathway caused muscle wasting . Thus, there is a mechanistic link between the levels of expression in Irak2c and muscle mass which identifies the gene as a strong candidate to explain the effect of Skmw29 locus. Overexpression of Irak2c might have contributed to larger muscle mass in LG/J strain by inhibiting NFĸB activation.
The Kdr gene in the Skmw24 locus, encodes for one of the vascular endothelial growth factor receptors and is involved in the development of skeletal muscle tissue . Furthermore, it has been shown that an acute response of the skeletal muscle to resistance exercise involves upregulation of its expression . Resistance exercise is a well known muscle mass increasing stimulus, thus it is plausible that the L117F polymorphism in an evolutionarily conserved region might be contributing to the muscle weight difference between the LG/J and SM/J strains.
The gene coding for matrix GLA protein (Mgp, Skmw31) was shown to be a suppressor of tissue calcification . Mutation of the MGP gene in humans causes Keutel syndrome . A higher level of expression of this gene in skeletal muscle was associated with intramuscular fat infiltration known as marbling in cattle .
Several of the identified genes are involved in cell signalling (e.g. the genes coding for the regulatory inhibitor subunit 16B of protein phosphatase 1, Ppp1r16b, (Skmw22), and serine/threonine-protein kinase 10, Stk10, (Skmw42)), respond to the growth stimulus (growth hormone regulated TBC protein 1, Grtp1) or are involved in regulation of transcription (the Tfdp1 gene encodes for a transcription factor involved in regulation of the cell cycle ). Thus, in addition to being differentially expressed or polymorphic these genes also represent the functional candidates which potentially can modify the abundance of muscle tissue.
In addition to the genes discussed above, the Alpk3 gene in the LG/J strain carries an insertion in exon 5 compared to the SM/J allele. The insertion, CTT, results in additional amino acid, leucine (following the 212 position of the reference sequence), distally of the I-set domain. Functional differences between the two Alpk3 variants have not been reported.
QTL lacking candidate genes
Our approach did not suggest any robust candidates for 4 earlier identified QTL. Interestingly, some of those loci had a substantial effect size on muscle mass (i.e. Skmw23, Skmw30, Skmw38 and Skmw41). Collectively these observations imply that the underlying cause of these loci rest beyond the gene expression patterns in muscle tissue or polymorphisms within the genes. For instance, systemic factors such as hormones can affect muscle tissue. It is also conceivable that the causative genes were expressed during development which might have influenced the number of muscle fibers. In either of those instances no footprint of the influence in muscle transcriptome would be detected.
Only ~4% of differentially expressed genes reside within QTL regions. This observation raises a question about the role of the remaining majority in relation to muscle mass. It can be speculated that secondary changes in gene expression pattern are triggered in the network associated with the QTL causing genes, and genes encoding transcriptional regulators are particularly good candidates. It is also plausible that the systemic factors influencing muscle size are contributing to differential expression between strains. Finally, some of these genes might be involved in other phenotypes and processes which are contrasting between the strains but which are not reflected in muscle weight (e.g. variation in proportion of different fiber types). Integration of the expression data in various tissues at different developmental stages, under different environmental conditions, and profiling of the systemic hormones and growth factors could help understanding of some relationships in gene expression patterns.