Characterization of the equine skeletal muscle transcriptome identifies novel functional responses to exercise training

Background Digital gene expression profiling was used to characterize the assembly of genes expressed in equine skeletal muscle and to identify the subset of genes that were differentially expressed following a ten-month period of exercise training. The study cohort comprised seven Thoroughbred racehorses from a single training yard. Skeletal muscle biopsies were collected at rest from the gluteus medius at two time points: T1 - untrained, (9 ± 0.5 months old) and T2 - trained (20 ± 0.7 months old). Results The most abundant mRNA transcripts in the muscle transcriptome were those involved in muscle contraction, aerobic respiration and mitochondrial function. A previously unreported over-representation of genes related to RNA processing, the stress response and proteolysis was observed. Following training 92 tags were differentially expressed of which 74 were annotated. Sixteen genes showed increased expression, including the mitochondrial genes ACADVL, MRPS21 and SLC25A29 encoded by the nuclear genome. Among the 58 genes with decreased expression, MSTN, a negative regulator of muscle growth, had the greatest decrease. Functional analysis of all expressed genes using FatiScan revealed an asymmetric distribution of 482 Gene Ontology (GO) groups and 18 KEGG pathways. Functional groups displaying highly significant (P < 0.0001) increased expression included mitochondrion, oxidative phosphorylation and fatty acid metabolism while functional groups with decreased expression were mainly associated with structural genes and included the sarcoplasm, laminin complex and cytoskeleton. Conclusion Exercise training in Thoroughbred racehorses results in coordinate changes in the gene expression of functional groups of genes related to metabolism, oxidative phosphorylation and muscle structure.


Background
The phenotypic and biochemical changes occurring in response to exercise training have been extensively studied in humans and mammals, the results mainly being of a descriptive nature. The adaptive response to training is dependent on variations in exercise-induced changes in muscle load, energy requirements and calcium flux. Endurance training results in increased aerobic capacity [1], mitochondrial biogenesis [2] and a shift from carbo-hydrate to fat metabolism [3] whereas resistance training promotes protein synthesis [4,5], muscle hypertrophy [6] and a switch from slow to fast twitch muscle. Numerous equine studies have also confirmed an increase in VO 2max and an increase in oxidative enzyme activity [7][8][9][10][11][12] following endurance training. An increase in type II and a concurrent decrease in type IIX fibres is observed in Thoroughbreds in response to high intensity training [13,14]. Also, anerobic capacity and speed and strength have been observed to increase following short duration high intensity (~100-150% VO 2max ) exercise [14][15][16].
In contrast, much less is known regarding the transcriptional reprogramming underlying the highly specific adaptive responses to endurance and resistance exercise.
Exercise studies using human subjects and animal models have demonstrated that changes in the expression of a wide range of mRNA transcripts play a major role in the recovery of muscle following exercise with the expression levels of most genes returning to baseline within 24 hours [17][18][19][20][21][22][23]. However, it appears that repeated bouts of exercise lead to new basal levels of gene expression in resting muscle. Higher levels of mitochondrial genes and genes involved in energy metabolism were observed in endurance trained athletes compared to sedentary subjects [24]. Further evidence for a new steady state level of exercise related genes comes from a recent study in which differential levels of gene expression were observed in resting skeletal muscle from sedentary, endurance trained and resistance trained subjects. However the use of intra rather than inter-individual genetic comparisons as well as different training stimuli may have contributed to the observed differences in gene expression between the groups. The majority of differentially expressed genes were common to both trained states [25]. A surprisingly small number of genes were differentially expressed between endurance trained and resistance trained subjects given the very different phenotypic changes and distinct signalling pathways [26,27] associated with each form of exercise. Studies have indicated that concurrent endurance and resistance training results in impaired strength development and aerobic capacity when compared to a training regime with a single exercise mode [28][29][30][31] a phenomenon described as the interference effect. However, conflicting studies have found little or no effect of a combined training regime on strength and aerobic capacity [32][33][34][35]. The aim of this study was to investigate the global transcriptional response in skeletal muscle to a training regime combining endurance and high intensity sprint exercise in Thoroughbred racehorses. We hypothesise that following training differential expression of genes related to both aerobic capacity and muscle hypertrophy will be observed reflecting the dual nature of the training regime.
The Thoroughbred is a novel and valuable model for identifying molecular mechanisms basic to both endurance and resistance adaptive responses. Competitive horse racing dates to 4500 BC and Thoroughbreds have been bred for speed and stamina since the 1700 s. This intense selection has resulted in a highly adapted athlete [36]. Thoroughbreds have a very high aerobic capacity or maximal oxygen uptake (VO 2max ) [37] relative to their body mass. For instance, VO 2max can reach 180-200 mL O 2 /min/kg, approximately 2.5 times higher than other species of similar size [38]. This is achieved through a large lung volume, high cardiac output, high haemoglobin concentration, high muscle mitochondrial volume and a high skeletal muscle mass [38][39][40][41][42][43][44]. During intense exercise such as under racing conditions a Thoroughbred may increase its metabolic rate from basal levels by up to 60-fold [45]. Furthermore, the Thoroughbred has a very high skeletal muscle mass comprising over 55% of total body mass [46].
A Thoroughbred racehorse trained for flat racing undergoes a training regime comprising intermittent days of sprint exercise to promote increased muscle mass among periods of prolonged exercise at a slower pace to enhance aerobic capacity. In a previous study we detected molecular signatures of both endurance and resistance exercise in untrained Thoroughbred skeletal muscle following a single bout of exhaustive exercise [47]. A further advantage of using Thoroughbreds as an exercise model is that inter-individual comparisons can be made between subjects that come from a similar background (genetic and environmental) and have undergone a similar exercise regime with relatively little variation in management. Variations in genetic and environmental conditions cannot be controlled to the same extent in human subjects.
To-date the main approach to investigating global transcriptional changes has been the use of gene expression microarray platforms [47][48][49]. In this study we have used digital gene expression (DGE, Illumina) profiling to characterize the assembly of genes expressed in equine skeletal muscle and to identify the subset of genes that were differentially expressed following a ten month period of exercise training. DGE is a recently developed alternative to microarray gene expression profiling [50][51][52]. The DGE method involves the generation of a cDNA library with a 17 bp tag generated by restriction digestions for each mRNA transcript. The tags are directly sequenced using the Illumina Genome Analyzer creating millions of short reads. In contrast to microarray technology which is limited to the hybridisation of cDNA to probes printed on the array platform, DGE is not dependent on currently available genome sequence and thus provides a global, hypothesis-free quantitative analysis of the transcriptome. The technique is conceptually similar to serial analysis of gene expression (SAGE) [53] but substantially less expensive, more general and capable of delivering more information.
Using this technique we investigated 1) the overrepresentation of functional groups in skeletal muscle relative to the entire genome, 2) the genes differentially expressed in trained relative to untrained skeletal muscle, and 3) the overrepresentation of functional groups in genes differentially expressed following training in skeletal muscle.

Representation of genes by DGE tags
A limitation of genome wide gene expression analysis using DGE is that it is not possible to evaluate the expres-sion of genes that do not contain a NlaIII restriction site and in some cases there is ambiguity regarding the taggene matches as a single tag may match to two or more genes. 91% (n = 22,996) of equine genes with Ensembl gene IDs (n = 25,180) have a NlaIII restriction site but 13% of these are not unique; therefore, 78% (n = 19,271) of currently annotated equine genes are quantifiable using DGE.
As poor quality sequence was obtained for one of the samples just 13 samples were used for analysis. Of the 13 samples successfully sequenced a total of 183 million raw reads were generated. Of these 119 million reads passed the Illumina pipeline quality filters. These 119 million usable reads consisted of 17.6 million unique tags. 66% of the usable reads mapped to the horse genome, 30% of the usable reads mapped to the predicted Ensembl gene restriction sites and 36% to the genomic regions.
The intragenic reads may represent regulatory noncoding RNAs or novel genes. However, more likely is the explanation that these tags are a combination of segments of genes that were excluded from the current annotation (or assembly) of the equine genome; an observation which has been previously reported [54] as well as tags containing sequencing errors. We expect that as the annotation of the horse transcriptome improves that most of the genomic tags we have sequenced will be reassigned to genic tags. In particular we believe that a de novo transcriptome assembly approach (using longer sequencing reads) of the equine muscle transcriptome would enable us to more accurately allocate tags to the correct gene models. In the absence of an accurate muscle transcriptome we believe that the Ensembl horse transcriptome, which is predominantly automatically generated and infers much of the information about gene models by homology from better annotated organisms, represents the best available option for DGE tag mapping.
The reasons that reads may not match a genomic location include ambiguous reads (same sequence tag present in more than one genomic location), reads overlapping an unannotated exon boundary, sequencing errors or single nucleotide variants present in the tag. Due to the short nature of the reads used in DGE compared to other sequencing protocols it is problematic to correct for SNPs or sequencing errors by allowing mismatched bases. Other protocols (e.g. RNAseq), which generate longer tags, can overcome this limitation but they introduce new problems, the most important of which is multiple tags per transcript and a bias towards highly expressed long transcripts [55]. The intragenic reads may represent regulatory non-coding RNAs or novel genes. However, more likely is the explanation that these tags represent segments of genes that were excluded from the current annotation (or assembly) of the equine genome; an observation which has been previously reported [54].
Only the 20% of reads that unambiguously matched Ensembl genes were used for further analysis. These represented 5,068 unique genes, ~25% of annotated equine genes. As some genes were represented by multiple different transcripts these were summed to calculate the total number of reads per gene. Highly expressed genes where > 50,000 tags per million (TPM) were detected made up 39% of all annotated reads. However, the majority of unique genes were expressed at low levels (i.e. 2,200 genes, < 40 TPM) and there was an inverse relationship between the level of gene expression and the number of genes expressed (Figure 1).

Functional annotation of muscle transcriptome
Using the online tool DAVID, 448 gene ontology groups and 14 KEGG pathways were observed to be significantly (FDR = 0.05) overrepresented in skeletal muscle relative to the entire genome. There was a substantial overlap of genes within these functional groups resulting in the overrepresentation of a large number of functionally similar gene ontology groups. Therefore only the highly significant groups are shown in Table 1. The overrepresentation of mitochondrial genes, and genes involved in muscle contraction and metabolism concurs with current SAGE data [56]. However, an overrepresentation of genes related to RNA processing, the stress response and proteolysis has not, to our knowledge, previously been reported in the muscle transcriptome. DGE is much more sensitive to the detection of low level transcripts than SAGE and consequently provides greater coverage of the muscle transcriptome. When functional analysis of only the highly expressed genes (those comprising > 0.05% of annotated muscle transcriptome) was performed, the novel overrepresented functional groups were not identified. This indicates that these are comprised of genes expressed at relatively low levels. Furthermore, functional groups involved in muscle contraction and aerobic respiration were more significantly overrepresented among the highly expressed genes (Table 1).
A list of the most abundant genes (those comprising >0.5% of annotated muscle transcriptome) is presented in Table 2. Just 28 genes contribute to over 50% of the annotated mRNA in equine skeletal muscle and are principally involved in muscle contraction and energy metabolism. Creatine kinase muscle (CKM) was the most abundantly expressed gene in equine skeletal muscle representing 6.9% of the annotated transcriptome and creatine kinase, mitochondrial 2, (CKMT2), was ranked 20 th among the most abundantly expressed genes, making up 0.8% of the transcriptome. Human studies using SAGE have indicated that CKM mRNA makes up ~1% of the human skeletal muscle transcriptome and CKMT2 did not feature in a list of the 54 most abundantly expressed genes [56]. The very high levels of both isoforms of creatine kinase in equine muscle compared to humans is indicative of the highly adapted athletic capacity of Thoroughbred horses as creatine kinases play a crucial role as an energy store in tissues with fluctuating energy demands. CKM is utilised during anaerobic respiration while CKMT2 is tightly coupled to oxidative phosphorylation [57][58][59][60]. The importance of CKM in athletic adaptation in the horse is further supported by the identification of a novel SNP in the CKM gene that, in a preliminary study, has been observed to be associated with racing performance [61]. The third most highly expressed gene in equine skeletal muscle, actin, alpha 1, skeletal muscle (ACTA1) has also been implicated as a candidate athletic performance gene following a genome scan for positive selection in Thoroughbred horses [62].

Differential gene expression following training
Following correction for multiple testing, a total of 92 transcripts were significantly (FDR = 0.05) differentially expressed in the skeletal muscle transcriptome following a ten month period of training: nineteen transcripts showed increased expression (+0.72-fold to +29.3-fold), and 73 displayed decreased expression (-0.43-fold to -4.2fold). Twenty of the differentially expressed transcripts lay within annotated genes, 54 transcripts were located < 5 kb up or downstream of annotated genes and for 18 transcripts no annotated genes were located within 5 kb. The transcripts located in the vicinity of equine genes may represent regulatory regions of the genes and more in-depth analysis and annotation of the recently sequenced equine genome may lead to a reassessment of the boundaries of many currently annotated genes [63]. The uncharacterised transcripts that were not in the region of any known equine gene may represent novel TPM per Gene % Reads equine exercise related genes or non-protein coding regulatory mRNAs. The differentially expressed transcripts, including those located within 5 kb of a known gene, and the associated gene names are listed in Table 3 and Table  4. Genes with higher post-training basal mRNA levels included those involved in the mitochondria, ubiquitination and circadian rhythm regulation, whereas genes with significantly reduced mRNA were mainly associated with cytoskeletal structure and the control of growth and development.

Functional profiling of differentially expressed genes
Significantly up and down-regulated blocks of functionally related genes were identified by performing a gene enrichment test (FatiScan) for all expressed genes ranked according to differential expression following training. Among the up-regulated genes we identified 275 significantly (FDR < 0.05) overrepresented GO terms and 13 KEGG pathways. Among the down-regulated genes we identified 207 significantly (FDR < 0.05) overrepresented GO terms and five KEGG pathways. Subsets of the functional groups are shown in Table 5 and Table 6, and were  chosen to include all significant (FDR<0.05) KEGG pathways and all highly significantly (FDR < 0.0001) overrepresented functional groups satisfying GO > level 6 and for which at least six genes were identified. The most significantly overrepresented cellular compartment GO groups among the genes with increased abundance post-training were mitochondrion (CC GO:0005739; P < 1.04 × 10 -41 ) and related terms such as organelle inner membrane (CC GO:0019866) and mitochondrial part (CC GO:0044429). Aerobic respiration (BP GO:0009060), oxidative phosphorylation (BP GO:0006119) and the tricarboxylic acid cycle (GO BP:0006099) were among the overrepresented GO biological processes groups. The KEGG pathways included Citrate cycle (TCA cycle) (hsa00020) and Oxidative phosphorylation (hsa00190) and multiple metabolism pathways. These transcriptional data concur with biochemical and physiological studies that have demonstrated an increase in mitochondrial volume and aerobic capacity following endurance training [1,2]. Although there is evidence to indicate that an increase in oxidative capacity is part of the maturation process in horses [64] it has been demonstrated that exercise training, not growth, causes increases in whole muscle activity of the oxidative enzyme succinate dehydrogynase and changes in muscle fibre type composition in young Thoroughbred horses [14]. To our knowledge, this is the first time that these GO groups have been shown to have increased expression following exercise training.
This highlights the value of using a method such as FatiScan which incorporates all experimental data rather than limiting interpretation to those that rank among the highly differentially expressed. Only three mitochondrial genes were among those significantly differentially expressed: MRPS21, SLC25A29 and ACADVL. MRPS21 is a nuclear-encoded mitochondrial ribosomal gene required for protein synthesis in the mitochondria. Therefore, an increase in mitochondrial abundance would require an increase in mitochondrial protein synthesis. The SLC25A29 and ACADVL proteins are localized in the mitochondrial inner membrane and play a role in fat metabolism [65,66]. The fatty acid oxidation (BP GO:0019395), fatty acid beta-oxidation (BP GO:0006635) and fatty acid metabolic process (BP GO:0006631) GO ontologies were also overrepresented among genes that increased expression following training. This is in agreement with previous observations of a shift towards fatty acid metabolism in response to exercise training [3]. Furthermore, 12 of the 13 up-regulated KEGG pathways were associated with aerobic respiration and metabolism. Overall these results demonstrate that exercise training brings about a subtle but coordinated increase in the basal level of gene expression of a wide array of genes involved in energy production and metabolism.  Interestingly there was also an up-regulation of GO terms involved in the immune response such as the KEGG pathway complement and coagulation cascades, complement activation and positive regulation of immune response. The up-regulation of the complement and coagulation cascades may be a response to exercise induced hemolysis. It has been suggested that exercise induced decreases in blood pH and increases in blood temperature may increase the osmotic fragility of erythrocytes. Previous studies have shown that an immune response is elicited in response to a single bout of exercise and that this response is attenuated in trained subjects. Furthermore, it appears that moderate exercise can enhance the immune response [67], whereas over-training in humans is detrimental to health and can leave athletes more susceptible to infection [68]. Overtraining in horses has been associated with increased levels of the alpha-1-antitrypsin protein [69] which is involved in protection of cells from inflammatory enzymes released from neutrophils [70]. This protein was also found to be increased in humans following a marathon run but returned to basal levels within a few hours [71]. Despite numerous studies documenting the immune response to a single bout of exercise [72][73][74], little is known regarding the molecular mechanisms governing the adaptations to the immune response brought about by exercise training. It has been suggested that exercise-induced reactive oxygen species (ROS) may play a major role in the modulation of the immune response following exercise [75]. It is also likely that exercise-induced muscle damage contributes to the inflammatory response [76]. The exercise regime undertaken by the horses in this study incorporated both endurance and sprint work which would be expected to elicit both increased ROS and intramuscular microtears.
Another interesting observation was the increased expression of ribosomal genes as elevated rates of protein synthesis and degradation have been reported following resistance exercise with an overall increase in protein mass [4,77,78].
The down-regulated functional groups were mainly associated with structural genes and ion transport. It has been shown that the cellular response to mechanical stimuli, such as increased load, involves ECM signalling to the cytoskeleton at focal adhesion complexes via integrin receptors. Ion transport is central to muscular contraction. Calcium is the main regulatory and signalling molecule in muscle and ATP synthesis is dependent on phosphate transport. Although the down-regulation of these functional groups is counter-intuitive, the modulation of gene expression in these functional groups may reflect structural reorganization of myofibrils.

Validation of a panel of genes by real time qRT-PCR
Eleven genes represented by tags that were differentially expressed between untrained and trained skeletal muscle were selected for real time qRT-PCR validation. An asterisk (*) indicates that the tag lay within 5 kb of the gene. FC -fold change; Adj P -adjusted P value using the Benjamini and Hochberg method.  ) were located within 5 kb of a known gene and were predicted to represent the gene. Primers were designed to span exons 1 and 2 or exons 2 and 3 of the gene of interest. This approach was taken to validate both the differential expression of genes and to assess the prediction that the differentially expressed tags that were identified within 5 kb of a known gene were indeed representative transcripts of that gene. The mean expression of three of the four genes represented by intergenic tags reached significance (P < 0.05) and concurred with DGE data. ACTN3 showed the same direction of change as the DGE data and tended towards significance (P < 0.1). The mean expression of six of the seven genes predicted to be represented by adjacent tags agreed with the DGE data, the exception being TNNT3. The putative TNNT3 tag was matched to a region ~880 bp downstream of the TNNT3 gene and may represent a novel gene or mRNA. Alternatively the tag may span a splice site in an alternative gene and consequently may represent RNA transcribed from a different region in the genome. Real time qRT-PCR results are detailed in Table  7.
The PER2 and PER3 genes, key molecular clock components within the mammalian circadian timing system [79], had mean post-training increases in expression of +1.88-fold and +1.74 fold respectively. The induction of these genes may represent an entrainment of the muscle transcriptional clock by a regular exercise regime. While primarily regulated by photoperiodic signals to the master pacemaker within the suprachiasmatic nucleus, peripheral circadian clocks, which are known to exist in almost all peripheral tissues examined to date [80], can also be entrained by alternative timing cues including exercise [81] and feeding [82]. The role of peripheral clocks is to align specific tissue function to the correct time of day via differential regulation of subsets of clockcontrolled genes.
As exercise is a known synchroniser of circadian rhythms in mice [83], humans [81] and horses [84], and PER2 has previously been shown to oscillate in equine tissues [85], the increased expression of PER genes posttraining in the current study is thought to represent a strengthening of the endogenous circadian clock in equine muscle. Furthermore, human studies have shown time of day variations in exercise performance at the physiological level [86][87][88], and it has been suggested that circadian rhythms may play an important role in sports performance [89]. Combined with our results, this is strong incentive for further investigation of the influence of training times on daily muscle function in the horse, such that optimal athletic performance may be achieved.
The proteins encoded by ACADVL (+1.72-fold, P = 0.014), MRPS21 (+6.03-fold, P = 0.013) and SLC25A29 (+1.22-fold, P = 0.350) function in the mitochondria to increase protein synthesis and fat metabolism. The increase in expression of the gene encoding the mito-  [90][91][92][93]. The proteins encoded by ACADVL and SLC25A29 are involved in fat metabolism and are located in the mitochondrial inner membrane. ACTN3, CALM3 and DAG1 were decreased in expression by -1.41-fold (P = 0.090), -1.81-fold (P = 0.028) and -1.27-fold (P = 0.021) respectively. The ACTN3 protein is localized to the skeletal muscle z-discs and DAG1 forms part of the dystroglycan complex. A null mutation in the ACTN3 gene has been associated with sprint performance in human athletes [94] and DAG1 has been proposed as a candidate gene in some muscular myopathies [95,96]. CALM3 is an isoform of calmodulin, a calciummodulated protein which regulate numerous protein targets. The binding of calcium to calmodulin induces a conformational change which affects its ability to bind target proteins. In this manner calmodulin may be used by other proteins as a calcium sensor and signal transducer. CALM3 may be involved in muscle fibre type transformation in response to muscle excitation [97,98]. CALM3 gene expression was also decreased in equine muscle four hours post exhaustive treadmill exercise [47].
IGFBP5 and MSTN encode growth factors with large observed decreases in expression post training (-3.18fold, P = 0.023 and -4.97-fold P = 0.004 respectively). IGFBP5 is one of family of modulators of insulin like growth factors (IGFs) which interact with IGFs resulting in an increase in half life and alteration of the interaction with receptors. IGF-1 promotes muscle hypertrophy and protein levels are increased in humans following adminis-tration of human growth hormone as an illegal doping agent [99,100]. The exact mode of action of IGFBP5 is poorly understood however it has been shown to associate with the extra cellular matrix and is a regulator of a wide range of physiological processes including cell proliferation and muscle cell differentiation [101][102][103].
Myostatin encoded by the MSTN gene is a negative regulator of muscle growth and an inhibitor of satellite cell proliferation [104]. The expression of MSTN was found to be decreased in humans following resistance training [105,106]. Null mutations in this gene have been found to cause a double muscling phenotype in cattle, dogs, and humans [107][108][109][110][111]. Structural variation in the MSTN gene has also been associated with athletic performance in dogs [110] and horses [112]. The differential expression of this gene is of particular significance as an intronic SNP in equine MSTN has been found to be a strong predictor of optimal racing distance in Thoroughbred racehorses [112].

Conclusion
Deep sequencing of the equine skeletal muscle transcriptome has revealed novel transcripts and functional groups associated with this tissue. Furthermore, following exercise training we have observed an increase in the occurrence of genes involved in metabolism and oxidative phosphorylation, and a decrease in the expression of structural genes. Overrepresented functional groups of genes post-training were associated with both endurance and resistance exercise. This study documents the transcriptome-wide reprogramming of skeletal muscle in Thoroughbred racehorses that brings about the well documented phenotypic adaptations to exercise.

Subjects
All animal procedures were approved by the University College Dublin, Animal Research Ethics Committee, a licence was granted from the Department of Health and Children (Ireland) and owners' consent was obtained for all horses. Seven two-year-old untrained Thoroughbred horses (n = 5 females, n = 2 entire males), raised on the same farm for the previous 2 -3 months and destined for Flat racing with the same trainer comprised the study cohort. The horses had a mean height of 154.9 cm (± 2.8) and a mean pre-training weight of 437.4 kg (± 18.0). All horses undertook a regular exercise regime with the same trainer for 10 months (trained). This consisted of light canter (1,500 m) once a day six times a week on an all-weather gallop and higher intensity exercise ("work") no more than once a week which consisted of warm-up (walk and trot) followed by gallop with velocities reaching maximal intensity for 800-1,000 m.

Muscle biopsy sampling
Percutaneous needle muscle biopsies [113] were obtained from the dorsal compartment of the gluteus medius muscle according to Dingboom and colleagues [114] using a 6 mm diameter, modified Bergstrom biopsy needle (Jørgen KRUUSE, Veterinary Supplies). Biopsies were taken approximately 15 cm caudodorsal (one-third of the distance) to the tuber coxae on an imaginary line drawn from the tuber coxae to the head of the tail. The biopsies were obtained at a depth of 80 mm. Each biopsy site was shaved, scrubbed with an antiseptic and desensitized by a local anaesthetic. The biopsy samples were washed with sterile PBS (BD Biosciences, San Jose, CA) and preserved in RNAlater (Ambion, UK) for 24 hours at 4°C and then stored at -20°C. Muscle biopsy samples were collected at rest at two time points: T 0 -untrained and T 2 -trained.

RNA isolation and purification
Approximately 100 mg of each muscle biopsy sample was removed from RNAlater and homogenized in 1 ml TRIzol using a TissueLyser (Qiagen Ltd, Crawley, UK) and extracted according to the manufacturer's instructions. Each sample was purified using the RNeasy ® Mini kit (Qiagen Ltd, Crawley, UK) and DNase treated with RNase free DNase (Qiagen Ltd, Crawley, UK). RNA was quantified using a NanoDrop ® ND1000 spectrophotometer V 3.5.2 (NanoDrop Technologies, Wilmington, DE) and RNA quality was subsequently assessed using the 18S/28 S ratio and RNA integrity number (RIN) on an Agilent Bioanalyser with the RNA 6000 Nano LabChip kit (Agilent Technologies Ireland Ltd, Dublin, Ireland) according to the manufacturers' instructions.

Library preparation for Illumina sequencing
The Illumina cDNA library was prepared according to the manufacturer's instructions. All reagents were supplied by Illumina apart from SuperScript II Reverse Transcriptase (part # 18064-014) with 100 mM DTT. Briefly, 1.5 μg mRNA was isolated from total RNA by binding the mRNA to a magnetic oligo(dT) bead. Double stranded cDNA was synthesized and cleaved at each NlaIII site. The site of NlaIII cleavage was ligated with an Illuminasupplied adaptor using T4 DNA ligase. The bead bound double stranded cDNA was the cut by the restriction enzyme, MmeI. This resulted in a 17 bp tag which was no longer attached to the oligo(dT) bead. The cDNA construct was then precipitated and the site of MmeI cleavage was ligated with an Illumina-supplied adaptor using T4 DNA ligase. The adaptor ligated cDNA was PCR amplified with two adapter primers (Illumina). The PCR product of 85 bp was purified by gel extraction in preparation for loading on the Illumina Cluster Station. The quality and quantity of the purified constructs were assessed using an Agilent DNA series 7500 series II assay (Agilent Technologies Ireland Ltd, Dublin, Ireland) and Qubit fluorometer according to manufacturer's instructions. Cluster generation and sequencing analysis were carried out using Illumina's Solexa Sequencer according to the manufacturer's instructions.

Analysis
The DGE samples were processed through the standard software pipeline provided by Illumina for the Genome Analyzer. The sequence reads were base called using the Bustard base caller (part of the Illumina software). The tag annotation pipeline consisted of two parts: mapping to known Ensembl [115] cDNAs and mapping to the genome. The known cDNAs from version 49 of Ensembl for the EquCab2 assembly of the equine genome were downloaded in FASTA format using the Ensembl biomart tool. The FASTA files for the individual equine chromosomes were downloaded from the UCSC genome browser website [116]. A pipeline consisting of perl, C++ and linux shell scripts was used to conduct an in-silico digestion of both the transcriptome and genome and to generate tag location records which were loaded into a MySQL database. The tag records were then annotated according to their type (genomic or cDNA, canonical, noncanonical, repeat etc.). A matrix of tag counts for each sample was generated. The edgeR Bioconductor package [117] was used to determine differential expression of tags in each group.

Functional clustering according to gene ontology annotations
The equine Ensembl gene IDs were cross-matched to human Ensembl gene IDs. Using the Ensembl IDs of human homologues of equine genes it was possible to use the Database for Annotation, Visualization and Integrated Discovery (DAVID) [118,119] for functional clustering and overrepresentation analyses. The Expression Analysis Systematic Explorer (EASE) tool [120] within DAVID was used to investigate the representation of functional groups in equine skeletal muscle relative to the whole genome. The FatiScan [121,122] gene enrichment test was used to analyse the transcriptional profile posttraining. FatiScan is part of the Babelomics Suite of web tools and tests for the asymmetrical distribution of biological labels in an ordered list of genes through application of a Fisher's exact test. Genes were ranked by differential expression and FatiScan was used to detect functional blocks (GO and KEGG pathways) that were significantly up-regulated and down-regulated posttraining. Results from both EASE and FatiScan were corrected for multiple testing using the Benjamini and Hochberg method [123].

Real time quantitative RT-PCR
Selected cDNA samples were quantified by real time quantitative RT-PCR (qRT-PCR). 1 μg of total RNA from each sample was reverse transcribed into cDNA with oligo-dT primers using a SuperScript™ III first strand synthesis SuperMix kit according to the manufacturer's instructions (Invitrogen Ltd, Paisley, UK). The converted cDNA was diluted to 2.5 ng/μl working stocks and stored at -20°C for subsequent analyses. Oligonucleotide primers for real time qRT-PCR were designed using Primer3 version 3.0 http://www.primer3.sourceforge.net and commercially synthesized (MWG Biotech, Germany). Primer details are shown in Table 8. Each reaction was carried out in a total volume of 20 μl with 5 μl of cDNA (1 ng/μl), 10 μl SYBR ® Green PCR Master Mix (Applied Biosystems, Cambridgeshire, UK) and 5 μl primer/H 2 O. Real time qRT-PCR was performed using a 7500 Fast Real-Time PCR machine (Applied Biosystems, Cambridgeshire, UK). All reactions were performed in duplicate. Hypoxanthine phosphoribosyltransferase 1 (HPRT) was selected as a stable reference gene based on a study of equine reference genes for real time qRT-PCR [124] and on the DGE results. Expression values were calculated using a standard curve which was plotted based on the expression of HPRT in serial dilutions of equine skeletal muscle RNA (1:1, 1:2, 1:4, 1:8, 1:16, 1:32, and 1:64). The standard curve method was used to normalise the gene expression data. The paired Student's t-test was used to identify significant differences in mRNA abundance between time-points.