Our understanding of muscle physiology has evolved over the years through extensive studies aimed at identifying the molecular and physiological mechanisms involved in normal muscle function and disease. The emergence of microarrays in the early 1990s paved the way for the expansion of this area of research. This technology reliably quantifies transcript expression levels, providing a snapshot of the activity of several tens of thousands of mRNAs simultaneously [1–3]. Gene expression analysis makes it possible to identify biomarkers [4, 5] and gene signatures [6, 7] in human and animal models.
Gene expression studies in the field of muscle research have generally been carried out using a rather limited set of conditions and replicates. Therefore, experimental designs tend to focus on a few specific research questions [see e.g. ]. Microarrays have allowed the exploration of many fields on a genomic scale, including the molecular diversity of muscle fiber types, the physiological plasticity and adaptation of muscle, as well as muscle atrophy, muscle disease and muscle pharmacogenomics [9–11].
As a consequence, microarray data have accumulated rapidly. The transcriptome data can be found in dedicated databases [e.g. Public Expression Profiling Resource, PEPR] or generic databases [e.g. Gene Expression Omnibus, GEO]. Collecting the different microarray data sets for meta-analysis adds a new dimension to gene expression data analysis by combining a large set of experimental conditions. The quality of any meta-analysis depends on the quality of the underlying data. While considerable divergence across different microarray platforms has been observed in the past [16, 17], their current accuracy and reproducibility [18, 19] now enable reliable comparisons. Since the landmark study by Rhodes et al., several recent meta-analysis studies [21–24] have led to important results, particularly in the field of cancer research [25–28]. For a given pathology or tissue, meta-analysis yields robust lists of differentially expressed genes (DEGs). In such a case, each data set can be considered an independent validation step [29–31], which enhances the signal-to-noise ratio [20–24]. In addition, new pathways that could not have been identified in isolated data sets can emerge from a meta-analysis [32, 33]. Finally, when applied to different pathologies, meta-analyses bring to light interesting differences or similarities.
Performing such comparisons across different organisms appears to be a particularly promising approach [35–37] to a better understanding of human diseases. Although differences exist, a careful meta-analysis between species can also reveal similarities [39–41]. The animal model can thus replicate some aspects of the human disease, yielding important insights into the pathogenic mechanisms. Recently, Calura et al. identified a common molecular pathway of atrophy in muscle of multiple species under diverse physiological conditions. This work demonstrates that such comparisons are possible and can be very useful in the field of muscle research. This was generalized by Jelier et al., who systematically compared 102 muscle-related microarray data sets, based on lists of up- and down-regulated DEGs.
There is substantial potential for novel discoveries in comparing (and associating) microarray studies. Doing so requires, however, a concerted effort to identify and remove the obstacles to routine mass comparison of microarray data. The objective is to make this large body of data accessible and comparable for the broad scientific community in the field of muscle research. Such databasing allows for a systematic comparison of the results from different studies in order to identify consistent expression patterns. Notably, experimental researchers can interpret new data by exploring these biologically significant patterns. Based on this concept, several web tools have already been developed. They can be divided into two main groups: the first group aims to compare lists of DEGs, whereas the second analyses gene co-expression across data sets.
In the first group, two databases have emerged to host and quickly integrate the results of microarray experiments: LOLA (List Of Lists Annotated) and L2L (List to List). LOLA and L2L both gather lists of published DEGs. They allow investigators to compare their own data to lists of DEGs from different platforms and species in order to identify underlying patterns. However, they are quite limited by the size of the database and by their reliance upon the way the lists were created (e.g. heterogeneous processing of the studies). To solve this problem, other tools, based on the re-analysis of data sets, have been developed with varying degrees of success [see  for review]. A major problem was the small amount of meaningful raw data deposited in public databases. A more advanced strategy for comparing significant gene lists was provided by Oncomine and GeneChaser (GENE CHAnge browSER). Oncomine is a comprehensive and expertly annotated database of gene expression studies. The collection comprises 25,447 samples in 360 experiments taken from 40 cancer types. This tool facilitates the identification of DEGs between cancer and normal tissues or among different cancer subtypes across a large collection of microarray data. This system was successful in performing comparative meta-profiling to identify shared gene expression signatures. However, this feature does not appear to be accessible to the user. Likewise, GeneChaser automatically re-annotated and analyzed 1,515 GEO data sets from 231 microarray types across 42 species. It performed 12,658 group-versus-group comparisons to identify biological and clinical conditions in which a set of genes is differentially expressed. This tool also provides statistical and graphical representations to interpret these data. Two variant strategies have also been developed, both using signed rank genes as the basis for DEG 'signatures' from a two-group comparison.
The first one is a microarray database search algorithm in an application called the Connectivity Map (CMAP). It gathers a reliable but small number (564) of drug-related cancer signatures in ten cell lines, derived from one laboratory using a single microarray platform. However, signatures derived from other platforms were not demonstrated to work with CMAP. The second strategy, called EXALT (EXpression signature AnaLysis Tool), holds thousands of DEGs (16,181) extracted from a large formatted collection of microarray results from GEO and published cancer studies. This collection represents hundreds of different experiments on many different tissues, generated on multiple platforms. The statistical approach used by the authors is similar to that proposed by Rhodes et al. It performs statistical tests and then calculates a p-value for each probe, separately for each study, resulting in a list of statistically de-regulated genes for each data set. However, these DEG-based methods have clear caveats. They often use a single significance test to extract DEGs from all experimental designs, and significant genes are defined based on a two-group comparison strategy. Although they adhere strictly to the group design specified by the investigators, DEGs cannot always be extracted from microarray data sets. Some GEO data sets do not have sufficient information to provide statistically reliable results. Additionally, no signature can be produced if a comparison between two groups is not statistically significant. Finally, additional novel comparisons within a data set are not possible: the current GEO data structure does not provide a computable attribute to automatically identify this type of experiment or hypothesis. To address these limitations, other comparison methods, based on co-expression analysis of genes, have been considered.
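The per-study, per-probe testing described above can be illustrated with a minimal sketch. The text does not say which statistical test the cited tools use; an exact permutation test on the difference in group means is swapped in here only because it needs nothing beyond the Python standard library. The function names, the 0.05 threshold and the toy expression values are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling the samples into two groups.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so that ties with the observed split are counted.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def deg_list(study, alpha=0.05):
    """Return the probes of one study whose p-value falls below alpha.

    `study` maps a probe id to a (control values, treated values) pair.
    """
    return sorted(
        probe
        for probe, (control, treated) in study.items()
        if permutation_p_value(control, treated) < alpha
    )


# Hypothetical toy data: two studies, each tested separately.
studies = {
    "study_1": {
        "probe_A": ([1.0, 1.1, 0.9, 1.2], [3.0, 3.2, 2.9, 3.1]),
        "probe_B": ([2.0, 2.1, 1.9, 2.2], [2.1, 1.9, 2.0, 2.2]),
    },
    "study_2": {
        "probe_A": ([5.0, 5.2, 4.9, 5.1], [7.0, 7.3, 6.9, 7.1]),
        "probe_C": ([1.0, 3.0, 2.0, 4.0], [2.0, 4.0, 1.0, 3.0]),
    },
}

degs_per_study = {name: deg_list(data) for name, data in studies.items()}
print(degs_per_study)  # {'study_1': ['probe_A'], 'study_2': ['probe_A']}
```

The sketch also makes the caveat visible: with four samples per group the smallest attainable p-value is 2/70, so a study with fewer replicates could never yield a DEG at this threshold. A real pipeline would additionally correct for multiple testing across probes, which is omitted here.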
It has been shown that a sufficiently large and diverse set of profiles obtained under various physiological conditions allows the identification of co-regulated transcript groups. Gene co-expression is conserved across microarray data sets and can be identified in a compendium of gene expression data. This strategy detects modules of co-expressed genes which are either specific to one physiological condition or shared across a set of different physiological conditions. This approach of cross-platform analysis of microarray data has allowed the unraveling of networks of transcription factors in yeast. This work examined the expression patterns of co-expressed gene pairs or 'doublets' across multiple data sets to infer functional linkages. The search for doublets was used in the GAN (Gene Aging Nexus) tool to explore co-expressed gene pairs across 42 data sets related to aging and was also recently implemented in Oncomine. Based on the results obtained by Lee et al., the Gemma database and software system was likewise developed for the re-use and meta-analysis of gene expression.
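The search for recurrent doublets can be sketched as follows. This is only an illustration of the general idea, not the procedure of any of the cited tools: Pearson correlation, the 0.9 cut-off, the requirement of two supporting data sets and the gene names are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def recurrent_doublets(datasets, min_r=0.9, min_datasets=2):
    """Count, for each gene pair, the data sets in which it is co-expressed.

    `datasets` maps a data set name to {gene: expression profile}.
    Only pairs supported by at least `min_datasets` data sets are returned.
    """
    counts = {}
    for profiles in datasets.values():
        for gene_1, gene_2 in combinations(sorted(profiles), 2):
            if pearson(profiles[gene_1], profiles[gene_2]) >= min_r:
                pair = (gene_1, gene_2)
                counts[pair] = counts.get(pair, 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_datasets}


# Hypothetical toy compendium: the data sets may differ in sample number.
datasets = {
    "dataset_1": {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.1],
        "geneC": [4.0, 1.0, 3.0, 2.0],
    },
    "dataset_2": {
        "geneA": [5.0, 3.0, 1.0],
        "geneB": [10.0, 6.2, 2.0],
        "geneC": [1.0, 3.0, 2.0],
    },
}

doublets = recurrent_doublets(datasets)
print(doublets)  # {('geneA', 'geneB'): 2}
```

Because the correlation is computed within each data set and only the resulting pair counts are combined, the data sets never need to share a platform or a normalization scheme, which is what makes this kind of comparison attractive across studies.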
We have taken into account the advantages of the two strategies previously described for microarray data analysis. On those grounds, we developed a tool that makes meta-analysis of muscle transcriptome data easily accessible to any user. Specifically, we have built a database that gathers all the public microarray data related to muscle studies from GEO. After a careful re-analysis of the microarray data, clusters of co-expressed genes were identified in each data set. Once these clusters are converted into lists of genes, our tool allows the simultaneous comparison of all clusters independently of the platform used and the species studied. This comparison makes it possible to identify:
i) robust signatures of a pathology or a treatment across several independent studies.
ii) sets of genes that may be similarly modulated in different disease states or following drug treatments.
iii) common sets of co-expressed genes between human and animal models.
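Comparing clusters as plain gene lists can be illustrated with a short sketch. The text does not state which statistic the tool uses to score an overlap; a one-sided hypergeometric test is assumed here as a common choice. The function names, the universe size and the gene identifiers are hypothetical, and genes are assumed to have already been mapped to a common identifier across platforms and species.

```python
from math import comb


def overlap_p_value(query, cluster, universe_size):
    """Return the overlap size and the hypergeometric P(overlap >= observed)."""
    query_set, cluster_set = set(query), set(cluster)
    observed = len(query_set & cluster_set)
    n_query, n_cluster = len(query_set), len(cluster_set)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_cluster, i) * comb(universe_size - n_cluster, n_query - i)
        for i in range(observed, min(n_query, n_cluster) + 1)
    )
    return observed, tail / comb(universe_size, n_query)


def rank_clusters(query, clusters, universe_size):
    """Score a query gene list against every cluster, most significant first."""
    scored = [
        (name, *overlap_p_value(query, genes, universe_size))
        for name, genes in clusters.items()
    ]
    return sorted(scored, key=lambda row: row[2])


# Hypothetical toy example with a universe of 20 genes.
query_list = ["g1", "g2", "g3", "g4"]
clusters = {
    "cluster_1": ["g1", "g2", "g3", "g10", "g11"],
    "cluster_2": ["g12", "g13", "g14", "g15", "g16"],
}

for name, overlap, p_value in rank_clusters(query_list, clusters, universe_size=20):
    print(name, overlap, round(p_value, 4))
```

Because only identifiers are compared, the score does not depend on the platform or the normalization of the original data sets, only on the quality of the identifier mapping and on the choice of the gene universe.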
In the remaining sections of this paper, we first present the MADMuscle tool. We show how users can browse the microarray data related to muscle studies, examine the annotated clusters and compare their own gene list with the gene lists of all the clusters in the database. In the next section, we present two meta-analyses that demonstrate the usefulness of our tool.