Meta-analysis of heterogeneous Down Syndrome data reveals consistent genome-wide dosage effects related to neurological processes

Background Down syndrome (DS; trisomy 21) is the most common genetic cause of mental retardation in the human population, and the key molecular networks dysregulated in DS are still unknown. Many different experimental techniques have been applied to analyse the effects of dosage imbalance at the molecular and phenotypical level; however, no integrative approach currently exists that attempts to extract the common information. Results We have performed a statistical meta-analysis of 45 heterogeneous publicly available DS data sets in order to identify consistent dosage effects across these studies. We identified 324 genes with significant genome-wide dosage effects, including well-investigated genes like SOD1, APP, RUNX1 and DYRK1A as well as a large proportion of novel genes (N = 62). Furthermore, we characterized these genes using gene ontology, molecular interactions and promoter sequence analysis. In order to judge the relevance of the 324 genes for more general cerebral pathologies, we used independent publicly available microarray data from brain studies not related to DS and identified a subset of 79 genes with potential impact on neurocognitive processes. All results have been made available through a web server at http://ds-geneminer.molgen.mpg.de/. Conclusions Our study represents a comprehensive integrative analysis of heterogeneous data, including genome-wide transcript levels, in the domain of trisomy 21. The detected dosage effects constitute a resource for further studies of DS pathology and the development of new therapies.


Background
Down syndrome (DS) is the most frequent genomic aneuploidy, with an incidence of approximately 1 in 700 live births [1], and results from the presence of an extra copy of human chromosome 21 (HSA21). DS is characterized by a complex phenotype with features that are not fully penetrant. The most frequent manifestations, which are virtually always present, include mental retardation, morphological abnormalities of the head and limbs, short stature, hypotonia and hyperlaxity of ligaments. Other features occur less frequently, such as organ malformations, particularly of the heart (50% of DS newborns), several types of gastrointestinal tract obstruction or dysfunction (4-5% of DS newborns), an increased risk of leukaemia (20 times higher than in the normal population), and early onset of an Alzheimer-like neuropathology [2,3]. DS has been investigated in multiple functional genomics studies aiming to understand the molecular basis of the various aspects of the disease [4-7].
The most commonly accepted pathogenetic hypothesis is that the dosage imbalance of genes on HSA21 is responsible for the molecular dysfunctions in DS; that is, genes on the triplicated chromosome are overexpressed because of the extra chromosome 21, as demonstrated for selected genes such as SOD1 and DYRK1A [8]. Recent global transcriptome studies with microarrays, however, have generated a more complex picture, in the sense that not all HSA21 genes show the expected elevated expression level [9,10]. An alternative hypothesis is that the phenotype results from an unstable environment created by the dosage imbalance of the hundreds of genes on HSA21, which causes a non-specific disturbance of genomic regulation and expression. The significantly higher interindividual variability in DS individuals compared with euploid individuals supports this hypothesis [11]. Moreover, the two hypotheses could coexist [3]. Both hypotheses imply that, besides altered expression of HSA21 genes, numerous genome-wide effects lead to the dysregulation of many non-HSA21 genes through molecular pathways and interactions.
Many studies at the transcriptome and proteome levels have been conducted to understand the causal relationship between genes at dosage imbalance and DS phenotypes [12]. Gene expression profiles have been analysed in DS fetal [13] and adult human tissues [6]. Additionally, two classes of mouse models [14] have been developed for investigating the molecular genetics of DS: mouse models with partial trisomies of the regions syntenic to HSA21 on mouse chromosomes 10, 16 or 17, such as Ts16 [15], Ts65Dn [16] and Ts1Cje mice [17], and transgenic mice for specific genes such as SOD1 [18]. Studies of gene expression profiles in human DS samples and mouse models have shown high genome-wide variability [11,19-22]. Furthermore, differences in the applied experimental platforms, tissues, developmental stages or the triplicated segments under study introduce high variation into the assessment of genome-wide effects of DS. Integrative and comparative studies are therefore pivotal for analysing the complex nature of gene expression and regulation in DS at a more general level [2,23].
Meta-analysis has proven to be a valid strategy for extracting consistent information from heterogeneous data, in particular for complex phenotypes such as cancer [24], Alzheimer's disease [25] and type 2 diabetes mellitus [26]. The purpose of meta-analysis is to compensate for experiment-specific variation and to reveal information that is consistent across a wide range of experiments.
To date, such a meta-analysis of DS data is missing.
In this paper we describe a comprehensive meta-analysis of 45 different DS studies in human and mouse at the transcriptome and proteome level, including quantitative data such as Affymetrix microarrays, RT-PCR and MALDI studies as well as qualitative data such as SAGE and Western blot analyses. We applied an established computational framework [26] and identified 324 genes with consistent dosage effects across many of these studies. As expected, we observed a high fraction of HSA21 genes (N = 77) but also a large number of non-HSA21 genes (N = 247). Besides genes already well investigated in the context of DS, we detected a significant proportion of novel ones (N = 62). The 324 genes were further investigated using functional information, molecular interactions and promoter analysis, the latter revealing overrepresented motifs of four transcription factors: RUNX1, E2F1, STAF/PAX2 and STAT3. To test the relevance of the 324 genes for more general brain phenotypes, we used independent publicly available data on cerebral pathologies not related to DS and identified a subset of 79 DS genes that were differentially expressed in these studies. The detected dosage effects can be used as a resource for further studies of DS pathology, functional experiments and the development of therapies. All data have been collected and made available through a web server that tracks the results of the meta-analysis (http://ds-geneminer.molgen.mpg.de/) and enables the community to validate any gene of interest in the light of the experimental data.

Genome-Wide Dosage Effects
Genome-wide dosage effects were computed with the numerical scoring method described in Materials and Methods. In total, 45 case-control experiments were interrogated (Additional file 1, Table S1): the alteration of each gene between the trisomic and normal states was scored in each experiment, gene scores were summed across all experiments, and the significance of the summed scores was judged with a bootstrap approach. This procedure yielded a cut-off score of 3.67 and identified 324 genes as being predominantly affected by DS. The thirty genes with the highest dosage effects, on HSA21 or on other chromosomes, are listed in Table 1; the entire gene list is given in Additional file 1, Table S2. The meta-analysis identified genes that showed consistent changes in many of the different experiments rather than genes that were affected in a single experiment (or a few) (Figure 1A). This is important because, for example, different mouse models cover different sets of triplicated HSA21 genes and thus might introduce model-specific bias [14]. The consistency of the dosage effect was measured for each gene with an entropy criterion (see Materials and Methods), and Figure 1A reveals a strong preference for the selection of high-entropy genes. The highest scores were assigned to HSA21 genes (Figure 1B; Table 1), which indicates that the meta-analysis scores reflect the effect of an extra chromosome 21 on gene expression. While HSA21 genes were proportionally the most frequent among the dosage effects (77 out of 324), the majority of genes (247 out of 324) were located on other chromosomes, highlighting the genome-wide impact of DS (Figure 1C).
Genome-wide dosage effects underlined the severe phenotypic consequences of DS caused by genes with a major role in human development (Additional file 2, Figure S1). Of the 247 non-HSA21 genes, 72 were associated with development, in particular organ development (62 genes, GO:0048513), tissue [...] [5] demonstrated that the region capable of affecting REST levels, in both mouse and human cells, could be assigned to the DYRK1A locus on HSA21, which was found among the top-scoring HSA21 genes (Table 1). TXNIP (thioredoxin interacting protein) had the highest dosage effect (8.79) of all non-HSA21 genes. So far it has only a weak association with DS (through S100B [27]), but it could play a major role in several DS phenotypes: it is a key signalling molecule involved in glucose homeostasis [28], cardiovascular homeostasis [29] and leukaemia [30].
Positional enrichment of the 324 genes was observed in regions of HSA21 and in the respective syntenic regions on mouse chromosomes 16, 17 and 10 (Additional file 3, Figure S2). Moreover, in the human genome, additional enrichment was found on chr3q24, a region containing the genes GYG1 (glycogenin), PLOD2 (involved in bone morphogenesis), PLSCR4 and CHST2 (involved in the inflammatory response in vascular endothelial cells).

Dosage Effects on HSA21
Proportionally, HSA21 contributed the most to the detected dosage effects (Figure 1C). On the other hand, it is remarkable that only about a third of all HSA21 genes (77 out of the 255 studied here using the Ensembl genome annotation [31]) showed consistent effects across the different experiments (see also Discussion). While 57 genes had a positive score below the significance threshold of 3.67, indicating relevance in specific experiments only, 121 genes had a score near zero, indicating that their dosage effects were either compensated or not detected with the selected experimental data (Figure 1B). HSA21 dosage effects included, for example, APP (beta-amyloid precursor protein), involved in senile plaque formation in DS and Alzheimer's disease [3]; SOD1 (superoxide dismutase 1), a key enzyme in the metabolism of oxygen-derived free radicals [3]; DYRK1A (dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 1A), involved in neuroblast proliferation and crucial for brain function, learning and memory [32]; RUNX1 (runt-related transcription factor 1), which plays a critical role in normal hematopoiesis [33]; and GABPA (GA binding protein transcription factor, alpha subunit 60 kDa), encoding a protein with a DNA-binding domain and a wide variety of targets spanning different cell/tissue specificities and functions [34]. HSA21 genes were mostly up-regulated in gene expression studies (69 out of 77), the exception being eight genes that were either variable or down-regulated (SLC5A3, MRPS6, B3GALT6, CBS, KCNJ6, KCNJ15, CLDN14, COL18A1).
Possible explanations for this observation might be tissue-specific gene expression, as in the case of MRPS6, which was mostly up-regulated in brain samples and down-regulated in other tissues such as heart or kidney, or differences between human and mouse gene expression, as in the case of CBS, which was up-regulated in human but down-regulated in mouse experiments, possibly because of a different tissue specificity of the orthologous mouse gene [35].
Three genomic regions on HSA21 were enriched in significant genes according to the MSigDB_c1 positional database: chr21q22, chr21q21 and chr21q11, all located on the long (q) arm (Figure 1D). This contradicts the hypothesis that a single region on HSA21, with only a few responsive genes, could be responsible for the molecular and phenotypic consequences of DS [36,37]. Rather, our findings support studies that identified more than one HSA21 region causative for DS phenotypes: the dosage effects were not uniformly distributed along the chromosome but enriched in certain regions of HSA21, similar to the results in [38,39].

Functional Annotation Using Gene Enrichment Analysis
Functional annotation of biological pathways was retrieved from ConsensusPathDB [40], a meta-database that integrates the content of 22 human interaction databases. A total of 1,695 predefined pathways were screened with the 324 genes using gene set enrichment analysis [41]. Of these, 277 pathways were significantly enriched (family-wise error rate (FWER) < 0.01), several of which were associated with neurological and neuropathological processes (Table 2). These pathways referred mainly to (i) neurodegeneration (e.g. Huntington's disease, Alzheimer's disease or Parkinson's disease) and (ii) synaptic defects (e.g. axon guidance, NGF signalling). Furthermore, the results emphasized the role of tyrosine-kinase receptors in DS pathology (for example P75(NTR)-mediated signalling or NGF signalling via TRKA), which interact directly with BDNF (brain-derived neurotrophic factor). Moreover, our results showed gene dosage effects caused either directly by genes located on HSA21 (e.g. SOD1, APP, DONSON, TIAM1, COL6A2, ITSN1 and BACE2) or indirectly by HSA21 interactors, highlighting the intrinsic complexity of DS pathology. For example, PIK3R1 dysregulation affects many of these pathways, and PIK3R1 is a direct interactor of IFNAR1, a significant DS gene. A similar effect can be observed for TJP1, which interacts with the HSA21 genes JAM2 and CLDN8, both showing consistent dosage effects (cf. Figure 2A).

Dosage Effects on Transcriptional Regulation
Dysregulation of transcriptional regulation is widely reported in DS [34]. We further analyzed the promoter sequences of the 324 genes for enrichment of transcription factor binding sites using the AMADEUS software [43]. Significant enrichment was found for four TF motifs: E2F1, RUNX1, STAF/PAX2 and STAT3 (Table 3). The enrichment for RUNX1 is notable, as RUNX1 is among the most studied genes implicated in DS. The involvement of E2F1 in DS has also been reported previously [34] and could account for the impaired cell proliferation documented in the hippocampus, cerebellum and astrocytes of DS mouse models.

Dosage Effects and Molecular Interactions
Molecular interactions among the 324 significant genes on HSA21 and on other chromosomes formed a complex network, supporting the important role of physical interactions as transmitters of dosage effects (Figure 2A). The consequences of HSA21 triplication for the interacting genes were fairly stable, as Figure 2B demonstrates. For example, DNAJB1 (DnaJ (Hsp40) homolog, subfamily B, member 1) and PPP3CA (protein phosphatase 3, catalytic subunit, alpha isozyme; data not shown), both interacting with SOD1, were consistently and significantly down-regulated in the human microarray experiments, as the fold-changes and P-values indicate. Opposite trends were observed for TJP1 and RHOQ.

Assessing General Relevance of DS Dosage Effects for Neurological Processes
We were further interested in identifying those of the 324 genes that are relevant for other brain disorders. To this end, we interrogated 19 independent data sets derived from publicly available microarray data (Additional file 1, Table S4). These studies addressed heterogeneous research questions on different cerebral pathologies and identified a total of 623 differentially expressed genes. Gene set enrichment analyses [41] with the 324 genes and the corresponding lists of differentially expressed genes were significant for 10 of these 19 studies, with 79 overlapping genes (Figure 3A). Furthermore, we used the HSA21 database (http://chr21.molgen.mpg.de/hsa21) [4], a resource of RNA in situ hybridizations in postnatal mouse brain sections, to provide independent supporting evidence for brain expression of these 79 genes, as shown for example for BACH1 (basic leucine zipper transcription factor 1) and TTC3 (tetratricopeptide repeat domain 3) (Figure 3B and 3C).
Additionally, we investigated the expression patterns of the 79 genes across the DS microarray experiments used in this meta-analysis and identified brain-related signatures, for example a clear up-regulation in brain tissues for the cluster containing C14orf147, IVNS1ABP, B2M, TJP1, SPARC, CTGF, COL4A1 and FSTL1 (Figure 3D).

Novel Dosage Effects
To identify DS-relevant "novel" dosage effects, we excluded from the 324 genes (i) HSA21 genes, (ii) genes that interact with HSA21 genes and (iii) genes that have been associated with DS in the literature (Table 4). The remaining candidates (N = 62) comprised BDNF-related genes (SST), MAPK-pathway genes (KRAS, IGF1R, GNG11 and RAP1A), genes related to leukaemia (SFRP1) and Rho proteins (DHCR7 and RAB21). SST was previously found to be co-expressed with TAC1 [44], which is also significant in our meta-analysis, and both genes showed a strong correlation across DS studies (Figure 4A).

Discussion
The statistical meta-analysis approach was described previously by Rasche et al. [26]. The score computed by the meta-analysis correlates with entropy (Figure 1A), indicating the ability to identify general dosage effects across many experiments, which might be of more phenotypic relevance than very specific ones. Additional file 4, Figures S3A and B provide an overview of the different sources of data, including two organisms (human and mouse), different tissues (brain, heart and others), different developmental stages (adult, postnatal, embryonic) and different mouse models (Ts65Dn, Ts1Cje, Tc1). It is interesting in itself that, in spite of such heterogeneity, common dosage effects could be identified at all, and it should be highlighted that whole-genome data were fairly robust across experiments. Additional file 4, Figure S3D shows the overall correlation between the quantitative PCR and microarray values averaged over all experiments, with only a few genes in the non-concordant sectors of the graph (red points). The score used in this analysis can detect genes that are up-regulated in some studies and down-regulated in others. An overview of the fold-changes for the genes across the different experiments is given in Additional file 1, Table S6. Because genes might change their expression level depending on developmental stage, tissue or other variables, we expected this flexibility to allow testing both the hypothesis of random disturbances and the hypothesis of increased expression of HSA21 genes. We detected a clear enrichment of up-regulated genes on the long (q) arm of HSA21 (Figure 1D and Additional file 3, Figure S2). However, rather than a single region, we identified several smaller regions on HSA21 that harbour a large number of significant dosage effects. This finding has also been reported before (Korbel et al. [38] and Lyle et al. [39]), using two independent data sets to characterize the molecular HSA21 regions in sets of DS patients with partial duplications.
We studied 255 HSA21 genes matched with probe sets on the microarrays. Of these, only 77 showed consistent dosage effects (Figure 1). While 165 HSA21 genes had non-zero scores, indicating a response in some of the microarray studies, 90 HSA21 genes were not responsive at all, which provides evidence for a strong mechanism of dosage compensation. On the other hand, these figures could also reflect the limited ability of microarray technology to detect reliable fold-changes of low magnitude. Furthermore, the experiments covered only a limited number of tissues, so it is likely that some genes were missed simply because they were not responsive in the tissues under analysis. However, since brain was the dominant source of whole-genome samples, most of the genes should have been expressed. Microarray data were restricted to the Affymetrix platform in order to reduce variance arising from platform inconsistencies. We also compared our results with additional studies, including our own previous research [9] and others [55], and found that selected dosage effects were relevant in other tissues as well (data not shown). Additional cross-validation was performed with an independent microarray data set [10], in which the authors compared human lymphoblastoid cell lines derived from DS patients and normal controls using a custom-made HSA21 array. Yahya-Graison et al. [10] divided the expression ratios into four classes: class I and class II genes were significantly up-regulated, while class III and class IV genes were either compensated or showed a variable response. Our meta-analysis showed a high degree of concordance, considering that the cell model, platform and methodology were completely different. The meta-analysis scores were significantly higher for class I and II genes than for class III and IV genes (P-value < 0.01, Additional file 5, Figure S4), and 25 out of 39 class I-II genes (64%) had a significant score in our meta-analysis.
In this study we monitored molecular interactions of HSA21 genes that might act as drivers of dosage effects (Figure 2A). For example, (i) TJP1 (tight junction protein ZO-1) interacts with two HSA21 genes, JAM2 and CLDN8; (ii) FOS (FBJ murine osteosarcoma viral oncogene homolog) interacts with the HSA21 genes ETS2, SUMO3 and RUNX1 and indirectly with ERG; (iii) RHOQ (ras homolog gene family, member Q) interacts directly with ITSN1 and TIAM1 and indirectly with SYNJ; and (iv) PIK3R1 interacts directly with IFNAR1 and indirectly with IFNAR2. It should be emphasized that current information on molecular interactions is far from complete; thus we might miss important interactions not yet detected and/or count false-positive interactions owing to the high error rates of current interaction annotations. A subset of the DS genes (N = 79) extrapolated to more general neurological phenotypes (Figure 3A). The dendrogram (Figure 3D) reveals further interesting profiles of these genes in the DS samples under analysis: (i) differential gene expression in the cerebellum versus whole "brain" or cerebrum samples, which has been reported in other studies (e.g. Moldrich et al. [56]); (ii) different gene expression patterns associated with particular developmental stages (P0, P15 and P30), as reported before by Dauphinot et al. [57]; and (iii) differences in ES studies.
We further analyzed human and mouse studies separately and found 182 significant dosage effects using only human and 107 dosage effects using only mouse data. The Venn diagram in Additional file 4, Figure S3C clearly shows the benefit in detecting additional dosage effects when mixing the two species. Overlapping dosage effects were detected for 29 genes with both analyses (Additional file 1, Table S7). Results for the human and mouse specific analyses can be found in Additional file 1, Tables S8 and S9. It should be noted here that comparisons between human and mouse using microarrays are inherently difficult and have limitations since the probes for the orthologous mouse and human genes do not correspond well. Furthermore, gene expression variation is generally higher in human individuals compared to mouse inbred strains. Nonetheless, the 107 genes found in the analysis of mouse data (derived from the different mouse models for trisomy 21) represent a core set of genes responsive across different DS mouse models and, thus, could be highly relevant for DS pathogenesis.
In addition to genes commonly related to DS, we have identified novel genes that can be associated with DS phenotypes, in particular with neural development and neurodegeneration. To the best of our knowledge, this study is the first meta-analysis of genome-wide transcript levels together with other data domains in DS research. The aggregated data can be accessed through the web server at http://ds-geneminer.molgen.mpg.de, and the identified dosage effects are a resource for further functional testing and therapeutic development.

Figure caption (fragment): [...] The deregulated profile of these genes is shown with the fold-change view of the web browser. B) HSPA5 is a novel DS gene implicated in neurodegeneration and is also a target of the ATF6 TF, whose target set was enriched in significant genes; the histogram displays the p-values for this gene in individual studies. C) KANK1, a gene previously related to paternally inherited cerebral palsy, shows a consistent trend of up-regulation in the considered studies, as shown with the fold-change view of the web browser.

Conclusions
We have identified a set of 324 genes with consistent dosage effects from 45 different experiments related to DS. Since the meta-analysis was enriched with brain experiments, we were able to detect a high fraction of genes related to neurodevelopment, synaptic function and neurodegeneration. Moreover, our results provide further information about known and new pathways related to DS and about 62 novel candidate genes. The results of the meta-analysis as well as the source data have been made accessible to the community through a web interface.

Selection and integration of DS resources
Data sets were selected from heterogeneous technical platforms, different model systems (human cell lines, human tissues, mouse models) and different developmental stages (Additional file 1, Table S1). For each gene and each source we computed a numerical value that measures its dosage effect. Data were either qualitative or quantitative. Qualitative data comprised a total of 30 published manuscripts, including reviews and semi-quantitative studies as well as two SAGE studies [21,58], and were summarised into a single score point in order to avoid over-scoring: a score of 1 was assigned if the gene was found to be DS-relevant in one or more of these studies. Quantitative data from differential gene expression studies such as Affymetrix microarrays, RT-PCR, MALDI and other quantitative techniques were evaluated in order to extract comparable information across the different studies. We considered Affymetrix studies that provided the raw data (CEL file level). Raw data were extracted from Gene Expression Omnibus (GEO, [59]) or ArrayExpress [60], or were retrieved from the authors' web pages (in total 16 data sets, including human tissues and four different mouse models: Ts65Dn, Ts1Cje, Tc1 and Ts + HSA21). Furthermore, we incorporated 18 RT-PCR and MALDI data sets for which the authors reported the information for all genes under study (significant or not).
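The rule for qualitative sources, where any number of reports counts as a single point, can be sketched as follows. This is a minimal illustration, assuming the qualitative evidence is available as (source, gene) pairs; the function name `qualitative_scores`, the input layout and the example source labels are hypothetical, not the authors' implementation.

```python
from collections import defaultdict


def qualitative_scores(reports):
    """Collapse qualitative evidence into at most one score point per gene.

    reports: iterable of (source_id, gene_id) pairs, one per gene that a
    qualitative source (publication, review or SAGE study) reported as
    DS-relevant. This input layout is an illustrative assumption.
    Returns (scores, counts): scores maps gene -> 1; counts maps gene ->
    number of distinct supporting sources, kept only for traceability.
    """
    support = defaultdict(set)
    for source_id, gene_id in reports:
        support[gene_id].add(source_id)
    # One report or many give the same single point, avoiding over-scoring.
    scores = {gene: 1 for gene in support}
    counts = {gene: len(sources) for gene, sources in support.items()}
    return scores, counts


# Hypothetical source labels, for illustration only.
scores, counts = qualitative_scores(
    [("review_a", "APP"), ("sage_b", "APP"), ("review_a", "SOD1")]
)
```

Keeping the source counts separately leaves the score capped at 1 while still recording how much qualitative support each gene had.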

Mapping of gene IDs
A central prerequisite of any meta-analysis approach is the consolidation of the different ID types, for example IDs coming from different organisms and from different versions of chips. We used the Ensembl database (version 56) as the backbone annotation for all studies, and all IDs were mapped to human Ensembl gene IDs. Mapping and merging of the information were done in R using Bioconductor packages. In total, information on 19,388 Ensembl genes was mapped.
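The consolidation step can be illustrated with a small lookup-and-merge routine. The authors worked in R/Bioconductor; the Python sketch below is only an illustration under stated assumptions: the mapping table `id_to_ensembl` (platform ID to human Ensembl gene ID, with mouse IDs assumed to be already converted through orthology) is a hypothetical input, and combining several probe sets that hit the same gene by averaging their log2 fold-changes is an illustrative choice that the text does not specify.

```python
import math
from collections import defaultdict


def map_to_ensembl(fold_changes, id_to_ensembl):
    """Aggregate per-ID fold-changes onto human Ensembl gene IDs.

    fold_changes: dict platform_id -> fold-change (DS / control).
    id_to_ensembl: dict platform_id -> human Ensembl gene ID (hypothetical
    table; for mouse probes it is assumed to include the orthology step).
    IDs without a mapping are dropped.
    """
    log_ratios = defaultdict(list)
    for platform_id, fc in fold_changes.items():
        gene = id_to_ensembl.get(platform_id)
        if gene is None:
            continue  # unmapped probe set or tag
        log_ratios[gene].append(math.log2(fc))
    # Assumed merge rule: geometric mean of fold-changes (mean of log2 ratios).
    return {gene: 2 ** (sum(v) / len(v)) for gene, v in log_ratios.items()}


merged = map_to_ensembl(
    {"probe_1": 2.0, "probe_2": 8.0, "probe_3": 1.5},
    {"probe_1": "gene_X", "probe_2": "gene_X"},  # probe_3 is unmapped
)
```

Averaging on the log scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is why it is used here rather than averaging raw ratios.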

Mapping SAGE IDs
Differentially expressed tags were extracted from the additional files of the studies. The tag identifiers (based on sequences) were cross-referenced with the information in the updated mapping tables (SAGEmap_Hs and SAGE-map_Mn) from the SAGE site ftp://ftp.ncbi.nlm.nih.gov/pub/sage/mappings.

Transcriptome data pre-processing and normalization
Only case-control studies were incorporated in the meta-analysis, in order to derive expression fold-changes. Affymetrix GeneChip annotations were adapted from the latest genome annotation (version 12). Affymetrix data were normalized with GCRMA. For transcriptome case-control studies, three pieces of information were stored for each gene: (i) the fold-change (DS vs. controls), (ii) the standard error of the fold-change from the replicated experiments in that study and (iii) the detection p-value (presence call), which indicates whether or not the gene is expressed in the target samples under study. For RT-PCR and MALDI experiments we computed the fold-change of the mean expression (DS vs. controls) and took the reported standard error of the ratio. When the mean and standard deviation of each group (DS and controls) were provided, we calculated the ratios and their associated standard errors.
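When only group means and standard deviations are reported, a ratio and its standard error can be derived as sketched below. The text does not state which approximation was used; this sketch assumes independent groups and first-order (delta-method) error propagation, SE(r) ≈ |r| · sqrt((SE_DS / mean_DS)² + (SE_ctrl / mean_ctrl)²), with SE = SD / sqrt(n). The function name `ratio_with_se` is hypothetical.

```python
import math


def ratio_with_se(mean_ds, sd_ds, n_ds, mean_ctrl, sd_ctrl, n_ctrl):
    """Fold-change (DS / control) and its approximate standard error.

    Assumes independent groups and first-order (delta-method) propagation.
    """
    ratio = mean_ds / mean_ctrl
    # Standard errors of the two group means.
    se_ds = sd_ds / math.sqrt(n_ds)
    se_ctrl = sd_ctrl / math.sqrt(n_ctrl)
    # Relative errors add in quadrature for a ratio of independent means.
    rel_err = math.sqrt((se_ds / mean_ds) ** 2 + (se_ctrl / mean_ctrl) ** 2)
    return ratio, abs(ratio) * rel_err


# Illustrative numbers: DS mean 3.0 (SD 0.6, n 4), control mean 2.0 (SD 0.4, n 4).
ratio, se = ratio_with_se(3.0, 0.6, 4, 2.0, 0.4, 4)
```

Here both groups have a relative standard error of 0.1, so the ratio of 1.5 carries a standard error of about 1.5 · sqrt(0.02) ≈ 0.21.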

Scoring DS dosage effects across studies
In order to score the different categories of information, such as binary counts and quantitative gene expression values, we summed the scores of the individual experiments for each category. For microarray studies, the score of the i-th gene in the j-th study, s_ij, was computed as described in Rasche et al. [26] from r_ij, the fold-change, p_ij, the average detection p-value, and e_ij, the standard error of the ratio derived from the experimental replicates of the study. Thus, the fold-change is weighted by its reproducibility across the experimental replicates and by the likelihood of the gene being expressed in the study's case or control samples.
For RT-PCR and MALDI studies, the score was computed from r_ij, the fold-change, and e_ij, the standard error of the ratio.
The total score of a gene was computed as the sum of all its individual study scores.
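The scoring scheme can be sketched as below. The exact formula of Rasche et al. [26] is not reproduced in this text, so the functional form here is an assumption chosen only to mirror the verbal description: the absolute log2 fold-change (so that both up- and down-regulation contribute), divided by the standard error e_ij, and for microarrays multiplied by (1 − p_ij) so that genes unlikely to be expressed contribute little. All function names are hypothetical.

```python
import math


def microarray_score(r, p, e):
    """Assumed form of s_ij for microarray studies (not the published formula).

    r: fold-change, p: average detection p-value, e: standard error of the ratio.
    """
    # Larger |log2 r|, smaller replicate error and a lower detection p-value
    # (gene more likely expressed) all raise the score.
    return abs(math.log2(r)) * (1.0 - p) / e


def pcr_score(r, e):
    """Assumed form of s_ij for RT-PCR/MALDI studies (no detection p-value)."""
    return abs(math.log2(r)) / e


def total_score(study_scores):
    """Total gene score: the sum of its individual study scores."""
    return sum(study_scores)


# One illustrative gene: a microarray study, an RT-PCR study and the
# single qualitative point.
s = [microarray_score(4.0, 0.0, 2.0), pcr_score(0.5, 0.5), 1]
```

Because the sum runs over studies, a gene with moderate scores in many experiments can outrank one with a single extreme score, which is the behaviour the entropy criterion is meant to monitor.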

Sampling for significance
To assess the significance of the overall gene scores, we generated random scores by resampling the scores 50,000 times with replacement within each study. Using this random distribution as background, we considered as significant those genes whose total scores were above the 99.9th percentile of the background distribution.
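The resampling procedure can be sketched with the standard library as follows. This is a minimal reading of the description above, assuming that each background value is the sum of one score drawn with replacement from every study, that genes absent from a study carry a score of 0 there, and that the percentile is taken with the nearest-rank rule; the function names are hypothetical.

```python
import math
import random


def significance_cutoff(study_scores, n_samples=50_000, percentile=99.9, seed=0):
    """Resampling background for summed gene scores.

    study_scores: one list per study holding the scores of all genes in that
    study (genes absent from a study are assumed to carry score 0).
    Each background value sums one score drawn with replacement per study.
    """
    rng = random.Random(seed)
    background = [0.0] * n_samples
    for scores in study_scores:
        draws = rng.choices(scores, k=n_samples)  # sampling with replacement
        background = [b + d for b, d in zip(background, draws)]
    background.sort()
    # Nearest-rank percentile of the background distribution.
    rank = max(1, math.ceil(percentile * n_samples / 100))
    return background[rank - 1]


def significant_genes(total_scores, cutoff):
    """Genes whose total score exceeds the background cutoff."""
    return sorted(g for g, s in total_scores.items() if s > cutoff)
```

Resampling within each study preserves every study's own score scale, so a study that produces large scores does not inflate the background of the others beyond its actual contribution.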

Judging consistency of dosage effects
For each gene, the entropy of the score distribution was computed in order to quantify the relevance of the gene across many experiments. Let s_ij be the score of the i-th gene in the j-th study; the entropy E_i then measures the uniformity of the score distribution over the individual experiments. High entropy is assigned to a gene if many experiments contribute to the overall score, whereas low entropy is assigned if only a few experiments contribute.
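The entropy criterion can be sketched as follows. The defining equation is not reproduced in this text; the sketch assumes the common choice of Shannon entropy over normalised absolute scores, q_ij = |s_ij| / Σ_k |s_ik| and E_i = −Σ_j q_ij ln q_ij, which is maximal (ln m for m studies) when all studies contribute equally and zero when a single study carries the whole score. The function name is hypothetical.

```python
import math


def score_entropy(scores):
    """Shannon entropy of a gene's normalised absolute study scores (assumed form)."""
    total = sum(abs(s) for s in scores)
    if total == 0:
        return 0.0  # no study contributes; define entropy as 0
    entropy = 0.0
    for s in scores:
        q = abs(s) / total  # share of the overall score from this study
        if q > 0:
            entropy -= q * math.log(q)
    return entropy
```

For example, a gene scoring equally in four studies gets ln 4 ≈ 1.39, while a gene scoring in only one study gets 0.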

Enrichment analysis
Gene Set Enrichment Analysis (GSEA, [41]) of the 324 genes was performed with respect to predefined human pathways aggregated from 22 pathway resources in ConsensusPathDB ([40], http://cpdb.molgen.mpg.de). Over-representation analysis of TF target sets was performed with Fisher's exact test based on annotation from TRANSFAC [42]. Motif enrichment analyses were performed using AMADEUS [43], with the significant genes as the target set and all genes considered in the meta-analysis as the background set.
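For over-representation of a TF target set, a one-sided Fisher's exact test is equivalent to the upper tail of the hypergeometric distribution, which can be computed with the standard library as sketched below. The text does not say whether a one- or two-sided test or any multiple-testing correction was used; a one-sided test without correction is assumed here, and the function name is hypothetical.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    N: genes in the background set, K: background genes in the TF target set,
    n: significant genes, k: significant genes that are TF targets.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)
```

In this setting n would be the 324 significant genes and N the background of genes considered in the meta-analysis, with K and k taken from the TRANSFAC target set under test.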

Selection of independent brain experiments
To assess the general brain relevance of the 324 genes, we collected DS-independent gene expression studies addressing brain features that were performed with Affymetrix technology and deposited in GEO or ArrayExpress (Additional file 1, Table S4). Most of these experiments were performed on mouse tissues. For each study we collected one or more resulting gene lists, which were evaluated by Gene Set Enrichment Analysis (GSEA, [41]) against the complete list of 19,388 genes ranked by score.
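The enrichment score at the core of GSEA [41] can be sketched as a weighted running sum over the ranked gene list: the sum increases at genes belonging to the tested list (weighted by |score|^p) and decreases by 1/(N − N_H) at the remaining genes, and the enrichment score is its maximum deviation from zero. This is a minimal illustration using the default weight p = 1 of the GSEA method; the permutation-based significance assessment is omitted, and the function name and toy inputs are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style), without permutations.

    ranked_genes: genes sorted by decreasing score; scores: matching values;
    gene_set: genes to test (e.g. one brain study's result list).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(hits) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running = best = 0.0
    for is_hit, s in zip(hits, scores):
        # Step up at set members (score-weighted), step down at other genes.
        running += abs(s) ** p / hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["a", "b", "c", "d"]
values = [4.0, 3.0, 2.0, 1.0]
es_top = enrichment_score(genes, values, {"a", "b"})     # set at the top
es_bottom = enrichment_score(genes, values, {"c", "d"})  # set at the bottom
```

A list concentrated among the top-scoring meta-analysis genes yields a score near +1, whereas one concentrated at the bottom yields a score near −1.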
further statistical analysis. RH conceived of the study, and participated in its design and coordination. MV, RH, HL and LAPJ contributed to the data interpretation and wrote the manuscript. All authors read and approved the final manuscript.