Genome-wide patterns of promoter sharing and co-expression in bovine skeletal muscle

Background Gene regulation by transcription factors (TF) is species, tissue and time specific. To better understand how the genetic code controls gene expression in bovine muscle we associated gene expression data from developing Longissimus thoracis et lumborum skeletal muscle with bovine promoter sequence information. Results We created a highly conserved genome-wide promoter landscape comprising 87,408 interactions relating 333 TFs with their 9,242 predicted target genes (TGs). We discovered that the complete set of predicted TGs share an average of 2.75 predicted TF binding sites (TFBSs) and that the average co-expression between a TF and its predicted TGs is higher than the average co-expression between the same TF and all genes. Conversely, pairs of TFs sharing predicted TGs showed a co-expression correlation higher that pairs of TFs not sharing TGs. Finally, we exploited the co-occurrence of predicted TFBS in the context of muscle-derived functionally-coherent modules including cell cycle, mitochondria, immune system, fat metabolism, muscle/glycolysis, and ribosome. Our findings enabled us to reverse engineer a regulatory network of core processes, and correctly identified the involvement of E2F1, GATA2 and NFKB1 in the regulation of cell cycle, fat, and muscle/glycolysis, respectively. Conclusion The pivotal implication of our research is two-fold: (1) there exists a robust genome-wide expression signal between TFs and their predicted TGs in cattle muscle consistent with the extent of promoter sharing; and (2) this signal can be exploited to recover the cellular mechanisms underpinning transcription regulation of muscle structure and development in bovine. Our study represents the first genome-wide report linking tissue specific co-expression to co-regulation in a non-model vertebrate.


Background
The development of a complex eukaryote originates with just a single cell. A few years later it culminates in a functional organism possessing~100 trillion cells delineated into~200 cell types co-ordinately arranged in time and space [1]. A single genome, which is largely static, orchestrates this remarkable event, which is highly dynamic, -but it is still not known how. More specifically, what genomic information turns the right genes on at the right time and in the right place? This problem can be framed in terms of the gene regulation through the action of transcription factors (TFs) on their target genes (TGs) -a well-documented mechanism by which the protein encoded by a TF gene accesses the cell nucleus and binds to TF binding sites (TFBS), located in the promoter region of a TG activating or inhibiting its transcription. Importantly, such regulation acts in a species-, tissue-and time-specific manner [2][3][4].
Based on such framework, many authors have exploited the guilt-by-association heuristic by which genes regulated by the same TF are more likely to show co-expression correlation and, conversely, sets of genes showing an extreme co-expression correlation are more likely to be regulated by the same set of TFs. Inspired by such heuristic, a rational approach for exploiting this co-expression phenomena and deciphering transcriptional regulation activity involves the reverse-engineering of gene regulatory networks using network inference algorithms such as (but not limited to) Bayesian networks [5], CLR [6]; ARACNE [7], and PCIT [8,9]. Ergün et al. [10] exploited the connectivity structure of a gene network to an expression data set and identified genetic drivers of prostate cancer using the so-called MNI algorithm [11].
Other authors [12][13][14] have undertaken a promoter sequence analysis of a correlated group of genes to identify sequence motifs corresponding to TFBS. Using the prediction of binding affinities of a TF to promoters, Roider et al. [15] proposed a method for detecting TFs associated with functional categories.
An equally commendable strategy relies on assigning regulators to modules based on the co-expression between a candidate regulator and each of the members of the module. Examples of the latter approach include the LeMoNe (Learning Module Networks) algorithm of Joshi et al. [16] which generates a number of possible models explaining regulation activity and with every single model containing many regulators. An alternative method, initially introduced by Reverter et al. [17] and more recently implemented in Hudson et al. [18], is based on ranking TF by their absolute co-expression correlation averaged across all genes in a given module.
More recently, our group [19] developed a metric, namely RIF for regulatory impact factors, which ranked TFs by analysing the extreme score to those TF that are consistently most differentially co-expressed with the highly abundant and highly differentially expressed genes (RIF1), and to those TF with the most altered ability to predict the abundance of differentially expressed genes (RIF2).
In spite of this plethora of work aimed at identifying the key TFs responsible for a phenotypic contrast of interest (e.g. healthy versus disease), experimental validation linking co-expression with co-regulation remains scarce and, by and large, available only for model organisms. For instance, based on 2,284 yeast genes and 106 TFs, Allocco et al. [20] investigated and quantified the link between co-expression and co-regulation in yeast and concluded that the correlation co-expression must be greater than 0.84 in order for two genes to have a greater than 50% chance of sharing a common TFBS. Also working with 180 yeast TFs, Yu et al. [21] observed that genes targeted by the same TF tend to be co-expressed, with the degree of co-expression increasing if genes share more than one TF. The authors also reported that targets of a given TF tend to have similar cellular functions. Similarly, the relationship between gene co-expression and co-regulation has been explored in Drosophila melanogaster [22] and mouse [23]. However, given that only three-fifths of the transcriptional networks are broadly conserved, and the associations between TFs and their TGs are flexible [24], there is a need to explore these matters in the actual organism, and indeed preferably in the tissue of interest, rather than making system-wide predictions based on the closest model organism.
Furthermore, based on the observed relationship between TFs and their TGs, some authors have studied gene regulatory networks for model organisms and focussed on the hierarchical structure of the resulting networks [25,26]. Only more recently, Hu and Gallo [27] provide a catalogue of TF pairs, defined as those with TFBS in the same promoter regions, for human genes and in a tissue-specific manner. However, their work is based on only 214 TF and the authors did not investigate the relationship between TF pairs and coexpression of pairs of genes sharing TF pairs. In the light of these shortcomings, it is apparent that more research is needed to understand the relationship between co-expression and co-regulation.
We have elected to focus our efforts on the development of skeletal muscle and the Longissimus thoracis et lumborum (LTL) in particular. Skeletal muscle makes up to 50% of the mass of most mammals and is the single largest tissue contributor to basal metabolism [28]. By merging transcriptional and regulatory information, we aimed to shed light on a range of fundamental hypotheses including: 1) That a relationship exists between co-expression and co-regulation; 2) That the way in which TFs partner with each other scales with the number of common TGs and, in turn, influences the amount of co-expression observed between the TFs; and 3) That TFBS over-represented in the promoter regions of functionally coherent gene modules allows for the identification of TFs and TF networks that are consistent with the biological process of the module.
We first introduce a 'Promoterome' matrix (P-matrix) relating TGs with predicted TFs. In combination with the expression data, this matrix is used to test our first two hypotheses (linking co-expression with co-regulation). We then generate a series of simulated datasets to identify under which conditions, in terms of activation and inhibition, the observed relationship between co-expression and co-regulation can emerge. Finally, we reverse-engineer a network of TFs and discuss its biological relevance in the context of functionally coherent modules.

Results
The P-Matrix -A bovine muscle promoterome matrix linking TFs with predicted TGs We explored the 42,702,661 co-expression correlations among the 9,242 genes of the P-matrix (Additional file 1). Of these, 63% and 37% were positive and negative, respectively. We [18] and others [29][30][31] have reported a higher reliability for positive correlations, in terms of the ability to replicate them in other datasets, or the results corresponding to validated interactions [32]. Figure 1A shows the histogram of the number of TFBS in the promoter region of each of 9,242 genes. 95% of genes have between 3 and 21 TFBS. Figure 1B illustrates the frequency of the number of TFBSs in common for all 42,702,661 gene pairs. 73% of gene pairs have two or more predicted TFBSs in common. Also, 5% of gene pairs share at least 6 TFBSs suggesting these to be components of highly co-regulated networks. Conversely, less than 10% of the gene pairs do not share any TFBSs, suggesting that the vast majority of genes are suitable for co-expression analysis. Figure 1C provides a visual diagnosis of the scale-free behavior of the distribution of TFs as a function of the number of TGs. The number of TGs ranged from 1 (for 48 TFs) to 7,288 (for MZF1; myeloid zinc finger 1) and had a median of 14. Figure 1D reveals an exponential saturation relationship between the number of regulator partners and the number of TGs a given TF has.
Linking of co-expression, co-regulation and common targets Figures 2 provide a snapshot of the results relating co-expression with shared TFBSs. We observed an increase in the average co-expression correlation as the number of common TFs increased (Figure 2A). This pattern was consistent irrespective of whether the absolute and positive correlations were considered separately. However, this relationship was more apparent when only positive correlations were considered. For pairs of genes predicted to be jointly targeted by 10 or more TFs, we found a 4% and a 6% increase in absolute and positive correlations respectively, compared with random pairs of genes ( Figure 2A).
We also observed ( Figure 2B) a strong monotonic increase in the strength of the co-expression between a TF pair and the number of TGs they share. To the best of our knowledge, no study exists exploring this relationship. Among the 15,753 pairs existing from the 178 TFs with expression data, 4,880 where among TF pairs sharing at least one TG. For these, the number of TGs in common averaged 37 and ranged from 1 (for 1,439 TF pairs) to 5,857 (between MZF1 and TGIF1).
One final objective in exploring the relationship between co-expression and co-regulation was to assess  Percentage increase (black circles trend) and cumulative frequency (red triangles trend) of co-expression correlations between pairs of TFs as a function of the number of common targets and for absolute correlations; (C) Number of common TFBS as a function of the co-expression correlation for a pair of genes. Observed (black) and naive expectation (red); (D) Target co-expression: Ratio of the average absolute correlation between a TF and its TG over the average absolute correlation between a TF and all genes in the dataset, and as a function of the number of TGs (x-axis). Values on y-axis above one signify the expression of a TF is more correlated with its TGs than with all the genes. Highlighted in red are the 94 TFs for which the average absolute correlation with their TGs is above average. whether the average co-expression correlation between a TF and its TGs is higher than the average co-expression between the same TF and all genes in the dataset ( Figure 2D). Out of the 178 TFs with expression data, 108 (or 61%) showed a ratio >1, corresponding to a cumulative binomial P-value of 0.001675. When this relationship was limited to the 94 TFs for which their absolute correlation co-expression with their TGs was above average (the average being 0.4334), we found that 71 of them (or 76%) showed a ratio >1 (P-value = 1.13E-07). This distinction is of relevance because, under the null hypothesis of no relationship, TFs showing extreme correlation with their TGs should also show extreme correlation with all genes in the dataset.
On the superior reliability of positive co-expression correlations Table 1 provides further evidence that positive correlations are more reliable than negative ones. However, when considering the correlation between a TF and its TGs, and as noted by Yu et al. [33], the sign and magnitude of this correlation is dependant on the regulation type (i.e. activator, repressor or dual) as well as on the regulatory motif (i.e. feed forward, time-shifted, etc.). Therefore, we conclude that using only positive or only negative correlations diminishes the ability to capture true TF -TG relationships and better results emerged when all correlations are used in an absolute context. Importantly, irrespective of which set was used, the evidence becomes more apparent when selecting TF -TG relationships above average. This analysis also showed that as TFs have more TGs the ratio of correlation of the expression of the TFs with its TGs compared to all genes converges to one ( Figure 2D).

Simulation analyses reveal the most likely regulation type
As discussed by Yu et al. [21], there are two main reasons for regulation type to impact on the relationship of the expression of their targets. One is that a sizeable proportion of TFs act both as activators and repressors, in some cases for the same target. The other is that the combined effect of multiple TFs can have an unpredictable effect on target expression. Figure 3 illustrates the results from our simulation analyses. One prominent feature is that in order to observe a relationship between co-expression and co-regulation there must be a sizeable proportion of TFs acting as either activators or repressors, but not both. In fact, the relationship between co-expression and co-regulation quickly diminishes with increasing proportion of TFs with bipotential activity (activators and repressors) ( Figure 3B). Most interestingly, in the extreme scenario where all TFs have bipotential activity, no relationship between co-expression and co-regulation could be observed and the resulting distribution of the correlations would be perfectly centred at zero ( Figure 3A, density shown in black). Instead, such density sees its mass shifted towards the positive space with an increasing proportion of TFs having a single regulation type ( Figure 3A, densities shown in colours other than black). We conclude that the higher reliability attributed to positive correlations is a phenomenon of the presence of significant number of TFs that act as either general activators or general repressors and that the co-expression to co-regulation pattern observed from the real expression skeletal muscle dataset is consistent with the presence of 70 to 80% of TFs having a bipotential activity ( Figure 2C for real Table 1 Relationship between the expression of a TF and its predicted TG: Ratio of the average correlation between TFs and its TGs over the average correlation between same TF and all the remaining 9,241 genes in the dataset  data versus Figure 3B, purple and pink trends, for simulated data). In agreement with our findings, while on a smaller scale, the recent work of Ouyang et al. [34] with mouse embryonic stem cells, revealed that a remarkably high proportion of variation in gene expression can be explained by the binding signals of 12 TFs of which 7 (or 58%) serve as either activator or repressor depending on the target.

Co-expression as a function of Transcriptional regulatory motifs (TRM)
We used the log-odds ratio (LOD) to investigate the relationship between the co-expression correlation observed for genes pairs and the number of shared TFs. We observed that the LOD-value is dependent on the type of transcriptional regulatory motifs (TRM n ) defining the motif of n common TFs jointly regulating the same set of TGs, and with n = 0, 1, 2,... up to n ≥ 10 (Additional File 2). These results corroborated and, to a degree, formally validated our previous observation that gene pairs sharing TFBS showed an increased co-expression correlation ( Figure 2A). A similar result was observed for gene pairs sharing from more than 1 to more than 10 TFBSs (Additional File 2). Again, the trend is more pronounced when only positive correlations are considered, in line with a higher reliability for positive correlations [32]. As expected, the distribution of the co-expression correlations for pairs of genes with TRMs of 1, 5 or 10 genes shows an increased bias to highly positive correlations with increased size of the TRM ( Figure 4B). Likewise, extreme positive correlations (i.e. within the interval {0.8,1.0}) are more frequent among high-order TRMs than extreme negative correlations (i.e. within the interval {-1.0,-0.8}) (Additional File 2).

Derivation of a TF network for bovine skeletal muscle
We found 12, 13, 2, 9, 8, and 0 TFs whose TGs were enriched for genes encoding proteins involved in cell cycle, fat, immune, mitochondria, muscle/glycolysis, and the ribosome, respectively (Table 2 and Figure 4C-D). We used the chi-square test of independence to ascertain if there exists an independent pairing assortment in the resulting network of 127 TF and 306 edges. The null hypothesis of independence was rejected (P-value = 2.8359E-62) and we concluded that our module assignment of TFs provided information about the topology of the network. The TF arrangements were further explored by converting them into a network ( Figure 4A). This approach assigned links between TFs predicted to regulate more common targets than expected by chance alone. In order to analyze the resulting network among 333 TF connected by 1,395 edges, we focused on TF hubs in the context of the functionally coherent modules ( Figure 4B). In brief, we exploited the muscle-based co-expression modules of Hudson et al. [18] to determine if they represented a robust enough set of TGs to infer the regulation of the biological process in question using promoter data only. To illuminate the relationship between co-expression and the sharing of TGs among TFs, we explored the network that resulted from linking TF pairs with absolute co-expression correlation greater than 0.9 and more than 100 TGs in common. This approach resulted in 10 TFs connected by 10 edges and with TGIF1 as the hub of the network ( Figure 4D).

Discussion
Co-expression versus co-regulation and implications for skeletal muscle The resulting non-linear pattern observed when relating the number of regulator partners and the number of TGs a given TF has ( Figure 1D) resonates with the recent findings of Bhardwaj et al. [35] for model organisms and humans and indicates that only a limited number of partners are required to regulate an increasing number of targets.
As co-expression correlation between a pair of TGs increases, so does the number of common TFs ( Figure  2C). Importantly, this relationship continues to trend downwards in negatively co-expressed pairs of genes. Our prior, and in hindsight naïve expectation, was that pairs of genes with extreme negative correlations would share a high number of TFs (overlaid red trend in Figure  2C). Our prior expectation was based on the redundancy mechanism by which one of two genes involved in the same process (and hence regulated by the same TF) suffices at any point in time. However, the observed trend could also be reasoned from a mechanistic standpoint such that in order for a gene pair to show a strong negative co-expression, then their regulators must be vastly different (i.e., these two genes are more likely to be regulated by a completely different set of TFs). To elucidate the most likely speculation, the exploration of various regulation types in the TF -TG relationship in terms of activation, repression or bi-potential, was warranted.
Our analyses also revealed that as TFs have more TGs the ratio of correlation of the expression of the TFs with its TGs compared to all genes converges to one ( Figure 2D). This is consistent with the expectation that the more genes a TF regulates, the higher the likelihood that their expression will be also regulated by other TFs. An alternative explanation is that motifs comprised of a single TF regulating many TGs (often refereed to as SIM for single input motifs) can generate temporal programs of expression, in which TGs are activated one by one in a defined order [36,37]. Such temporal order of expression would imply that, at any given time point, the TF -TG relationship could be observed only for some of the TGs.
Pre-eminent among the top 20 TFs with the highest co-expression correlation with their predicted TGs are five HOX genes (HOXB4, HOXC9, HOXA1, HOXC8 and  HOXC10). The myogenic regulatory factors (MRF) MYOD1, MYOG, MYF5 and MYF6 are generally not as well-represented as we anticipated, although MYOG is in 18 th place. Treating the positive and negative correlations separately improves the representation of the MRF significantly, in that both MYOD1 (17 th place) and MYOG (3 rd place) subsequently feature. We have previously  Table 3; (C) Subnetwork of module-specific and muscle-expressed TFs; (D) Network among TFs with absolute correlation (r) >0.9 and number of common TGs (N) >100. Black and red edges correspond to positive and negative correlations, respectively. The network reveals the central role of TGIF1 which also contains a binding motif in the promoter region of myostatin (MSTN). Colors represent functional modules: cell cycle (green), fat (yellow), immune (purple), mitochondria (cyan), and muscle/glycolysis (red). Big and small nodes represent TFs with and without detectable expression in muscle, respectively. determined that of the various MRF only MYOD1 and MYOG make it onto an "Always Correlated" network that joins genes exhibiting strong co-expression.
With a special focus on the promoter regions shared by the core muscle structural proteins we determined enrichment for TF predicted to regulate those 66 molecules designated as either "fast", "slow", "embryonic" and "other muscle structural" as listed in Table 2 of Hudson et al. [18]. The strongest enrichment for this process corresponded to gene ontology term "response to retinoic acid", a feature shared by MZF1 (first with 41 promoter hits), TGIF1 (second with 35 hits) and MASH1 (fifth with 22 hits). Other TFs with strong hits included FOXM1 (tenth with 14 hits), a generic cell cycle regulator. The presence of a cell cycle regulator is intriguing given that a key feature of cells progressing through the myogenic program is cell cycle arrest.
There was some evidence for enrichment of the TGF-β signaling pathway, with predicted SMAD2 and SMAD3 binding sites featuring in the upstream regions of a number of targets. MZF1 has been reported to bind to FHL3 (a component of the muscle z-disc) [38], which in turn has been reported to signal via the SMADs [39]. Expression of TGIF1 is induced by TGF-β, has a predicted binding site upstream of MSTN, and interacts with SMAD proteins [40].

A TF network for bovine skeletal muscle
The P-matrix was subjected to hierarchical cluster analyses. This approach clustered TFs predicted to regulate common TGs, and TGs predicted to be regulated by common TFs. We observed a strong clustering of a set of HOX genes (HOXA1, HOXB13, HOXB4, HOXB9, HOXC1 and HOXD13), which also included the homeobox gene, UNCX. This clustering probably relates to the simple fact that many of the HOX genes are physical neighbors on the genome, formed by relatively recent gene duplication events. In this sense, they are a rare example of physical proximity and shared functionality in eukaryotic genomes [41]. We also observed a cluster of mitochondrial proteins including NDUFA11 near COX6A2 and COXA5, although given the size of the data this may have been expected by chance alone.

The impact on the functionally coherent modules
The lack of enrichment for the ribosome module is consistent with ribosomal proteins being ubiquitously expressed in all tissues and ribosomal RNA accounting for~60% of total RNA in the cell [42]. Overall, this approach enriched for some of the known, experimentally-validated regulators, with the cell cycle module performing particularly well. Given our 'promoterome' approach was predicated on first screening for highly conserved cis-regulated modules across mammals, our particular success with the cell cycle may reflect the observation that the cell cycle is more evolutionary conserved than various other processes [43]. For example, the behavior of E2F1, a causal regulator of cell cycle activity [44], recapitulated both here and by high co-expression alone is conserved across groups including C. elegans and yeast.
In addition, amongst the module-specific and muscleexpressed TFs (bold-type in Table 2) we highlight ARID5B and FOXM1, both with cell cycle genes significantly enriched among their targets. ARID5B is a member of the AT-rich interaction domain family of TFs, which are known to be critically involved in the regulation of development and cellular differentiation [45]. Recent studies have reported a functional role for FOXM1 in cell cycle [46] and for NFKB1 in muscle [47]. Interestingly, FOXM1 expression is decreased after the induction of fibroblast to myofibroblast transdifferentiation by TGF-β [48].
Equally suggestive is the significant enrichment of genes from the mitochondrial module among the lists of predicted TGs for MEF2C. Van Oort et al. [49] showed that a subset of the genes activated by a dominant negative MEF2C construct in mice encoded proteins localized primarily to, or functioning at, the level of mitochondria. In addition, RNAi silencing of MEF2C expression in the cardiac muscle of mice reduced mitochondrial DNA levels [50]. The nuclear factor 1 (NF1) family of TFs includes NFIX which has been found to act as a transcriptional switch from embryonic to fetal skeletal muscle development [51].
Only three TFs (ATF6, MEIS1 and NEUROG1) changed module allocation depending on the criteria used Ribosome 0 None *TFs in bold-face correspond to module-specific TFs that are significantly expressed in muscle ( Figure 3C). **ATF6, NEUROG1 and MEIS1 are assigned to different modules depending on the statistical test employed, hypergeometric or odds ratio. For all other TFs, there was a perfect agreement between the two criteria ( Table 3).
(hypergeometric test or odds ratio; Table 2 vs Table 3). ATF6 was allocated to cell cycle according to the hypergeometric test and to the fat module according to the odds ratio. ATF6 is a master regulator of the unfolded regulator response and has been implicated the regulation of lipid metabolism [52]. However, in this work the regulation was via a protein-protein interaction with SREBP2 without DNA binding. Similarly, MEIS1 was allocated to the muscle/glycolysis and to the immune module according to the hypergeometric and the odds ratio tests, respectively. MEIS1 belongs in the family of homeobox genes which play a crucial role in many developmental processes. A search of the literature indentified MEIS1 as a regulator of the immune system [53][54][55] and myogenesis [56]. GATA-binding protein 2 (GATA2) was among the TFs whose TGs were enriched for the fat module genes. Tong et al. [57] discovered that GATA factors play a key role in adipogenesis by suppressing promoters of adipogenic factors including peroxisome proliferatoractivated receptor gamma (PPARG).
While the muscle module did not enrich for the MRFs, the presence of SMAD2 and SMAD4 does implicate the TGF-β pathway, known to regulate muscle mass in mammals. The presence of HIF1A, a hypoxia responsive TF, does seem reasonable in a module that contains many glycolysis genes i.e. an alternative energy conversion pathway used in the absence of enough oxygen availability.

A network linking the GLI transcription factors with MYF5
An additional correctly inferred regulator of the muscle/ glycolysis module genes is GLI1 [58]. Inspired by the work of McDermott et al. [58], who showed that Gli proteins directly control MYF5 expression in mouse muscle progenitor cells, we used our TF network to determine the links between the Gli TFs (GLI1, GLI2 and GLI3), and the myogenic regulatory factor MYF5 ( Figure 5A). For comparison, Figure 5B shows the network among the same TFs in human and based on FUNCOUP tool [59]. In our landscape GLI2 was directly linked to MYF5, with one gene in common, and GLI1 was linked to MYF5 via HNF4A. The association of GLI2 and MYF5 in the two datasets are based on quite different types of relationships; GLI2 is required to activate the expression of MYF5 [58], whereas our analysis is based on the predicted co-regulation of target genes by GLI2 and MYF5. The correspondence in the association between MYF5 and GLI2 also provides support that our approach is a useful addition to the toolbox of analysis methodologies to be used in the analysis of gene expression data.

Conclusions
This work represents the first genome-wide analysis linking tissue-specific co-expression patterns with cisgenomic logic in bovine, a non-model vertebrate. The main finding is that extensive promoter sharing between genes culminates in a detectable and robust co-expression signal, significantly above that which would be expected by chance alone. One pivotal implication is that a sizeable proportion of regulators acting as either activators or repressors, but not both, is needed in order to observe a relationship between co-expression and co-regulation. Finally, we have shown that mining the relationship between promoter sharing and co-expression in the context of functionally coherent modules allows for the identification of key regulators of those modules.

Gene expression dataset
We use the gene expression data from Hudson et al. [18] profiling the genome-wide expression in bovine LTL muscle sampled across 26 experimental conditions as follows: ten developmental time points (3 pre-natal, birth and 6 post-natal) across two beef cattle breed crosses (Piedmontese × Hereford and Wagyu × Hereford) and three time points throughout a nutritional deprivation and re-alimentation experiment comprising 3 adult time points for each of the two treatments  (hence, 6 experimental conditions comprised of 3 time points × 2 nutrition treatments). The entire gene expression dataset comprising 48 microarrays has been deposited into Gene Expression Omnibus http://www. ncbi.nlm.nih.gov/geo/ and can be accessed using accession number GSE25554. Following previously described approaches [60], we fitted the following ANOVA mixed-effect model to normalize the gene expression data: where Y ikvmn represents the n-th background-adjusted, base-2 log-intensity from the m-th gene (m = 1, 2, ..., 13094 probes), at the v-th experimental condition variety (v = 1, 2, ..., 26) taken from the i-th array (i = 1, 2, ..., 48 microarray chips), and k-th dye channel; μ is the overall mean; C represents a comparison group fixed effect defined as those intensities that originate from the same microarray slide, printing block and dye channel; G represents the random gene (or probe) effects with 13,094 levels; AG, DG, and VG are the random interaction effects of array × gene, dye × gene, and variety × gene, respectively; and e is the random error term. Using standard stochastic assumptions, the effects of G, AG, DG, VG and e were assumed to follow a normal distribution with zero mean and between-gene, between-gene within-array, between-gene within-dye, between-gene within-variety and within-gene components of variance, respectively. Restricted maximum likelihood estimates of variance components and solutions to model effects were obtained using VCE6 software ftp://ftp.tzv.fal.de/pub/ vce6/. The solutions to the VG effect were used as the normalized mean expression of each gene (or probe) in each of the 26 experimental conditions under scrutiny.

Promoter sequence analysis
The genome-wide promoter sequence data for bovine was downloaded from Genomatix database http://www. genomatix.de/ (ElDorado Btau 4, v-07-09). A total of 60,131 promoter sequences derived from 22,050 genes were downloaded. We introduced several filtering steps to ensure only high confidence promoter sequences are selected for the analysis. First, we applied the concept of orthologous promoters [61] and retained only those promoter sequences for which phylogenetically conserved promoter sequences were documented in the human and mouse genomes. Using this criterion we retained 39,696 promoter sequences distributed over 13,623 genes. In the next step, we applied a threshold of 1 (100% confidence) to core and matrix similarities [62]. These editing criteria resulted in a total of 310,316 high confidence TFBS that were used for integration with the gene expression data.

The Promoterome Matrix
We built a 'Promoterome' matrix (P-matrix) relating predicted TGs with TFs based on TFBSs located in their promoter regions. The rows of P-matrix correspond to TGs for which expression data is available, and the columns correspond to TF genes retrieved from the Genomatix database. We identified 9,242 genes (rows of P-matrix) whose promoter sequence (or sequences) harbours at least one predicted TFBS corresponding to at least one of the 333 TFs (columns of P-matrix). The element P(i,j) of P-matrix is set to "1" if a promoter region of the i-th TG contains a TFBS corresponding to the jth TF, otherwise is set to "0". Among the 333 TFs represented in the columns of P-matrix, there were 178 with expression in the Hudson et al. [18] dataset, including 143 with promoter region information in Genomatix (i.e. these 143 TFs were also represented among the rows of P-matrix). We used PermutMatrix [63] to visualise and analyse the resulting P-matrix (See Additional file 1). In addition, the cross-product matrix resulting from multiplying the P-matrix by its transpose, results in a square and symmetric matrix, P T P, of dimension equal to 333 (i.e. the number of TFs). Diagonal values of P T P matrix (P T P(j,j)) correspond to the number of TGs for the j-th TF. Off-diagonal values of P T P (P T P(j,j')) correspond to the number of promoter regions in which the TFBS for the j-th and the j'-th TF co-occur. Co-occurrence is used to build a network of TFs. The hierarchical tree shows a pattern consistent with the non-random assortment of the connectivity distribution with most TFs having few TGs and few TFs having lots of TGs and consistent with a scale-free power-law distribution [64,65].
Statistical significance in the co-expression to coregulation relationship Following Yu et al. [21], we used the log odds ratio (LOD) to ascertain the enrichment of a particular TF -TG relationship with respect to random expectation for the occurrence of the observed co-expression and according to the following formulae: where P(co-expression/regulation) is the probability of gene pairs with certain regulatory relationship (e.g. sharing the same predicted TFBS) showing a specific correlation co-expression; and P(co-expression) is the probability of randomly selected gene pairs having the same co-expression correlation. LOD values above zero signify observations that are more common than expected by chance, and vice versa.

Simulation schema
In order to gain further insights into the mechanistic rules governing the observed relationship between coexpression and co-regulation, we generated a series of simulated datasets under various scenarios. Source code in MATLAB (The MathWorks, Inc.) was developed for the simulation schema. In our simulation process a given TF was allowed to act as "dual" (i.e., acting as both activator and repressor of its TGs) or "mono" (i.e., either activator or repressor of its TGs, but not both). In detail, our simulation process followed the following schema: Step1: Define the input vector D(k) representing the percentage of TFs with "dual" behavior. In detail, D(k) = (k-1) × 10, for k = 1, 2, ..., 11. That is, D was allowed to range from 0% to 100%, by 10% increment.
Step2: At the r-th iteration (for r = 1 to 93), select a total 100 TGs from P-matrix (i.e., 100 rows of P). Selection is at random and without replacement, ensuring with r = 93 that all rows of P were sampled.
Step3: Randomly assign the 333 TFs from the Pmatrix as follows: D(k) percentage of TFs are set with dual characteristic while the rest are set with 4 "mono" behaviors in equal amounts: activator, repressor, strong activator, strong repressor.
Step5: Simulate the expression data at 334 time points corresponding to Time 0 and up to Time 333 for each consecutive TF. In order to approximate the distributional properties of the real data, the expression data for the 100 TGs at Time 0 was simulated at random from a normal distribution with a mean of 9 and a variance of 9 (i.e. N(9,9)), and truncated at 0 and 18, as lower and upper bounds, respectively.
At Time t (t = 1, 2, ..., 333), the j-th TF (where j = t; hence as many time points as TFs) comes into action and the expression of the i-th TG is be modified according to the value of V. At Time 333, each of the 100 TGs has expression data at 334 dimensions (or time points).
Step6: Return to Step2 until the k percentage levels and all 9,242 TGs from P-matrix have been explored.
During the simulation process, we also considered the situation in which TFs came into action in groups of five (i.e. having 20 groups from the 100 TFs considered in each iteration). However, the results did not differ from those obtained using the one-at-a-time scenario described above.

TF network
The strength of the similarity existing for a given TF pair can be inferred from the number of common TGs. We used the columns of P-matrix as the input for the PCIT algorithm [8] to generate a network of TFs. The resulting network contained 333 nodes (i.e. as many as TFs) linked by 1,395 edges. Edges in the network link TFs predicted to share a significantly large number of targets. Consistent with a non-random assortment in the connectivity distribution, the 10 most connected TFs (i.e.~3% of the total 333 TFs), referred to as 'hub' TFs, had at least 17 connections each, and were connected to 196 TFs (i.e.~60% of the total).

Gene modules and significance of TF to module assortment
The functional modules that emerged in the landscape of Hudson et al. [18]were subjected to further scrutiny to generate a curated list of module genes (Additional file 3). This additional examination was based on gene proximity in the hierarchical cluster analysis according to the PermutMatrix software [63] as well as on the molecular function gene ontology term http://www.geneontology.org.
We focused our attention on six modules as follows: cell cycle, fat, immune, mitochondria, muscle/glycolysis, and the ribosome. These modules were chosen for their likely roles in the determination of muscle mass, intramuscular fat development and energetic efficiency.
The enrichment of the affinity between the TFs and the functionally coherent modules was explored by means of the hypergeometric test of significance. To this respect, for every TF -Module combination, we calculated the probability of having the observed number of module genes among its targets using the following hypergeometric equation: where N is the total number of genes (9,242), n j is the total number of TGs for the j-th TF, m k is the number of genes in the k-th module and m j is the number of TG of the j-th TF belonging to the k-th module.
Those TFs with H(j,k) <5% and a proportion (m j /n j )/ (m k /N) >1 are referred to as 'module-specific' TFs. The second condition represents an odds ratio and was applied to account for low hypergeometric probabilities resulting from under-enrichment (i.e. those TFs with a proportion of module genes among their targets less than the proportion of module genes among all genes).
Finally, among 'module-specific' TF, we classify as 'module-specific and expressed' TF to refer to those for which expression data is available. These criteria resulted in 44 'module-specific' TFs of which 32 were 'module-specific and expressed' TFs ( Table 2).
The odds ratio criteria resulted in 127 out of 333 TFs being allocated to one of the six modules (Table 3). In order to ascertain if this module assignment of TFs provided information about the topology of the network we tested if there exists an independent pairing assortment in the resulting network of 127 TFs and 306 edges using the chi-square test of independence (P-value = 2.8359E-62).