Massive genome-wide characterization projects have dramatically transformed our current view of genes and other elements of the genome. The picture emerging is of a complex regulatory landscape in which multiple actors coincide to perform distinct roles that are fundamental for the appropriate deployment of cellular gene expression programs. Transcription factors (TFs) are protein adaptors that recognize particular regulatory sequences (TF binding sites, TFBSs) in the genome to target the assembly of other protein complexes that ultimately govern gene expression. In fact, precise information about when and where a gene is transcribed is encoded in both the sequence and the structure of the genome.
At the sequence level, promoters are regulatory regions located immediately upstream of the gene that anchor the RNA polymerase transcriptional machinery to the transcription start site (TSS), while enhancers direct more precise tissue-specific gene expression and can be physically displaced up to hundreds of kilobases from their target. Both promoters and enhancers are non-coding sequences in which multiple TFBSs of about 5 to 15 bp are distributed following a modular organization. Such cis-regulatory modules (CRMs) act as genetic switches and are bound by specific TFs to drive distinct patterns of expression. Comparison of binding landscapes across multiple species has revealed that these functional regulatory regions are, in many cases, highly conserved throughout evolution [4–6]. The predominant model holds that direct contact among enhancers, promoters, and TFs through DNA looping orchestrates RNA polymerase recruitment to initiate transcription of the neighbouring gene.
At the structural level, chromatin packaging into nucleosomes dynamically shapes the genome, producing a landscape of open and closed regions that can expose or mask different pieces of information encoded within the sequence. By interpreting a collection of post-translational modifications of the histone tails at the surface of nucleosomes, chromatin remodeling complexes can force a repositioning of such structural units, resulting in a change in the local conformation of a particular area. Consequently, modifications in the chromatin structure may restrict the access of TFs to a subset of regulatory sequences along the genome. Recent epigenomics studies have unveiled the existence of chromatin signatures that help to distinguish promoters and enhancers from other genome elements [11–13]. Thus, while active gene promoters are in general marked by trimethylation of lys4 of histone H3 (H3K4Me3), distal enhancers are associated with monomethylation (H3K4Me1). Moreover, functional enhancers for active genes exhibit additional acetylation of lys27 of histone H3 (H3K27Ac), while trimethylation of lys27 (H3K27Me3) denotes poised enhancers that are linked to inactive genes [11, 13].
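As an illustration only, the chromatin signatures described above can be written as a small rule-based classifier. This is a minimal sketch, not a method taken from the cited studies: the `RegionMarks` type, the `classify_region` function, and the boolean presence/absence encoding of each mark are hypothetical simplifications, and real analyses work with quantitative signal over genomic intervals rather than binary calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionMarks:
    """Presence/absence of histone marks over one region (hypothetical input)."""
    h3k4me3: bool = False
    h3k4me1: bool = False
    h3k27ac: bool = False
    h3k27me3: bool = False


def classify_region(marks: RegionMarks) -> str:
    """Apply the signature rules from the text to one region."""
    # H3K4Me3 generally marks active gene promoters.
    if marks.h3k4me3:
        return "active promoter"
    # H3K4Me1 is associated with distal enhancers; lys27 marks refine the state.
    if marks.h3k4me1:
        # Assumption: if both lys27 marks are reported, acetylation takes priority.
        if marks.h3k27ac:
            return "active enhancer"
        if marks.h3k27me3:
            return "poised enhancer"
        return "enhancer (state undetermined)"
    return "unclassified"


if __name__ == "__main__":
    print(classify_region(RegionMarks(h3k4me1=True, h3k27ac=True)))
```

The rule order is a design choice made for this sketch: the promoter mark is checked first, so a region carrying both H3K4Me3 and H3K4Me1 is reported as a promoter.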
Deciphering the map of regulatory sites and regions that shape the genome is therefore a formidable challenge of major interest, for which computational methods that identify such features can be extremely helpful. Most bioinformatics protocols for regulatory prediction consist of two steps (reviewed in [14–16]): (i) sequence analysis in search of consensus sites derived from catalogs of predictive models or motif discovery approaches; and (ii) evaluation of such predictions, taking into account evolutionary conservation across other species. More recently, approaches that additionally integrate epigenomic information from histone modification maps have significantly outperformed previous strategies [17–19]. In the last decade, a myriad of bioinformatics solutions have been published that deal with the problem of mapping putative TF sites and predicting regulatory regions (see  for a comprehensive listing). As a consequence, scientists must face a plethora of heterogeneous tools in order to characterize a regulatory region, including, among others, genome browsers [21, 22], multiple genome alignments [23, 24], catalogs of functional sites [25–27], prediction software suites [28, 29], and genome-wide epigenetics profiles [30, 31]. Even though integrative initiatives such as Galaxy have brought unquestionable progress, this complex mixture of applications and databases often constitutes an obstacle for basic researchers, who are the main target audience for this information. Even the minimal computational expertise required may be prohibitive for many experimentalists, denying them access to knowledge that could expedite their investigations at the lab bench.
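The two-step protocol outlined above can be sketched in a few lines of code. The example below is a toy illustration under stated assumptions, not the implementation of any tool cited here: the function names (`log_odds_matrix`, `scan`, `conserved_hits`), the uniform background, the pseudocount, and the score threshold are all hypothetical, and step (ii) assumes gap-free, equal-length orthologous sequences so that alignment columns coincide with sequence positions. Real pipelines score hits against genome-wide multiple alignments that contain gaps.

```python
import math

# Assumption: uniform base composition in the background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against BACKGROUND."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / BACKGROUND[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Step (i): return (position, score) for each window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BACKGROUND for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def conserved_hits(reference, orthologs, matrix, threshold):
    """Step (ii): keep reference hits found at the same position in every ortholog."""
    ortholog_positions = [
        {pos for pos, _ in scan(seq, matrix, threshold)} for seq in orthologs
    ]
    return [
        (pos, score)
        for pos, score in scan(reference, matrix, threshold)
        if all(pos in positions for positions in ortholog_positions)
    ]


if __name__ == "__main__":
    # Toy count matrix for an invented 4-bp motif (TGAC); not a real TF model.
    toy_counts = [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}]
    pwm = log_odds_matrix(toy_counts)
    print(conserved_hits("AATGACGG", ["CCTGACTT"], pwm, threshold=5.0))
```

Requiring a hit in every ortholog is the strictest possible conservation filter; a looser variant could require hits in only a fraction of the species.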
Research on transcriptional regulation in Drosophila melanogaster, one of the most intensively studied organisms in biology, is a case in point. The sequencing of other fly genomes offers a formidable opportunity to decipher the common regulatory circuitry of these species. This information is fundamental when conducting experimental research to elucidate potential relationships between regulators and their targets. More recently, the modENCODE project has released more than 700 genome-wide datasets for dozens of TFs, histone marks, and other regulatory features that promise to substantially advance the characterization of Drosophila gene regulatory regions in the next decade. By and large, fly researchers can work with many resources that provide invaluable access to such information: FlyBase is the major repository of genetic and genomic information on the fruit fly, FlyMine is a web platform that integrates external genomics and proteomics resources under the same query interface, and modMine provides access to modENCODE data. Information specific to the characterization of TF binding sites is distributed across different resources: FlyTF is a comprehensive catalog of TFs with DNA-binding properties, while REDfly, FlyFactorSurvey, and the DNase footprint database are compilations of TFBSs experimentally validated in Drosophila [39–41]. Moreover, the Jaspar and Transfac repositories include about 100 predictive models derived from the literature for Drosophila [25, 26].
However, although important efforts are being made to standardize the construction of large-scale collections of regulatory sites [6, 42], it is not trivial for a bench biologist to understand how to deal with this massive volume of information (for examples, see [43, 44]). As a result, there is a need for easy-to-use integrative web resources that perform comparative regulatory analyses on emerging next-generation sequencing data. We present here CBS (Conserved regulatory Binding Sites), an open regulatory platform that offers, under an intuitive graphical interface, a comprehensive map of evolutionarily conserved binding sites and enhancers identified in Drosophila, using a combination of predictive and alignment methods with epigenomic information. By providing custom tracks for the most popular genome browsers, CBS makes visualization of these regulatory features extremely simple for non-expert users. We demonstrate how CBS can be particularly useful for characterizing functional sequences and conducting in silico regulatory screenings of target genes reported in high-throughput expression experiments.