Protein composition of interband regions in polytene and cell line chromosomes of Drosophila melanogaster

Background Despite many efforts, little is known about distribution and interactions of chromatin proteins which contribute to the specificity of chromomeric organization of interphase chromosomes. To address this issue, we used publicly available datasets from several recent Drosophila genome-wide mapping and annotation projects, in particular, those from modENCODE project, and compared molecular organization of 13 interband regions which were accurately mapped previously. Results Here we demonstrate that in interphase chromosomes of Drosophila cell lines, the interband regions are enriched for a specific set of proteins generally characteristic of the "open" chromatin (RNA polymerase II, CHRIZ (CHRO), BEAF-32, BRE1, dMI-2, GAF, NURF301, WDS and TRX). These regions also display reduced nucleosome density, histone H1 depletion and pronounced enrichment for ORC2, a pre-replication complex component. Within the 13 interband regions analyzed, most were around 3-4 kb long, particularly those where many of said protein features were present. We estimate there are about 3500 regions with similar properties in chromosomes of D. melanogaster cell lines, which fits quite well the number of cytologically observed interbands in salivary gland polytene chromosomes. Conclusions Our observations suggest strikingly similar organization of interband chromatin in polytene chromosomes and in chromosomes from cell lines thereby reflecting the existence of a universal principle of interphase chromosome organization.


Background
Genetic activity of interphase chromosomes is intimately linked to the properties of chromatin organization. At the most basic level, chromatin is organized in nucleosomes, i.e. histone octamer/DNA complexes. These, in turn, form higher-order structures, such as chromomeres, loops, domains, etc. Clearly, key to this organization are the chromatin proteins: histones, their post-translational modifications, and non-histone proteins. Modern methods help reliably address the question of interphase chromatin organization at a nucleosomal level; however, details of higher-order chromatin organization still remain obscure. This is largely due to our inability to directly visualize the supra-nucleosomal structures in diploid interphase nuclei. Giant polytene chromosomes from dipterans, in particular from Drosophila, allow one to mitigate this problem.
"Classic" polytene chromosomes from larval salivary glands of D. melanogaster are composed of bundles of one to two thousand tightly synapsed chromosomal strands, which are formed via multiple rounds of endoreplication of just two starting chromatids. As all the homologous chromomeres from all chromatids are aligned to each other, this results in the formation of a thick "cable" with transverse stripes of compacted chromatin (bands) alternating with decompacted interchromomeric regions (interbands). Local differences in size and compaction of banded material form a unique banding pattern that can be used to accurately map any polytene chromosome region. This in turn allows one to link a particular DNA sequence, genes and proteins to the specific chromosomal region, and so to spatially analyze the genetic processes taking place in the interphase nucleus (for review: [1]).
According to different estimates, there are 3500-5000 bands and interbands in Drosophila melanogaster polytene chromosomes; these comprise about 95% and 5% of euchromatic DNA, respectively. On average this corresponds to 30 kb of genomic material per band and 2 kb per interband [2][3][4]. Obviously, the vast majority of genes are situated in bands, as they encompass most of the DNA. As a rule, the degree of chromatin compaction in bands correlates with their transcriptional activity. This is manifested most clearly in the case of puffing, i.e. when bands form puffs upon gene activation. Although interbands are also composed of decompacted chromatin, their genetic organization and functions are still largely enigmatic. Several hypotheses regarding the functions of interbands have been put forward in the literature (for review: [4]), but they can be essentially reduced to just two alternatives: either the interbands correspond to active genes, or they harbor regulatory regions for genes that are found in the neighboring bands. Neither of these scenarios has been adequately addressed experimentally.
Despite this plethora of interesting chromatin proteins linked to interbands, their very cytological mapping is not accurate enough, as it is quite challenging to reliably map the protein localization signal to a fine structure of an interband, at least at the resolution level of light microscopy.
Clearly, in order to address the functions of interbands, it is important to be able to accurately map interband regions on a physical map and then to analyze the protein binding profiles and chromatin features in these regions. Unfortunately, using standard mapping techniques, it is close to impossible to precisely map DNA sequences to interbands as their axial lengths are quite small (0.12 μm on average) [2]. To solve this problem, one must develop new approaches to mark and identify interband regions. P-element insertions could serve as such useful "markers". Using electron microscopy (EM) analysis of polytene chromosomes from stocks with P-element-based insertions, our group has previously shown that such insertions can be visualized on polytene chromosomes as distinct cytological structures [28,29]. In most cases, transcriptionally silent chromatin in such transgenes becomes compacted and forms novel bands, provided that the insertions occurred in interbands. When inserted into bands, the compacted material from a transgene typically fuses with the neighboring material and does not form a separate band (Figure 1). As the transgene sequence is known, cloning the DNA sequence adjacent to the transgene insertion is straightforward, and so one can unambiguously identify the sequences that belong to interbands [30][31][32].
Using this approach, we mapped and cloned the DNA from 13 interband regions. We found that these interbands were mainly composed of non-coding intergenic regions and 5'-UTRs. Also, many of the interbands were rich in DNase I hypersensitive sites (DHSs), which turned out to behave as "hot spots" for integration of P-element-based transgenes [33].
With these observations in hand, we decided to further explore the question of functional organization of interbands. First of all, we wanted to establish which proteins were specific to the interbands' open chromatin, and then to ask whether localization of some of these proteins could be correlated on a genome-wide scale. It was also of utmost importance to understand whether the interbands from polytene chromosomes were "mirrored" by analogous regions in chromosomes from cell lines. In addition, in order to address the question of the existence of a defined molecular border between bands and interbands, it was necessary to estimate the length of DNA sequences associated with such proteins. To tackle all these questions, we analyzed the data from Drosophila genome-wide protein mapping databases, mostly those from the NHGRI modENCODE project [34] and from Filion and co-authors [35]. These projects included comprehensive genome-wide analysis of a wide array of chromatin proteins and histone modifications from D. melanogaster cell lines. As a result, 5 [35], 9 and even 30 [36] distinct chromatin types were identified, which were characterized by specific combinations of classes of genes and associated proteins.
Using the abovementioned data obtained on interphase chromosomes of cell lines, in the present work we performed comparative analysis of thirteen interband regions from polytene chromosomes searching for the proteins specifically enriched in interbands. The vast majority of interbands studied was found to associate with a set of proteins that is typically found in open chromatin. These open chromatin proteins tended to localize to low nucleosome density and histone H1-depleted regions and to correlate with binding of ORC2, a pre-replication complex protein. Our data suggest that regions possessing most of these features combined are typically smaller than 3-4 kb in length, and that the number of such regions closely matches the estimated number of cytologically distinct interbands in polytene chromosomes. Furthermore, our data demonstrate that interband chromatin is similarly organized in different cell types, thereby suggesting its participation in general processes that serve to form and maintain the functional architecture of interphase chromosomes.

Figure 1 Morphology of P-element insertions in polytene chromosomes. Possible scenarios: A - transgenic insertion into the interband results in formation of a novel band; B - electron microscopy image of the region 84E from chromosome arm 3R of wild-type (top) and transgenic for cHBΔ (bottom) larvae. Transgenic material forms a novel band (black arrow), which is absent from the chromosomes in control stock (white arrow); C - transgenic insertion does not result in formation of a novel band; D - electron microscopy image of the region 12E of chromosome X from wild-type (top) and cHBΔ transgenic (bottom) larvae. Chromosome morphology remains unaltered (black arrow) in the transgenic strain as compared to the wild-type chromosome (white arrow). Some marker bands are shown by arrowheads. Bar corresponds to 1 μm.

Results
Open chromatin proteins and histone marks are found in the cell line chromosome regions that correspond to polytene chromosome interbands
Distribution profiles for several dozens of proteins and histone marks in several D. melanogaster cell types have been established through the efforts of the modENCODE project [34]. We used these data and other chromatin features and focused on the regions that correspond to 13 previously mapped interband regions from polytene chromosomes [31,33]. Specifically, we used modENCODE ChIP-chip datasets for S2 cells and in some instances Kc167 cells, which were generated for 18 histone modifications and 25 chromatin proteins belonging to different functional classes. Notably, band/interband transition points remain presently unknown, and interband size estimates also vary quite widely, from 0.3 to over 3.8 kb [1,37]. Thus, we compared binding profiles for these proteins over 10 kb regions centered around insertion sites of reference transgenes which were mapped to the interbands studied and used to clone respective DNA sequences (Additional file 1 Figure S1, Additional file 2 Table S1). Figure 2 illustrates that in cell lines most of the 13 regions analyzed (80-100%) associate with open chromatin proteins. Notably, most of these proteins show significantly lower levels of binding in control sets of random DNA sequences of equal size from the D. melanogaster genome or from three large molecularly mapped bands 10A1-2, 75C1 and 75C2 (Figure 2, Additional file 2 Table S4) [38,39]. Of these open chromatin proteins, RNA polymerase II, CHRIZ, ORC2, GAF, BEAF-32, CP190, TRX, as well as H3K9ac, H4K16ac and H3K4me3 were previously reported to partially or completely immunolocalize to interbands (for review: [4]). The remaining proteins and marks (WDS, dMI-2, NURF301, BRE1, H3K4me2/3 and H4K16ac) are known to contribute to chromatin remodeling and transcriptional regulation.
We failed to observe H3K4me3-LP and tetra-H4ac in interband regions, even though these histone marks were reported as present in transcriptionally active chromatin (Supplementary Figures 11-12 from [36]). We attribute this to the quality of the H3K4me3-LP antibody: although H3K4me3 (affinity-purified) and H3K4me3-LP (crude serum) show overall very similar distribution profiles (Additional file 1 Figure S1), the latter antibody rarely displays enrichment above the significance threshold defined by modENCODE. Another peculiar feature of the regions studied is that they very frequently (> 90%) encompass H1-dips (Figure 2B, Additional file 1 Figure S1), i.e. regions depleted for histone H1 [40]. This linker histone is known to be the key protein in compacting the 10 nm chromatin fiber into the 30 nm super-beaded form [41]. Therefore, the presence of H1-dips can be considered a marker of open chromatin. It is interesting to note that the trends observed for proteins and histone marks associated with open chromatin over 10 kb were essentially the same even over 4 kb centered at insertion points of reference transgenes (Figure 2). This might point to possible functional interactions of said proteins in these regions of the genome. We next observed that 50-70% of the regions analyzed were also associated with HP1c, HP1b, JIL-1, dRING, H3K36me3 and H3K79me1. Finally, in the regions that correspond to interbands, in cell lines there was no or very little binding of typical "closed chromatin" (transcriptionally inert chromatin) proteins and marks such as HP1a, PC, HP2b, MOD(MDG4), SU(HW), E(Z), SU(VAR)3-7, SU(VAR)3-9, H3K9me2, H3K9me3, H3K27me3, H3K23ac (Figure 2).
We then analyzed in more detail the profiles for each of the chromatin proteins and histone marks, for P-element insertions and for nucleosome-depleted regions within ± 5 kb from insertion sites of reference transgenes in 13 interband regions. DNA sequences encompassing 1.5-4 kb around these sites were considerably enriched in many open chromatin proteins, such as RNA polymerase II, CHRIZ, ORC2, GAF, BEAF-32, CP190, TRX, WDS, dMI-2, NURF301 and BRE1. Furthermore, these same regions tended to display lower nucleosome density and served as hot spots for P-element integrations (Figure 3, Additional file 1 Figure S1). Of the histone marks that are characteristic of active chromatin, the following five were most frequently (50-100%) and widely (8-10 kb) found: H3K4me2, H4K8ac, H3K9ac, H3K4me1 and H4K16ac. In contrast to non-histone proteins found in active chromatin, the distribution of "active" histone marks is somewhat wider, with a slight increase towards the edges of the sequences analyzed (Figure 3). As mentioned above, in the interband regions studied the enrichment for "inactive" marks is close to negligible; hence we failed to identify any peculiar features in their localization. Figure 4 demonstrates enrichment profiles for different functional classes of proteins over the regions of interphase chromosomes from cell lines that correspond to polytene chromosome interbands. Histone marks appear either widely enriched or uniformly distributed along the whole region, or slightly increasing towards the ends of the sequences. For most regions, non-histone proteins which mainly comprise markers of active chromatin are enriched over 1.5-4 kb around insertion sites of reference transgenes. However, in two instances, namely in interbands 60E8/E10 and 87C8/9, these enrichment regions are instead found next to the reference insertion sites.
We interpret these data as the transgenic insertions hitting the very edge of an interband; alternatively this could be a consequence of distinct transcriptional activities in these regions in salivary glands and in cell lines.
Overall, the data presented here argue in favor of apparent protein-wise similarity in chromatin organization of 13 "true" interband regions studied in polytene chromosomes and of the corresponding regions of genome in cell lines.
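The window-based comparison described in this subsection can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the half-open coordinate convention and the toy coordinates are all assumptions made for illustration. It counts the fraction of reference insertion sites whose surrounding window (10 kb or 4 kb) overlaps at least one binding site of a given protein.

```python
from typing import List, Tuple

# (chromosome arm, start, end), half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]

def window(chrom: str, center: int, size: int) -> Interval:
    """Window of the given size centered at an insertion site."""
    half = size // 2
    return (chrom, center - half, center + half)

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same arm share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_bound(centers: List[Tuple[str, int]],
                   sites: List[Interval], size: int) -> float:
    """Fraction of reference windows overlapped by at least one binding site."""
    hits = sum(
        any(overlaps(window(chrom, pos, size), s) for s in sites)
        for chrom, pos in centers
    )
    return hits / len(centers)

# Hypothetical toy data, not real insertion sites or ChIP-chip peaks.
centers = [("chrX", 10_000), ("chr3R", 50_000), ("chr2R", 90_000)]
sites = [("chrX", 9_000, 9_800), ("chr3R", 53_000, 54_000)]

frac_10kb = fraction_bound(centers, sites, 10_000)
frac_4kb = fraction_bound(centers, sites, 4_000)
```

In this toy case the 4 kb windows capture fewer sites than the 10 kb windows; the text above reports that for the real data the trends were essentially the same at both scales.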

Genome-wide analysis of proteins found in interband regions
To uncover the genome-wide localization characteristics for proteins that map to selected interband regions, we used GEO (Gene Expression Omnibus) datasets available as gff-files at http://www.ncbi.nlm.nih.gov/gds. These files describe genomic regions significantly bound by most of the proteins assayed by modENCODE. We selected fragments with positive scores for non-histone proteins and H1-dips (Additional file 2 Table S3) for all Drosophila chromosomes and estimated their genome-wide distributions and the lengths of the fragments. A large fraction (70-95%) of these fragments, bound by either "active" or "silent" chromatin proteins, was 1 to 3 kb long (Table 1). The number of fragments that are bound by "active" chromatin proteins (RNA polII, CHRIZ, WDS, ORC2, H1-dips, GAF, CP190, BEAF-32, dMI-2, NURF301, BRE1, TRX) and range from 1 to 3 kb is 3000-5300 (Table 1), which roughly corresponds to the observed number of interbands in polytene chromosomes [4]. On the contrary, there are far fewer fragments (760-2800) that are of similar size (1-3 kb) and are associated with "silent" chromatin proteins PC, E(Z), dRING, or with typical insulator components: CTCF, MOD(MDG4), SU(HW) (Table 1).

[Figure caption, beginning missing] ... Table S1. Position and orientation of underlying genes (as in FlyBase Genes r. 5.12) is indicated below as horizontal blocks and arrows. Y axis shows combined percentages of 0.5 kb long DNA segments found associated with a particular class of proteins (n = 20). Color-coding for such classes is indicated below. Vertical dashed lines delimit the regions most probably corresponding to interbands.
In order to estimate how frequently these proteins co-localize in the D. melanogaster genome, we performed their pair-wise comparison. The number of overlapping pairs was considered as a similarity measure for every pair of factors being compared. Only the fragments that showed positive scores and were smaller than 10 kb were considered. We calculated the number of unique paired overlaps between the fragments (Additional file 2 Table S6) and so estimated the pair-wise correlation coefficients between the proteins (Additional file 2 Table S7). The highest values of correlation coefficients were observed for the "active" chromatin proteins and for proteins enriched in 13 interbands, i.e. for BEAF-32, CHRIZ, RNA POL II, ORC2, H1-dips, TRX, WDS, NURF301 and BRE1. The same was observed for the "silent" chromatin group of proteins: MOD(MDG4), SU(HW), E(Z), dRING. To verify whether this co-localization is significant, we first fragmented the euchromatic part of the genome (120 Mb) into non-overlapping 3 kb-long blocks (the median size of fragments that are bound by these proteins (Table 1)). Then we analyzed each of these ~40000 blocks for the presence of all pair-wise combinations of these proteins. As shown in Additional file 2 Table S8, the probability of independent pair-wise localization of all "active" proteins in the interbands studied is extremely low (P-value < $10^{-300}$). Figure 5A shows a multidimensional scaling plot (see Methods) of the correlations mentioned above. The "active" chromatin proteins characteristic of interbands cluster together and away from the cluster of "silent" chromatin proteins that do not map to interbands.
Using the agglomerative hierarchical clustering (AHC) approach, we estimated the co-localization frequencies for all the proteins. These formed 3 separate groups (Figure 5B). The first group comprised the "active" chromatin factors, such as BEAF-32, CHRIZ, H1-dips, RNA polII, ORC2, TRX and WDS, many of which were reported to immunolocalize to decompacted regions of polytene chromosomes. It is interesting to note that the numbers of pair-wise overlaps for the proteins from this group fall within a fairly narrow range, from 3300 to 3800 (3600 on average), which fits very well the number of interbands in polytene chromosomes [4]. Nucleosome remodeling proteins such as NURF301, dMI-2 and GAF also tend to co-localize with this group. The two remaining groups of proteins are represented mainly by Pc-G proteins (PC, E(Z), dRING) and by insulator proteins (MOD(MDG4), SU(HW), CTCF, CP190), and, surprisingly, by BRE1. These proteins display low levels of co-localization frequency with the proteins from the first group, and so appear not to be present in interbands.
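The fragment selection described above (positive score, length between 1 and 3 kb) could be sketched as follows. This is only an illustration under stated assumptions: it presumes standard tab-separated GFF columns (start in column 4, end in column 5, score in column 6, 1-based inclusive coordinates), and the function names and toy records are hypothetical rather than taken from the actual modENCODE files.

```python
from typing import Iterable, Iterator, Tuple

def parse_gff(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (seqid, start, end, score) for GFF feature lines that carry a score."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue  # malformed line or no score available
        yield fields[0], int(fields[3]), int(fields[4]), float(fields[5])

def count_in_range(lines: Iterable[str], lo: int = 1000, hi: int = 3000) -> Tuple[int, int]:
    """Return (fragments with lo <= length <= hi, all fragments), positive scores only."""
    total = in_range = 0
    for _, start, end, score in parse_gff(lines):
        if score <= 0:
            continue
        total += 1
        length = end - start + 1  # GFF coordinates are 1-based and inclusive
        if lo <= length <= hi:
            in_range += 1
    return in_range, total

# Hypothetical toy records, not real modENCODE data.
toy_gff = [
    "##gff-version 3",
    "2L\ttoy\tbinding_site\t1001\t3000\t2.5\t.\t.\tID=a",
    "2L\ttoy\tbinding_site\t5001\t5500\t1.2\t.\t.\tID=b",
    "3R\ttoy\tbinding_site\t100\t2599\t-0.7\t.\t.\tID=c",
    "X\ttoy\tbinding_site\t201\t1200\t0.4\t.\t.\tID=d",
]
result = count_in_range(toy_gff)
```

Applied per protein, the first number of the returned pair corresponds to the kind of count reported in Table 1, and the ratio of the two numbers to the fraction of fragments that are 1 to 3 kb long.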
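The block-based co-occupancy count can be sketched in Python as below. This is a simplified illustration, not the original analysis code: it assumes half-open coordinates, treats a block as occupied when any part of a binding fragment falls into it, and uses hypothetical toy fragments.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

BLOCK = 3000  # block size in bp, matching the 3 kb blocks described in the text

Interval = Tuple[str, int, int]  # (chromosome arm, start, end), half-open (assumed)

def occupied_blocks(sites: List[Interval], block: int = BLOCK) -> Set[Tuple[str, int]]:
    """Set of (arm, block index) pairs touched by at least one binding fragment."""
    blocks = set()
    for chrom, start, end in sites:
        for index in range(start // block, (end - 1) // block + 1):
            blocks.add((chrom, index))
    return blocks

def pairwise_cooccupancy(binding: Dict[str, List[Interval]]) -> Dict[Tuple[str, str], int]:
    """Number of blocks occupied by both proteins, for every pair of proteins."""
    occupancy = {name: occupied_blocks(sites) for name, sites in binding.items()}
    return {
        (a, b): len(occupancy[a] & occupancy[b])
        for a, b in combinations(sorted(occupancy), 2)
    }

# Hypothetical toy fragments, not real ChIP-chip peaks.
toy_binding = {
    "CHRIZ": [("chrX", 0, 2500), ("chrX", 9000, 9500)],
    "BEAF-32": [("chrX", 2000, 3500), ("chrX", 9100, 9400)],
    "PC": [("chrX", 30000, 32000)],
}
shared = pairwise_cooccupancy(toy_binding)
```

The resulting pair counts, together with the per-protein block counts and the total number of blocks, are the inputs that a 2 × 2 contingency test of independence requires.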

Discussion
Using genome-wide distribution data for a wide range of non-histone proteins and histone marks available for D. melanogaster cell lines [35,36,40], we analyzed the protein composition and chromatin features in genomic regions of cell line chromosomes corresponding to 13 interband regions of polytene chromosomes. Our results establish these regions as depleted for the linker histone H1 (showing H1 dips), and associated with a specific set of proteins characteristic of "active" chromatin (Figures 2 and 3). This is also consistent with the distribution of different states of chromatin in these genomic regions (Figure 6, Additional file 1 Figure S1). Namely, of the five principal states of chromatin that were previously identified in Drosophila cell lines and color-coded by Filion and co-authors [35], it is predominantly RED chromatin that we observe most frequently within 10 kb fragments encompassing interbands. This chromatin is reported as enriched in ORC binding sites as well as in regulatory sequences and mainly comprises genes which are linked to specific processes such as "receptor binding", "defense response", "transcription factor activity" and "signal transduction" [35]. The interband regions studied also contain YELLOW and BLUE chromatin (Figure 6A). Transcriptionally active YELLOW chromatin is specifically marked with H3K36me3, a mark of transcriptional elongation typically present on genes with a broad expression pattern over many developmental stages and tissues, so-called "house-keeping" genes. BLUE chromatin is mostly found in genome regions associated with Pc-G proteins and harboring developmental genes as well as many of the highly conserved non-coding elements (HCNEs) that contribute to gene regulation [35]. It is important to emphasize that the fraction of RED chromatin relative to the rest of the chromatin types increases closer to the insertion sites marking interband regions (Additional File 3 Figure S2).
At a 10 kb level, RED chromatin is 1.9 and 2.6 times enriched compared to the YELLOW and BLUE states, respectively, whereas when the regions ± 1 kb around insertion sites are considered, RED chromatin is 3.3 times more frequent. GREEN and BLACK chromatin states characteristic of genetically silent material (pericentric heterochromatin and transcriptionally inactive regions scattered over the genome, respectively) are very rarely found in interbands and, if present, tend to be located on the flanks (Figure 6A, Additional File 3 Figure S2A). According to the 9-state model of chromatin organization in cell lines [36], the regions corresponding to interbands are mostly composed of state 1 and state 3 chromatin (Figure 6B, Additional File 3 Figure S2B). State 1 chromatin is rich in promoters, TSSes and 5'-UTRs. State 3 chromatin is mainly characterized by the presence of large first introns in long genes, enrichment for specific chromatin remodeling factors (for instance SPT16 and dMI-2), presence of enhancers and early origins of replication. As compared to states 1 and 2, state 3 domains show stronger enrichment for the transcription-associated histone variant H3.3 [36]. Despite some differences in approaches as well as in the proteins analyzed in [35,36], the regions that correspond to 13 interbands display a consistent set of features. They are mostly represented by regulatory and promoter regions for the genes which appear to reside in the adjacent compacted material of bands (chromomeres).
Most of the "active" chromatin proteins that mapped in cell lines to DNA regions corresponding to interbands are known to immunolocalize to interbands (for review: [4]). Therefore, it is plausible to suggest that the "open" chromatin feature and the localization of a specific set of proteins are inter-related, and in fact represent a universal principle of interphase chromosome organization. This conclusion is consistent with the highly detailed observations by W. Beermann, who compared banding patterns in four larval tissues of Chironomus, and who observed them to match perfectly except for minor differences at certain regions and differences due to puffing [2]. Similar work on Drosophila also described very minor changes in banding pattern [42]. Significant similarity in banding patterns was subsequently observed upon comparison of many different tissues from many insects (for review: [1,43]). That "active" chromatin is invariably present in interbands is also supported by the similar pattern of DHSs in salivary gland polytene chromosomes and in embryonic cells. For instance, mapping of major DHSs on physical and cytological maps of the fa swb interband demonstrated their identical localization, length and number in the chromatin of embryonic cells, cell lines [44] and in larval cells [33]. This might help to explain the high frequency of P-element integrations into interbands, as insertions tend to hit the regions of DHSs [32,33]. It must be emphasized that P elements transpose and integrate in diploid germline cells, and there are no reasons to believe that insertion sites are linked in any way to the gene expression nearby [45]. Within reference interbands, we observed P-elements to predominantly cluster around open chromatin regions (Figure 3); this might suggest that these same DNA sequences are also organized in open chromatin in germline cells, where P-elements actually transpose and integrate.

Figure 6 Distribution of various chromatin states in 13 regions of D. melanogaster genome that correspond to interbands in polytene chromosomes. A - 5 "colored" chromatin states according to [35]; B - 9 chromatin states according to [36]. X axis shows sizes of DNA segments centered at the insertion sites of reference P-transposons. Y axis shows number of regions associated with a particular type of chromatin.

Conclusions
Based on the genome-wide protein mapping data generated by modENCODE on D. melanogaster cell lines, and using previously mapped interband regions as a reference, we for the first time demonstrated that decompacted chromatin regions that appear as interbands in polytene chromosomes are organized the same way in other cell types and correspond to interchromomeres of interphase chromosomes in cell lines. The peculiarities of protein distribution identified for interband regions can serve as convenient markers to precisely map interbands to the molecular map, thereby allowing one to compile comparative molecular and cytogenetic maps of interphase chromosomes in different Drosophila cell types. Indeed, further experimental validation of band and interband regions on a larger scale should be helpful to firmly establish this conclusion. Using our approach, precise mapping of the band/interband positions across entire Drosophila genome is a subject of separate work which is currently underway.

Cytological Analysis of Polytene Chromosomes
Salivary gland polytene chromosome squashes were prepared for electron microscopy analysis and examined as described earlier [46]. The sections with a thickness of 120-150 nm were cut using an LKB-IV ultratome (Sweden) and examined with a JEM-100C (Japan) electron microscope at 80 kV. Transgenic fly stocks contain insertions of cHBΔ transposon, which is an 18 kb-long P-transposon encompassing D. melanogaster gene rosy and β-gal from E. coli [47].

Genomic analysis
ChIP-chip data files for chromatin proteins and histone modifications from Drosophila cell lines (Additional file 2 Table S2) were downloaded from modENCODE consortium website [48]. The coordinates of chromatin domains determined elsewhere [35] were extracted from NCBI Gene Expression Omnibus [49], accession number GSE22069. Centers of 12 interbands (dm3 assembly) coincided with the integration sites of P transposons used to map respective interbands; for the interband 3C6/C7, proximal border of deletion fa swb [50] was selected as a central point. The coordinates of P-transposon insertion sites (Additional file 2 Table  S1) were downloaded from FlyBase [51] (release FB2010_01).
To check whether 18 proteins might cluster throughout the whole genome, we performed pair-wise comparison of their binding regions and counted the number of overlapping pairs as a similarity measure for every pair of proteins. Only the fragments with positive scores shorter than 10 kb were considered (Additional file 2 Table S3). The formalized procedure was as follows. Let $L_i = (l_{1i}, \dots, l_{mi})$ and $L_j = (l_{1j}, \dots, l_{nj})$ be the vectors representing the binding regions of two proteins $i$ and $j$, where $i, j \in [1, \dots, 18]$ and $m < n$ are the dimensions (sizes) of the vectors. We remove from $L_i$ and $L_j$ the redundant regions which bind the same region from the counterpart vector, thereby obtaining the reduced sizes $m'$, $n'$ of the corresponding vectors. We define regions $l_{fi}$ and $l_{hj}$ as overlapping if they share a nonzero common location on DNA. Then we define the similarity rate as $r_{ij} = k / \min(n', m')$, where $k$ is the number of overlapping regions, and consequently compile the similarity matrix $R = \{r_{ij}\}$. We then applied multidimensional scaling (MDS) to the matrix $R$ using the XLStat add-on software (http://www.xlstat.com). We used a non-metric MDS model, where only the order of the similarities counts (ordinal (2)). Agglomerative hierarchical clustering (AHC) with the same metric as in MDS was used to assess non-random clusters in the pair-wise comparisons (XLSTAT Inc).
To evaluate the significance of co-localization of protein binding sites, we used the chi-square test for a 2 × 2 contingency table as follows. We split 120 Mb of the euchromatic part of the D. melanogaster genome into non-overlapping fragments with an average length of about 3 kb, thus obtaining n = 40000 fragments in total. Next, for each pair of proteins we calculated the contingency table, where l and m are the numbers of peaks with positive scores for the two proteins in the pair. Under the random overlap model, the expected (theoretical) number of overlapping sites for two proteins is $e_n = l \cdot m / n$. The result is robust to variation of the total number of segments within the interval 40000-80000, with significance increasing as the total number of segments grows. Thus, we used the value of 40000 as a conservative estimate.
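One possible reading of the similarity rate can be sketched in Python. The text does not fully specify how redundant regions are removed, so the choices here are assumptions: overlapping regions are paired one-to-one by a greedy sweep over sorted intervals, and any unpaired region that still overlaps a counterpart region is treated as redundant and excluded from the reduced size. Function names and toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome arm, start, end), half-open (assumed)

def overlap(a: Interval, b: Interval) -> bool:
    """True if two regions share a nonzero common location on DNA."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def precedes(a: Interval, b: Interval) -> bool:
    """True if a lies entirely before b in (arm, position) order."""
    return a[0] < b[0] or (a[0] == b[0] and a[2] <= b[1])

def count_redundant(own: List[Interval], matched: set, other: List[Interval]) -> int:
    """Unpaired regions that nevertheless overlap some counterpart region."""
    return sum(
        1 for idx, region in enumerate(own)
        if idx not in matched and any(overlap(region, o) for o in other)
    )

def similarity_rate(li: List[Interval], lj: List[Interval]) -> float:
    """r_ij = k / min(m', n') under the one-to-one pairing assumption above."""
    a, b = sorted(li), sorted(lj)
    i = j = k = 0
    matched_a, matched_b = set(), set()
    while i < len(a) and j < len(b):
        if overlap(a[i], b[j]):
            matched_a.add(i)
            matched_b.add(j)
            k += 1
            i += 1
            j += 1
        elif precedes(a[i], b[j]):
            i += 1
        else:
            j += 1
    m_reduced = len(a) - count_redundant(a, matched_a, b)
    n_reduced = len(b) - count_redundant(b, matched_b, a)
    denom = min(m_reduced, n_reduced)
    return k / denom if denom else 0.0

# Hypothetical toy regions for two proteins.
regions_i = [("X", 100, 200), ("X", 150, 260), ("X", 1000, 1100), ("X", 5000, 5100)]
regions_j = [("X", 120, 250), ("X", 1050, 1200)]
r_full = similarity_rate(regions_i, regions_j)
r_half = similarity_rate(regions_i, [("X", 120, 250), ("X", 3000, 3100)])
```

Computing this rate for every pair of the 18 proteins would give a matrix of the kind that was passed to MDS and AHC; those steps were performed in XLStat and are not reproduced here.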
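As an illustration of the test described above, the following Python sketch computes the expected overlap under independence and the Pearson chi-square statistic for a 2 × 2 table. It is a minimal sketch, not the authors' code: the absence of a continuity correction is an assumption, the P-value uses the closed form for one degree of freedom, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple

def chi_square_2x2(n: int, l: int, m: int, k: int) -> Tuple[float, float, float]:
    """
    n: total number of genome blocks; l, m: blocks bound by each protein;
    k: blocks bound by both. Returns (expected overlap, chi-square, P-value).
    """
    # Cells of the 2 x 2 table: both, first only, second only, neither.
    a, b, c, d = k, l - k, m - k, n - l - m + k
    expected = l * m / n  # expected overlap under the random overlap model
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

# Hypothetical counts for two strongly co-localizing proteins.
expected, chi2, p_value = chi_square_2x2(n=40000, l=4000, m=3500, k=3000)

# Hypothetical counts that match independence exactly (k equals l*m/n).
_, chi2_null, p_null = chi_square_2x2(n=40000, l=4000, m=3500, k=350)
```

For very strong co-localization the P-value underflows to zero in double precision, which is in line with values reported as smaller than $10^{-300}$.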

Statistical analysis
To assess whether protein binding sites preferentially localize to the experimentally confirmed 13 interbands at a statistically significant level, we performed 13000 random samplings of equivalent DNA chunks (4 and 10 kb segments) across the D. melanogaster genome and calculated the number of corresponding protein binding sites that overlapped with the random regions. The sampling procedure accounts for the observed biases in chromosome localization of the 13 validated interbands (one on chr2R; 3 on chr3L; 4 on chr3R; and 5 on chrX) and uses corresponding weights when selecting random fragments from a chromosome arm. Only binding regions with positive scores were considered. No limitation on the size of a binding site was imposed. Only single hits per random region were considered. We then calculated the probability that a random DNA region of a given size equivalent to the source set (4/10 kb) overlaps a binding site. Thus, we were able to estimate how many of 13 randomly chosen fragments would overlap with the given protein binding sites by chance. We also calculated the P-value of the observed overlap of the experimentally verified 13 interband regions with the given sets of protein binding sites using the binomial test as follows:

$P = \sum_{i=0}^{m} C_{13}^{i} \, p^{i} (1-p)^{13-i}$,

where $p$ is the frequency, expected by random chance, with which a given set of protein binding sites overlaps a DNA region of a given size (4/10 kb), $m$ is the number of the observed DNA regions that overlap with the given protein binding site set, and $C_{13}^{i}$ is a binomial coefficient. The tail of the binomial distribution to be summed up was chosen based on the observed number of "successes" $m$, which could be either less or more than $13p$. In the case $m > 13p$, we set $P' = 1 - P$; otherwise the original $P$ was used.
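The sampling and binomial steps can be sketched together in Python. This is an illustrative approximation only: the arm weights follow the counts given in the text, but the arm lengths, binding sites, uniform placement of windows and function names are hypothetical assumptions, and the tail rule is implemented literally as described (the lower tail up to m, replaced by its complement when m exceeds 13p).

```python
import random
from math import comb
from typing import Dict, List, Tuple

# Number of validated interbands per chromosome arm, as listed in the text.
ARM_WEIGHTS = {"chr2R": 1, "chr3L": 3, "chr3R": 4, "chrX": 5}

Interval = Tuple[str, int, int]  # (arm, start, end), half-open (assumed)

def estimate_overlap_frequency(sites: List[Interval], arm_lengths: Dict[str, int],
                               size: int, n_samples: int = 13000, seed: int = 0) -> float:
    """Fraction of random windows of the given size overlapping at least one site."""
    rng = random.Random(seed)
    arms = list(ARM_WEIGHTS)
    weights = [ARM_WEIGHTS[arm] for arm in arms]
    hits = 0
    for _ in range(n_samples):
        arm = rng.choices(arms, weights=weights)[0]  # weighted choice of arm
        start = rng.randrange(arm_lengths[arm] - size + 1)
        end = start + size
        # A window counts once, however many sites it overlaps.
        if any(c == arm and s < end and start < e for c, s, e in sites):
            hits += 1
    return hits / n_samples

def binomial_p_value(m: int, p: float, n: int = 13) -> float:
    """Lower binomial tail up to m; its complement is used when m > n * p."""
    lower = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(m + 1))
    return 1 - lower if m > n * p else lower

# Hypothetical arm lengths and binding sites, for illustration only.
TOY_ARM_LENGTHS = {arm: 100_000 for arm in ARM_WEIGHTS}
toy_sites = [("chrX", 0, 50_000)]

p_random = estimate_overlap_frequency(toy_sites, TOY_ARM_LENGTHS, size=10_000)
p_value = binomial_p_value(m=12, p=p_random)
```

A fixed seed is used so that the toy run is repeatable; in a real analysis the sites would come from the positive-score binding regions of each protein.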

Additional material
Additional file 1: Figure S1. Localization of proteins and DNA elements around 13 interband regions of cell lines chromosomes. Top: molecular and genetic maps (20 kb) of these regions are centered at positions (solid vertical lines) of reference transposons (triangles) that were used for cytological identification and cloning the DNA around reference transposons in interbands. Exact molecular coordinates of transposon insertions are given in Additional File 2 Table S1. Horizontal arrows denote positions and orientation of known genes (FlyBase Genes r. 5.12). Vertical red arrows correspond to P-transposon integration sites referenced in FlyBase (when insertion sites were too close, their number is indicated above the arrow). For the region 3C6/C7, P-element integration regions lacking precise molecular localization are denoted by horizontal lines; fa swb deletion is shown as square brackets. Bottom: data on the densities of nucleosomes, distributions of 9 chromatin states and binding sites for chromatin proteins in S2 cell line as presented on the modENCODE website [48], as well as distributions of histone H1 depleted regions (H1-dips) according to [40], the five-colored chromatin types [35] and binding sites for ORC proteins according to [52] in Kc167 cell line. Regions most likely corresponding to interbands are delimited by vertical dashed lines.
Additional file 2: Tables S1-S8. Supplemental Tables 1-8. Table S1 Molecular coordinates of integration sites of P-transgenes used to map interbands.
Additional file 3: Figure S2. Frequency of chromatin states in 13 regions of D. melanogaster genome that correspond to interbands in polytene chromosomes. A - 5 "colored" chromatin states according to [35]; B - 9 chromatin types according to [36]. Sizes of DNA segments centered at the insertion sites of reference P-transposons (X axis); Percentage of DNA fragments associated with a particular type of chromatin calculated for each segment (Y axis).