A sequence motif enriched in regions bound by the Drosophila dosage compensation complex
© Gallach et al; licensee BioMed Central Ltd. 2010
Received: 16 September 2009
Accepted: 12 March 2010
Published: 12 March 2010
In Drosophila melanogaster, dosage compensation is mediated by the action of the dosage compensation complex (DCC). How the DCC recognizes the fly X chromosome is still poorly understood. Characteristic sequence signatures at all DCC binding sites have not hitherto been found.
In this study, we compare the known binding sites of the DCC with oligonucleotide profiles that measure the specificity of the sequences of the D. melanogaster X chromosome. We show that the X chromosome regions bound by the DCC are enriched for a particular type of short, repetitive sequences. Their distribution suggests that these sequences contribute to chromosome recognition, the generation of DCC binding sites and/or the local spreading of the complex. Comparative data indicate that the same sequences may be involved in dosage compensation in other Drosophila species.
These results offer an explanation for the wild-type binding of the DCC along the Drosophila X chromosome, contribute to delineate the forces leading to the establishment of dosage compensation and suggest new experimental approaches to understand the precise biochemical features of the dosage compensation system.
In Drosophila, dosage compensation occurs by hypertranscription of the genes of the single X chromosome in males, leading to a level of expression similar to that found for the copies of those same genes located on the two female X chromosomes [1–3]. This hypertranscription is controlled by a ribonucleoprotein complex, known as the Dosage Compensation Complex (DCC; also known as the MSL complex or compensasome), which includes at least five proteins, encoded by the male-specific lethal (MSL) genes [4–8], and two non-coding RNAs, derived from the roX1 and roX2 genes [9–11]. The DCC, functional only in males, modifies the chromatin structure of the X chromosome by altering its pattern of histone acetylation [12, 13]. Immunostaining with antibodies against MSL proteins demonstrated that the DCC specifically recognizes hundreds of sites along the male X chromosome [14–20]. How this specificity is achieved is still poorly understood. Data obtained by hybridizing chromatin immunoprecipitates from regions bound by the DCC to genomic tiling arrays (ChIP-chip) have led to the precise characterization of many binding sites ([21, 22]; see also ref. ). So far, however, a common sequence motif shared by those regions has not been described. Only a slight enrichment for some very short sequences has been detected in those experiments, as well as with related techniques [21–25]. Partial DCCs are still generated in flies that are mutant for some of the genes encoding proteins of the complex. These incomplete DCCs are also able to recognize the X chromosome, but only in a limited number of places, first described by cytological analyses . These places were suggested to be "entry sites", high-affinity binding sites from which the DCC would epigenetically spread to the rest of the chromosome . 
Later, convincing evidence was obtained against generalized spreading from the few cytologically characterized sites as the only determinant for the wild-type pattern of DCC binding [27, 28]. Recent data indicate, however, that the complex does have a higher affinity for those sites than for the rest of the X chromosome  and that these sites are more abundant than the cytological data suggested [29, 30]. Therefore, they may contribute to spreading at a local scale [30–32]. A sequence motif, containing several GA/TC dinucleotides, has been found to be enriched in high affinity binding sites of the DCC ([29, 30]; we will refer to this motif throughout the text as (GA/TC)n).
All these results are compatible with a genetic model in which the X chromosome is enriched for two different types of sequences. One kind of sequence, characterized by the (GA/TC)n motif, would be required for the complex to bind the high affinity sites. The other type, which is yet to be characterized, would be needed to achieve the wild-type pattern of DCC binding. However, the characterization of epigenetic marks, often associated with gene transcription, which contribute to DCC binding, and the discovery of acquisition of DCC binding by some transcribed autosomal genes transposed to the X chromosome, have led to an alternative model: the DCCs would spread from the high affinity sites largely or even exclusively by following epigenetic signals [31, 33–36]. These models are not in contradiction and formulating a combined model is possible (summarized in refs. [37, 38]). On one hand, the Drosophila X chromosome may contain a large number of sites, with characteristic sequences able to attract the DCC with variable strengths. On the other hand, transcriptional status and epigenetic signals may influence the binding and/or the spreading of the complex. Obviously, this hybrid model is possible only if a certain type of sequence is found to be enriched at all DCC binding sites and not only at the high affinity ones.
We recently developed a new type of DNA sequence analysis, called oligonucleotide profiling, which allows for the rapid detection of singular features in chromosomes [39, 40]. Using this method, we showed that the X chromosomes of drosophilid species are less complex than the autosomes and that such a pattern correlates with the acquisition of dosage compensation by neo-X chromosomes . These results suggested that simple sequences may be linked to dosage compensation and that oligonucleotide profiling may be used to detect X chromosome-specific sequences involved in the recognition of that chromosome by the DCC. In this paper, we demonstrate that our method indeed allows for the detection of a peculiar type of sequence, hitherto uncharacterized, which is enriched at the DCC binding regions. We define a repetitive sequence motif that may be involved in X-chromosome detection or in spreading of the DCC along the X chromosome of Drosophila species.
Sequences, sequence analyses and genomic data representation
D. melanogaster chromosomes (Release 4.3) were obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/). Accession numbers were as follows: X: NC_004354.2; 2L: NT_033779.3; 2R: NT_033778.2; 3L: NT_037436.2; 3R: NT_033777.2. All Blast analyses were also performed online at the corresponding NCBI web page http://blast.ncbi.nlm.nih.gov/Blast.cgi. Figures containing Drosophila melanogaster genomic regions were drawn with GBrowse .
Definition of DCC binding regions
For our analyses comparing the DCC binding sites with X chromosome-specific sequences, we used the results obtained by Gilfillan et al. (kindly provided by the authors) as primary data. These data consist of quantitative signal values at individual points, obtained in ChIP-chip experiments for multiple probes tested along the X chromosome. We wanted instead to define qualitative DCC binding regions which could be compared with X-specific regions. To do so, we needed to define a minimum signal level and also when two adjacent positive probes could be considered part of the same binding region. After some tests with Gilfillan et al.  data, we chose a minimum signal value of 2 (i. e., a stringent condition, well above background levels) and a maximum distance of 1 Kb between consecutive significant values for them to be considered part of the same region. This maximum distance was chosen considering that consecutive oligonucleotides in the tiling arrays are separated in the genome by 50 - 100 nucleotides and that repeats are excluded, leading to significant gaps [21, 22].
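The two criteria described above (signal of at least 2, and at most 1 Kb between consecutive significant probes) can be illustrated with a minimal sketch. It assumes that each probe is represented by a single coordinate and a signal value; the function name `call_binding_regions` and the toy probe list are hypothetical and are not taken from the original analysis.

```python
def call_binding_regions(probes, min_signal=2.0, max_gap=1000):
    """Group significant probes into binding regions.

    probes: iterable of (position, signal) pairs (assumed representation).
    Probes with signal >= min_signal are kept; consecutive kept probes
    that lie at most max_gap nucleotides apart join the same region.
    Returns a list of (start, end) tuples.
    """
    regions = []
    for pos, signal in sorted(probes):
        if signal < min_signal:
            continue  # below-threshold probes never start or extend a region
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], pos)  # extend the current region
        else:
            regions.append((pos, pos))  # open a new region
    return regions


# Toy data (hypothetical): positions in nucleotides, arbitrary signal values.
example_probes = [(100, 2.5), (180, 0.4), (900, 3.1),
                  (2500, 2.2), (2600, 1.0), (2700, 2.0)]
example_regions = call_binding_regions(example_probes)
```

In this toy case the probes at 100 and 900 fall in one region and those at 2500 and 2700 in another, because the 1600-nucleotide gap between 900 and 2500 exceeds the 1 Kb limit.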
Oligonucleotide profiling and definition of X-specific regions
We searched along the X chromosome for regions containing X-specific sequences using oligonucleotide profiling, implemented in our program UVWORD [39, 40]. The UVWORD algorithm is simple: the program first reads a selected sequence (source sequence), one nucleotide at a time, and establishes the frequencies of all oligonucleotides of size k nucleotides present in that sequence. Then, it reads a second DNA sequence (target sequence), again one nucleotide at a time, and associates each DNA word present in the target sequence with the frequency in the source sequence. By using a single target sequence and two source sequences, relative ratios of frequencies of oligonucleotides in the source sequences can be obtained [39, 40]. Here, given that we wanted to establish X-specific regions, the target sequence was the X chromosome and the two source sequences were the X chromosome itself and an autosome, the 2L chromosome arm. In Gallach et al. , we demonstrated that all D. melanogaster major autosomal arms have essentially identical oligonucleotide compositions. Therefore, it is only necessary to use one autosome for comparing with the X chromosome.
The characterization of X-specific regions was therefore performed as follows: for each word of the X chromosome (target), its frequency was determined on the X chromosome (source 1) and on the 2L arm (source 2) and these values were then combined to obtain an X/2L ratio (for words present on the X, but absent on 2L, we fixed a 2L value = 0.5). This X/2L ratio was then corrected for the relative sizes of the X and 2L chromosomes. The final corrected X/2L ratio provides a measure of X-specificity for each word on the X chromosome . However, here our interest was not to establish how X-specific particular words were, but to detect X-specific regions which could be compared to DCC binding sites. Therefore, we decided to average the values of a certain number R of adjacent words of size k to obtain smoothed profiles of X-specificity. In this study, we used the parameters k = 13 and R = 5. The reasons for choosing words of 13 nucleotides, which are optimal for Drosophila chromosome analyses, are described in Gallach et al. . We chose R = 5 in order to allow the analysis of small X-specific regions. In summary, the two parameters, k = 13 and R = 5, define loci consisting of 5 adjacent words of size k = 13 of the X chromosome (total size of a locus = k + R - 1 = 17 nucleotides) for which average X/2L ratios are calculated. Notice that, with each nucleotide that the program reads in the target sequence, a new locus of 17 nucleotides is established, which overlaps k + R - 2 = 16 nucleotides with the previous one, and a new average is calculated.
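The profile calculation described above can be sketched as follows. This is an illustration and not the UVWORD program: the function names are hypothetical, and the size correction used here (scaling by the number of words in each source sequence) is an assumption, since the exact correction is not spelled out in this section.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of size k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def smoothed_ratio_profile(target, source_x, source_a, k=13, R=5, absent=0.5):
    """Average X/autosome word-frequency ratios over R adjacent words.

    target   : sequence to be profiled (here, the X chromosome)
    source_x : first source sequence (the X chromosome itself)
    source_a : second source sequence (an autosomal arm such as 2L)
    absent   : value assigned to words missing from source_a (0.5 in the text)
    """
    counts_x = kmer_counts(source_x, k)
    counts_a = kmer_counts(source_a, k)
    # Assumed size correction: ratio of the numbers of words in each source.
    size_correction = (len(source_a) - k + 1) / (len(source_x) - k + 1)
    ratios = []
    for i in range(len(target) - k + 1):
        word = target[i:i + k]
        autosomal = counts_a.get(word, 0) or absent
        ratios.append(counts_x[word] / autosomal * size_correction)
    # Each locus averages R adjacent words and spans k + R - 1 nucleotides.
    return [sum(ratios[i:i + R]) / R for i in range(len(ratios) - R + 1)]
```

With k = 13 and R = 5, entry i of the returned list describes the locus covering nucleotides i to i + 16 of the target, so consecutive loci overlap by 16 nucleotides, as stated in the text.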
After obtaining the average X/2L ratios, the next step was to establish when a region of the X (i.e., a single locus, or oftentimes, a set of adjacent, overlapping loci) was significantly X-specific. We explored what the effect was of using different threshold values of the average X/2L ratios with respect to their ability to explain DCC binding regions. As shown in the Results section, the best results were obtained with an average X/2L ratio of 6.8 (i. e., when the region contained oligonucleotides that were present on average at least 6.8 times more often on the X than on the 2L arm). Therefore, we eliminated all regions below the 6.8 cutoff and retained the rest for comparison with the DCC binding regions.
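As a minimal sketch of this filtering step, loci whose average ratio reaches the cutoff can be collected and overlapping loci joined into regions. The function name is hypothetical, and joining any two overlapping loci that pass the cutoff is an assumption about how "adjacent, overlapping loci" were grouped.

```python
def x_specific_regions(profile, cutoff=6.8, k=13, R=5):
    """Return (start, end) regions built from loci with average ratio >= cutoff.

    profile[i] is the average X/2L ratio of the locus starting at position i,
    which spans k + R - 1 nucleotides (17 for k = 13, R = 5).
    """
    locus_len = k + R - 1
    regions = []
    for start, value in enumerate(profile):
        if value < cutoff:
            continue  # locus discarded: below the X-specificity cutoff
        end = start + locus_len - 1
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # overlaps the previous region
        else:
            regions.append((start, end))
    return regions
```

For example, two consecutive loci above the cutoff starting at positions 1 and 2 would yield a single region spanning positions 1 to 18.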
Comparison of DCC binding regions and X-specific regions and characterization of motifs
Once the motif enriched in the DCC binding sites based on Gilfillan et al.  data was characterized, we decided to map it not only against data obtained from that paper, but also against a consensus dataset of DCC binding regions obtained by combining data from Gilfillan et al.  and Alekseyenko et al. . This consensus dataset was obtained as follows: for the three datasets from Alekseyenko et al. , corresponding to SL2 cells, clone 8 cells and embryos, we characterized regions containing probes with signal values above or equal to 2 in each of the three experiments and separated by at most 1 Kb (i. e., the same criteria used for the Gilfillan et al.  dataset in the previous analyses). These regions were then compared to those obtained previously from the Gilfillan et al.  dataset. When the regions derived from Alekseyenko et al.  and Gilfillan et al.  results were separated by less than 1 Kb, they were merged in order to generate the final consensus regions. Given that this way of obtaining the consensus regions led in some cases to the appearance of small regions detected in single experiments, data were filtered, keeping only regions that were longer than 1000 nucleotides.
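The merging and filtering steps used to build the consensus dataset can be sketched as a simple interval merge. This is a minimal illustration under the assumption that regions are inclusive (start, end) coordinate pairs; the function name and the toy coordinates are hypothetical.

```python
def consensus_regions(region_sets, max_gap=1000, min_length=1000):
    """Pool regions from several datasets, merge close ones, drop short ones.

    region_sets: list of lists of inclusive (start, end) tuples, one list
                 per dataset.
    Regions separated by less than max_gap nucleotides are merged, and only
    merged regions longer than min_length nucleotides are kept.
    """
    pooled = sorted(r for regions in region_sets for r in regions)
    merged = []
    for start, end in pooled:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s + 1 > min_length]


# Toy coordinates (hypothetical), one list per dataset.
set_a = [(1000, 3000), (10000, 10400)]
set_b = [(3500, 4200), (20000, 20100)]
toy_consensus = consensus_regions([set_a, set_b])
```

Here the first regions of the two toy datasets are 500 nucleotides apart and are merged into one region, while the two short, isolated regions are removed by the length filter.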
We checked for the presence of motifs in the autosomal regions shown to be bound by the DCC complex by Gorchakov et al.  using similar methods. As above, only the regions with signal values above 2 were considered positive in our analyses of the TrojanElephant transposon. The TrojanHorse sequences were kindly provided by the authors.
Evolutionary conservation analyses
To determine whether the motifs were evolutionarily conserved, the sequence corresponding to each motif plus 100 nucleotides upstream and downstream of it was selected as a query to perform BLASTN searches against D. virilis genomic sequences (included in the NCBI wgs database). For those sequences for which non-ambiguous homologies were detected (E-value threshold = 10^-3, minimal length of homology = 40 nucleotides), we evaluated the percentage of nucleotide identity both within and around the motif. All the sequences for which we found homology in D. virilis were reexamined in the D. melanogaster genome to determine whether they corresponded to coding regions. The codons that corresponded to the conserved motif sequences included in coding regions were characterized using TBLASTX analyses.
Coding region analyses
We downloaded the files containing the coding regions of the D. melanogaster chromosomes (again, from the 4.3 genome release, obtained from Flybase ) and established the frequency of each type of codon. We also determined to which codons the motifs located in coding regions of the X chromosome corresponded, either within or outside the DCC binding regions. Finally, we established the relative position of the motifs along the genes by following methods similar to those described in . Briefly, we obtained from the supplementary material of that paper a list of X chromosome genes with intense DCC binding (average signal > 0.5). From them, we selected those which were larger in size than 2000 base pairs and which also contained at least one motif. For the 421 genes with those features, we mapped all motifs present as follows: the central nucleotide of the motif was found on the X chromosome and a value measuring the relative position along the gene was assigned to that nucleotide (value = [position of the nucleotide on the X chromosome - position of the first (5') nucleotide of the gene on the X chromosome]/total length of the gene). We finally used a Kolmogorov-Smirnov test to establish whether the distribution of motifs was uniform throughout the sequences of the set of genes.
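The relative-position value and the uniformity test can be sketched as below. This is a minimal illustration, not the procedure actually run: the function names are hypothetical, the formula is applied as written (so it assumes genes oriented with their 5' end at the lower coordinate), and the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, which is an assumption about the variant of the test.

```python
import math


def relative_position(motif_center, gene_start, gene_length):
    """Position of the motif centre along the gene, from 0 (5' end) to 1."""
    return (motif_center - gene_start) / gene_length


def ks_uniform(values):
    """One-sample Kolmogorov-Smirnov test against the uniform(0, 1) law.

    Returns (D, p), where D is the largest distance between the empirical
    and uniform cumulative distributions and p is an asymptotic p-value.
    """
    xs = sorted(values)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return d, 1.0  # series converges poorly here; p is effectively 1
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

A motif centred at position 1500 in a 2000-nucleotide gene that starts at position 1000 would receive a value of 0.25; the pooled values for all genes are then passed to `ks_uniform`.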
Analyses of high affinity binding sites
As explained in detail in the Results section, we also established the density of our motif in the high affinity binding sites characterized by Alekseyenko et al.  and Straub et al. . In parallel, we used MEME to reanalyze the data obtained in those two studies in order to establish whether there was any sequence relationship among the motifs detected in the high affinity sites and our motif.
Sliding-window analyses of motif locations
We determined the nucleotide located at the center of all DCC binding regions (from Gilfillan et al.  data), in all the HAS (from Straub et al.  data), and in each region of the X where the DCC does not bind (i. e. the regions that are left along the X chromosome once we eliminate from it the DCC binding regions). We also randomly selected 1000 nucleotides from each large autosomal arm (for a total of 4000 randomly selected points). The central nucleotides of DCC binding regions, non-binding regions, HAS and the randomly taken nucleotides of the autosomes were taken as starting points for sliding-window analyses of motif densities. Those analyses were performed as follows: two 500-nucleotide long windows were initiated at those starting points and then displaced, respectively upstream and downstream from them, one nucleotide at a time. Then, for each window and position, its density of motifs (no. motifs/Mb) was determined. In this way, we obtained a precise characterization of how the density of motifs varies with increasing distances from the starting points.
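The sliding-window procedure can be sketched as follows, assuming that each motif is represented by a single coordinate and that windows are inclusive intervals; the function names and these conventions are hypothetical.

```python
from bisect import bisect_left, bisect_right


def window_density(motif_positions, start, end):
    """Motifs per Mb in the inclusive window [start, end].

    motif_positions must be a sorted list of motif coordinates.
    """
    n = bisect_right(motif_positions, end) - bisect_left(motif_positions, start)
    return n * 1_000_000 / (end - start + 1)


def sliding_profile(motif_positions, center, max_offset, window=500):
    """Densities for windows displaced upstream and downstream of center.

    Entry i of each returned list corresponds to a window moved i
    nucleotides away from the starting point.
    """
    upstream, downstream = [], []
    for offset in range(max_offset + 1):
        downstream.append(window_density(
            motif_positions, center + offset, center + offset + window - 1))
        upstream.append(window_density(
            motif_positions, center - offset - window + 1, center - offset))
    return upstream, downstream
```

Averaging these lists over all starting points of a given class (binding regions, non-binding regions, HAS or random autosomal points) gives the density as a function of distance.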
Primary definition of DCC binding regions, X-specific regions and determination of maximal congruence between both types of regions
Table: Results used in the selection of the cutoff value for X-specificity (in bold, data for the chosen value). Columns: X/2L cutoff (%); X regions; X-positive regions (%); DCC positive regions (%).
Characterization of an X-enriched motif found in DCC binding regions
The X-specific regions included in DCC binding regions (2157, after eliminating duplicates) were analyzed with five motif-recognition programs to determine whether they had anything in common. Notably, each of these five programs detected common motifs in many regions, although the number of regions detected was quite variable (GLAM2: 1057 regions; MOTIFSAMPLER: 916 regions; MEME: 712 regions; WEEDER: 490 regions; ALIGNACE: 370 regions). We checked for congruence among these results, looking for regions that were detected as significant by at least four of the five programs. This led to the characterization of 329 different sequences. Given that some of them were present more than once in our original dataset, they corresponded to 356 different places along the X chromosome. The expected number of sequences, if the results generated by the programs were independent, was just 102. This difference demonstrated that the five programs often detected the same sequences. Finally, we used MEME to analyze the 329 different sequences obtained, establishing that all of them truly contained a single 12-nucleotide long motif. According to MEME, the probability of such a motif arising by chance so many times in that sample was 10^-523. In related analyses, we found that X-specific regions not included in the DCC binding regions are just slightly enriched in simple DNA sequences, mainly poly-G/C . These results demonstrate that there is a common motif in at least 356 places of the X chromosome that are both highly X-specific (X/2L ratio ≥ 6.8) and where the DCC binds. The motif obtained is shown in Figure 1. It has an obvious repetitive signature: [G(CG)N/N(CG)C]4. From now on, we will refer to this motif as [G(CG)N]4.
Chromosomal distribution of sequences related to the motif
In order to determine all the positions in which sequences related to the [G(CG)N]4 motif were present, we used the Prophecy and Profit programs of the EMBOSS suite. We established the maximum score (= sum of the frequencies of the most frequent nucleotides) for the matrix obtained for our motif from the 329 sequences characterized (Figure 1), and then we determined all the sequences with a score of at least 90% of that maximum. We then scanned the X chromosome for the positions of all the sequences with score values above the 90% cutoff. This led to the characterization of 14967 sites. These sites had an average size of 14.4 nucleotides, slightly larger than the original 12 nucleotides-long motif. This was caused by the fact that the motif is internally repetitive, so positive overlapping sequences were often detected and merged.
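The scoring rule described above (a sequence is scored by summing the matrix frequencies of its nucleotides, and retained if it reaches 90% of the maximum score) can be sketched in plain Python. This is not the EMBOSS implementation, and the toy matrix below is hypothetical: it only mimics a G(CG)N repeat and is not the matrix of Figure 1.

```python
def max_score(matrix):
    """Sum of the frequencies of the most frequent nucleotide per column."""
    return sum(max(column.values()) for column in matrix)


def scan(sequence, matrix, fraction=0.9):
    """Return inclusive (start, end) hits scoring >= fraction * maximum."""
    width = len(matrix)
    threshold = fraction * max_score(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(col.get(base, 0) for col, base in zip(matrix, window))
        if score >= threshold:
            hits.append((i, i + width - 1))
    return hits


def merge_overlapping(hits):
    """Merge hits that share at least one nucleotide into single sites."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical frequency matrix for two G(CG)N units (one dict per column).
unit = [{"G": 10}, {"C": 5, "G": 5}, {"A": 3, "C": 3, "G": 2, "T": 2}]
toy_matrix = unit + unit
toy_hits = scan("GCAGCAGCA", toy_matrix)
toy_sites = merge_overlapping(toy_hits)
```

Because the toy matrix is internally repetitive, the 9-nucleotide test sequence yields two overlapping 6-nucleotide hits that merge into a single 9-nucleotide site, which illustrates why merged sites can be longer than the motif itself.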
We found that 3082 of the 14967 sites were included in 449 of the 559 DCC binding regions defined from Gilfillan et al.  data, with an average of 6.9 sites/DCC region. It is significant that, although we detected sites in just 80% of the DCC binding regions, the ones that remained undetected were in general very small. The total length of the regions lacking motifs was just 6% of the total assigned to DCC binding regions. We then divided the X chromosome into the 559 DCC binding regions and the rest, in order to test whether the sites detected were found at a higher density in those regions. For the rest of the X chromosome, once the DCC binding regions were excluded, we determined an average of 602 ± 24 sites/Mb, a value almost identical to the one that we found for the autosomes: 590 ± 29 sites/Mb (all large autosomal arms considered). However, the 449 positive regions had a density of 1582 ± 68 sites/Mb, that is, almost three times higher. Logically, this leads to these sites being much closer to each other in the DCC binding regions than in the rest of the chromosomes (Additional File 2). The median distance between consecutive sites was just 215 bp and 87% of the consecutive sites were separated by less than 1 Kb. These results demonstrate that most DCC binding regions contain multiple sites related to the [G(CG)N]4 motif that we detected in our original searches, that those sites are relatively close to each other and that they are significantly enriched in most DCC binding regions, with respect to both the rest of the X chromosome and the autosomes.
To obtain further support for these conclusions, we repeated these analyses with a consensus dataset, generated by combining data from Gilfillan et al. (1 dataset) and Alekseyenko et al. (3 datasets). From the combined data, we characterized 666 DCC binding regions. These are considerably more than in our analysis based on a single dataset (666 vs. 559), which suggests that the four datasets were somewhat heterogeneous. Using again 90% of the maximum value of the similarity matrix as a cutoff, sequences related to our motif were found in 79% (526/666) of these consensus DCC binding regions. As previously seen, the DCC binding regions in which the motif was not detected were small, corresponding all together to just 4% of all the nucleotides defined as bound by the DCC in the consensus dataset. The density of motifs was 1201 ± 39 sites/Mb in the positive DCC binding regions in this consensus dataset. The density in the rest of the X chromosome went down to 523 ± 25 sites/Mb in this case, thus being a bit lower than the average 590 sites/Mb detected in the autosomes. These results confirmed the significant enrichment of the motif in the DCC binding regions.
We also checked for the presence of our motifs in the TrojanHorse and TrojanElephant transposons used by Gorchakov et al.  to demonstrate the ability of some autosomal sequences to acquire de novo DCC binding. Significantly, we found that the region in TrojanHorse to which the DCC binds when the transposon is inserted on the X chromosome (corresponding to positions 4224822-4228454 in chromosome 2L; genes CG3702 and Rpl40) has a high motif density (4 sites in 3633 nucleotides, or 1101 sites/Mb) while the rest of the transposon has a low density (9 sites in 19531 nucleotides; 461 sites/Mb). Similarly, the regions bound by the DCC in the TrojanElephant transposon (signal > 2; corresponding to positions 6908636-6914771 and 6949036-6955236 in chromosome 2L) have a density of 811 sites/Mb (10 motifs in 12337 nucleotides) while the rest of the transposon has a density of motifs of just 623 sites/Mb. In summary, the density of sites in the regions bound by the DCC when these transposons are localized on the X chromosome is much higher than the average density of sites in the autosomes (590 sites/Mb). This may contribute to them behaving as typical X chromosome sequences when inserted on the X.
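The densities quoted for the transposons follow directly from the site counts and region lengths given in the text; a short check (the helper name `sites_per_mb` is ours) is:

```python
def sites_per_mb(n_sites, n_nucleotides):
    """Density of sites expressed per megabase."""
    return n_sites * 1_000_000 / n_nucleotides


# TrojanHorse: DCC-bound region, positions 4224822-4228454 on 2L.
bound_length = 4228454 - 4224822 + 1                 # 3633 nucleotides
trojan_horse_bound = sites_per_mb(4, bound_length)   # about 1101 sites/Mb
trojan_horse_rest = sites_per_mb(9, 19531)           # about 461 sites/Mb

# TrojanElephant: the two DCC-bound regions on 2L taken together.
elephant_length = (6914771 - 6908636 + 1) + (6955236 - 6949036 + 1)  # 12337
trojan_elephant_bound = sites_per_mb(10, elephant_length)  # about 811 sites/Mb
```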
Evolutionary conservation of the sites located in DCC regions
The DCC acts not only in D. melanogaster, but also in other drosophilids, including flies of genera other than Drosophila . Therefore, we decided to explore whether the sites found were evolutionarily conserved, which would indicate that the same sequences might be used in different species. Given that the sites are very small, we decided to locate each site in D. melanogaster and then add 100 extra nucleotides at each side of the site. By doing so, we increased the likelihood of finding the homologous site in a different species. The species chosen for comparison was D. virilis, which is a distant relative of D. melanogaster, belongs to a different subgenus and whose genome has been fully sequenced . The two lineages split about 63 million years ago .
Of the 3082 D. melanogaster sites tested, we were able to characterize the homologous D. virilis sites in 894 cases. The total number of aligned nucleotides was 113225, of which 12214 corresponded to the sites and the rest to the adjacent sequences added to perform the analyses. Since we were interested in determining whether the sites were evolutionarily conserved, we counted the number of differences in both the sites and the adjacent sequences, which were used as a control of local conservation in D. virilis. In total, there were 16365 nucleotide differences between D. melanogaster and D. virilis, and 1448 of them affected the sites. This means that the percentage of nucleotide identity between the sequences of the two species was 88.1% for the sites and just 85.2% for the adjacent sequences. We used a cumulative hypergeometric distribution to establish the probability of the sites and adjacent sequences having the same mutation rate. That probability was 5.0 × 10^-19. We conclude that, in the cases in which it has been possible to establish homology between D. melanogaster and D. virilis, the sites are significantly more conserved than their immediately adjacent sequences.
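The identity percentages can be recomputed from the counts above, and the kind of cumulative hypergeometric calculation mentioned can be sketched with the standard library. The exact parametrization of the original test is not given, so the one used here (the probability of observing at most 1448 of the 16365 differences among the 12214 site nucleotides, out of 113225 aligned nucleotides) is an assumption and need not reproduce the reported value exactly.

```python
import math


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_cdf(k, population, successes, draws):
    """P(X <= k) for a hypergeometric variable, summed in log space."""
    lowest = max(0, draws - (population - successes))
    log_total = log_comb(population, draws)
    logs = [log_comb(successes, i)
            + log_comb(population - successes, draws - i)
            - log_total
            for i in range(lowest, k + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(x - top) for x in logs)


aligned, in_sites = 113225, 12214        # aligned nucleotides: total, in sites
differences, site_differences = 16365, 1448

site_identity = 100 * (1 - site_differences / in_sites)
flank_identity = 100 * (1 - (differences - site_differences)
                        / (aligned - in_sites))
p_value = hypergeom_cdf(site_differences, aligned, differences, in_sites)
```

Under this parametrization about 1765 differences would be expected within the sites if both classes of sequence diverged at the same rate, so the 1448 observed lie far in the lower tail.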
Frequency of appearance in the conserved regions analyzed in D. melanogaster and D. virilis of the six possible codon classes that can be obtained from the repetitive [G(CG)N]4 signature
Amino acids encoded
Val, Ala, Glu, Gly, Leu, Pro, Gln, Arg
Leu x 2, Pro x 2, Arg x 2, His, Gln
Ser x 2, Cys, Pro, Arg, Thr, Ala, Gly
Arg x 3, Gly x 2, Cys, Trp, Ser
Ala x 4, Gly x 4
Pro x 4, Ala x 4
Coding regions analyses
The results detailed in the previous section suggest a preference for (CG)NG codons to be included in the motifs found in the coding regions bound by the DCC. However, those results were restricted to the 894 cases for which we detected homology with D. virilis. Therefore, we decided to establish whether that was a general preference, for all the D. melanogaster sites. We found that a total of 2720 of the 3082 sites detected in our previous analyses (88%) were included in coding regions. This is in agreement with the known enrichment of the DCC binding regions in coding sequences [21, 22]. Then, we determined that those 2720 sites corresponded to 10701 codons (incomplete codons were discarded). As with the conserved sequences described above, the most frequent triplet was again (CG)NG (6185 codons; 58%). When we performed the same analyses for the motifs not included in the DCC binding regions, we found that 5767 of the 11885 motifs (49%) were included in coding regions. We established that 24016 codons were derived from those 5767 motifs. Notably, the proportion of (CG)NG triplets was considerably smaller than in DCC regions, at just 51%. This difference is statistically highly significant (p = 1.2 × 10^-41; Chi-square test with 1 degree of freedom). Therefore, the DCC binding regions are enriched in motifs that correspond to (CG)NG codons.
Interestingly, seven of the eight amino acids that the (CG)NG sequences can generate are among the preferred ones in D. melanogaster and many other Drosophila species . It is known that the D. melanogaster X chromosome has an increased codon bias with respect to the autosomes . Therefore, it is possible that selective pressure to keep DCC binding/spreading sites with [(CG)NG]n signatures contributes, along with other processes, to that X-specific bias. On the other hand, it is important to demonstrate that the opposite is not true (i. e. that our results are not simply caused by the difference in codon bias between the X chromosome and the autosomes). Given that the DCC binding regions and coding regions often coincide, a very strong codon bias, or indeed any force causing highly biased frequencies of particular codons on the X with respect to the autosomes, could often lead to recovering X-specific codons when using the oligonucleotide profile method to detect X-specific sequences. However, we have found that the frequencies of all codons are actually so similar in the X chromosome and the autosomes that they cannot explain the 6.8× level of enrichment for sequences with k = 13 that we have used as cutoff in our searches. When we analyzed the coding regions of X chromosome and autosomes, we found that the codon GGC was the one with the highest X/A relative ratio, which had a value of 1.18. Therefore, even a sequence formed by six consecutive GGC codons (remember that 19.6 nucleotides is the average sequence size selected by our method, see above) would be enriched just 1.18^6 ≈ 2.7 times due to its increased presence in the coding regions of the X chromosome with respect to the coding regions of the autosomes. 
It can be deduced that, unless those same sequences are (obviously for reasons totally independent of codon bias or any other forces acting on coding regions) also depleted from the non-coding regions of the autosomes relative to the non-coding regions of the X chromosome, they would never achieve the 6.8× cutoff value required to be detected in our searches. As a relevant example to ascertain this point, we established the relative frequencies of [(CG)NG]4 sequences in X chromosome and in autosome genes. These frequencies were 5.5 × 10^-3 (i. e. just 5.5 out of 1000 groups of four consecutive codons had a [(CG)NG]4 signature) and 3.7 × 10^-3 respectively, with an X/A ratio of just 1.5. We conclude that neither codon bias nor any other force acting solely on coding regions can explain the selection of these sequences by the oligonucleotide profile searches.
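The arithmetic behind this argument can be checked in a few lines. The sketch follows the reasoning of the text in treating the enrichment of a run of codons as the product of the per-codon X/A ratios; the variable names are ours.

```python
import math

ggc_ratio = 1.18   # highest X/A codon ratio reported in the text (codon GGC)
cutoff = 6.8       # X/2L cutoff used in the oligonucleotide profile searches

# Enrichment expected for six consecutive GGC codons.
six_codon_enrichment = ggc_ratio ** 6                 # about 2.7

# Number of consecutive GGC codons needed to reach the cutoff by codon
# bias alone, under the same multiplicative assumption.
codons_needed = math.ceil(math.log(cutoff) / math.log(ggc_ratio))

# X/A ratio of the [(CG)NG]4 signature in genes.
signature_ratio = 5.5e-3 / 3.7e-3                     # about 1.5
```

Under that assumption, twelve consecutive GGC codons (36 nucleotides) would be needed to reach the 6.8× cutoff, well beyond the size of the sequences selected by the method.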
Given that the DCC binding regions have been described as preferentially located at the 3' ends of the genes, we explored whether the motifs were similarly distributed (see Methods). We found a depletion of motifs at both ends of the genes, spanning about the first 5% and the last 5% of their sequences. However, along the rest of the sequences of the genes, the distribution of motifs was homogeneous, according to a Kolmogorov-Smirnov test for departures from the uniform distribution (p = 0.09; n. s.). Thus, the motifs by themselves cannot explain the preference of the DCC to bind the 3' ends of the genes.
Analyses of DCC high-affinity sites
A relevant point was to determine whether the [G(CG)N]4 motif found was in some way involved in the DCC high-affinity sites (HAS). Alekseyenko et al.  recently characterized a motif, which they called "MSL recognition element" or MRE. This motif was deduced from 150 binding sites detected for the incomplete DCC formed in mutant msl-3 flies. These 150 sites were considered HAS by those authors. As we already mentioned, this 21 bp motif is characterized by containing (GA/TC)n repeats and is totally unrelated to the one that we found. However, using MEME, we detected three other motifs enriched in the putative HAS characterized by Alekseyenko et al. (kindly provided by the authors). Most interestingly, one of the motifs, found a total of 166 times in 79 of their 150 HAS, was clearly related to the motif detected in the searches described above. This motif detected by MEME was 21 base-pairs long and had a consensus sequence which can be simplified to [G(CG)(AT)]7. This is obviously very similar to the [G(CG)N]4 signature detected in this study. In fact, when we searched for our 12 bp-long motif in the 150 HAS (again using a 90% cutoff for the similarity matrix, as above), we found similar data: 71 HAS contained a total of 138 copies of the motif. Density in the positive HAS was 1004 sites/Mb, slightly lower than that found for the whole set of DCC binding regions, but still higher than that found in the rest of the X chromosome or the autosomes.
In another analysis, Straub et al.  also searched for HAS, characterized this time by a combination of experiments in cell lines, using either RNA interference of genes encoding DCC proteins (msl-3, mle, mof) or decrease of crosslinking in the ChIP experiments. They obtained a total of 132 putative HAS, from which they characterized a 29 bp motif with (GA/TC)n repeats, clearly related to the one found by Alekseyenko et al. . Our own reanalysis of those 132 putative HAS using MEME led to the characterization of a total of five motifs, of which the most abundant was roughly equivalent to, although slightly shorter than, the one described by Straub et al.  (data not shown). None of those five motifs was clearly related to the one we characterized. However, when we searched for our [G(CG)N]4 motif in the 132 putative HAS characterized by Straub et al. , we found that 56 of them actually contain at least one, making a total of 109 motifs. This corresponds to a density in the positive HAS of 1354 motifs/Mb, similar to that found for all the DCC binding sites together and again higher than that found for the rest of the X chromosome and the autosomes (see above). We can conclude from our analyses of Alekseyenko et al.  and Straub et al.  data that the [G(CG)N]4 motif that we have detected may be contributing to some of the high affinity sites.
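The counts and densities reported above can be illustrated with a short sketch. It deliberately simplifies the procedure used in this study: instead of scoring a frequency matrix with a 90% cutoff, it looks for exact matches to the degenerate consensus [G(CG)N]4 with a regular expression, on both strands and allowing overlapping matches. The function names and the toy sequence are hypothetical.

```python
import re

# Exact-match stand-in for the 12 bp [G(CG)N]4 consensus (assumption: the
# study itself used a similarity matrix with a 90% score cutoff instead).
# The lookahead lets overlapping occurrences be counted.
MOTIF = re.compile(r"(?=(?:G[CG][ACGT]){4})")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motifs(seq):
    """Count exact consensus matches on both strands of one sequence."""
    seq = seq.upper()
    return sum(len(MOTIF.findall(s)) for s in (seq, reverse_complement(seq)))


def density_per_mb(n_motifs, total_bp):
    """Convert a motif count over total_bp base pairs into motifs per megabase."""
    return n_motifs * 1e6 / total_bp


if __name__ == "__main__":
    toy_region = "GCAGCTGGAGCC"  # made-up 12 bp sequence matching the consensus
    print(count_motifs(toy_region))
    print(density_per_mb(2, 1000))
```

As a rough consistency check under these definitions, 109 motifs at 1354 motifs/Mb would correspond to about 80 kb of sequence in the 56 positive HAS; this figure is back-calculated from the numbers above and is not reported in the text.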
We have characterized a repetitive sequence motif, [G(CG)N]4, which is specifically enriched in regions bound by the Drosophila DCC. Significantly, hexanucleotides related to this motif had already been detected as enriched in DCC binding regions. Our ability to detect this longer motif critically depended on a new type of approach, which took into account not only the similarities among DCC binding regions but also the local X-specificity of the sequences within those regions. In this sense, our application of the oligonucleotide profiling strategy, which previously demonstrated its power to tackle more general problems of chromosome differentiation [39, 40], has been fruitful. Further applications of this approach to related problems can be easily envisaged.
The motif is especially enriched in the DCC binding regions that do not have a high affinity for the complex (i.e. those that are not part of the HAS determined so far), while it is less frequent in the HAS, in which, moreover, the motifs mostly appear in peripheral positions (see statistical results and Figure 4). This distribution suggests a significant role in the generation of most "low affinity" DCC binding regions, including those that surround the HAS. Its significance within the HAS is less obvious; it may simply contribute to them in some cases. This is shown by the fact that many HAS lack motifs, while the motif tends to be found at high densities adjacent to the HAS (see Figures 2, 3 and 4 and the numerical results above). It is significant in this context that the motifs found here may also help to explain recent results showing the acquisition of DCC binding sites by some short, autosome-derived regions transposed to the X chromosome. The failure of larger regions of autosomal origin to acquire DCC binding may be simply understood by considering that the larger the region, the closer its density of motifs will be to the autosomal average, which may be too low to support binding. Thus, our hypothesis to explain the data of Gorchakov et al. is that only particular, short, motif-rich regions of autosomal origin may acquire dosage compensation when moved to the X chromosome. The active transcription of genes located in those regions also seems essential, given that a deletion in TrojanHorse that abolishes DCC binding does not eliminate any [G(CG)N]4 motif.
With these results in mind, we hypothesize that the [G(CG)N]4 sequences may contribute to the recognition of the X chromosome by the DCC, to the generation of the majority of the DCC binding regions observed, and/or to the spreading of the complex from the HAS to the rest of the X chromosome. Our results are in agreement with the combined model that we discussed in the Introduction, according to which there are multiple binding/spreading sites with different features and strengths along the X chromosome. The available data are compatible with low-affinity sites being largely explained by the presence of multiple, closely located [G(CG)N]4 sequences, while high-affinity sites may be due to a specific combination of sequence motifs, among which the (GA/TC)n motif would be fundamental and the [G(CG)N]4 motif and probably several others (e.g. the additional kinds of sequences detected in our MEME searches as enriched at HAS) of secondary importance. Our data thus indicate that the high-affinity sites are indeed qualitatively different from the rest in terms of which sequences are involved in their formation, as already suggested by the data of Straub et al. and Alekseyenko et al.
The fact that some small DCC binding regions (which add up to 4-6% of all nucleotides bound by the DCC) do not contain any obvious motif can be explained in different ways. First, our analyses do not exclude the possibility that other motifs, so far uncharacterized, may be significant in DCC binding regions. Our searches have been quite stringent and focused on motifs of a certain length (in the range of 10-20 nucleotides, given the k and R parameters used), so some regularities may have been missed. Alternatively, it is possible that the criterion used to search for motif sequences (90% of the maximum score of the frequency matrix) is too strict. Finally, these exceptional regions may simply be caused by experimental limitations (e.g. false positives for DCC binding).
Although it was often impossible to establish whether particular [G(CG)N]4 motifs within DCC binding regions were evolutionarily conserved, we were able to detect 894 sites in D. virilis. We found that 87% (781/894) of them were included in coding regions. The higher level of conservation of the [G(CG)N]4 motif with respect to the immediately adjacent sequences detected in those 781 cases is significant. It may be interpreted as indicating that [G(CG)N]4 sequences are part of the X-recognition/DCC spreading system in multiple species of the genus Drosophila, a result that agrees with the known conservation of DCC binding in drosophilids. In Gallach et al., we described the most frequent trinucleotides in different Drosophila species. Interestingly, one of them was (CAG/CTG)n, which corresponds to one of the possible trinucleotides defined by the [G(CG)N]4 motif. This trinucleotide was enriched on the X chromosomes of all Drosophila species tested, including the neo-X arm of D. pseudoobscura. These results, together with those found for the high-affinity sites, which are enriched for (GA/TC)n sequences, support the old idea that different types of simple DNA along the Drosophila X chromosome may be key to the process of dosage compensation (see discussions in refs. [39, 63–65]). It is interesting that Bachtrog recently characterized positive selection acting on three HAS in the D. melanogaster lineage, a result that correlates with positive selection of MSL proteins in that same lineage [67, 68]. These results, which contrast with the evolutionary conservation that we have observed for our motif in the D. melanogaster/D. virilis comparisons, suggest that HAS may be, at least in particular lineages, under peculiar selective regimes, different from those of the rest of the DCC binding regions, as suggested by Bachtrog.
When we established the imprint that the [G(CG)N]4 motif may leave on the protein sequences of Drosophila genes, we found that the preferred frame was the one that allowed for the maximum variation in the amino acids encoded by the nucleotides of the motif. This result may help to explain the apparent paradox of why the DCC often binds to the coding regions of genes [21, 22]. The adjacent non-coding regions, which are evolutionarily less constrained, would seem better targets for acquiring DCC binding sites (see ref. for a discussion of the process of acquisition of dosage compensation). However, simple, degenerate, repetitive motifs have a minimal impact on coding regions if they are able to encode many different amino acids. In that case, it may often be enough to choose particular codons to generate the DNA motif without changing the protein sequence. Only if the motif encodes just a few amino acids would its generation necessarily cause a repeat of those amino acids in the protein, which may be more difficult to accommodate. In the case of the motif that we discovered, the most versatile frame, and the one that is found most often, is [(CG)NG]n, which can encode eight different amino acids (Table 2). As we indicated above, the acquisition of these sequences may contribute to codon bias, which is especially significant on the X chromosome.
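The reading-frame argument above can be checked with a few lines of code. The following sketch, which assumes the standard genetic code and considers only the motif as written on the coding strand, enumerates the codons allowed by each of the three frames of the repeat unit and lists the amino acids they encode. The frame labels and function name are chosen for this example only.

```python
from itertools import product

# Standard genetic code, with codons ordered by first, second and third base in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The three reading frames of the G(CG)N repeat unit; each entry gives the
# bases allowed at codon positions 1, 2 and 3.
FRAMES = {
    "G(CG)N": ("G", "CG", "ACGT"),
    "(CG)NG": ("CG", "ACGT", "G"),
    "NG(CG)": ("ACGT", "G", "CG"),
}


def amino_acids(frame):
    """Return the set of amino acids encoded by all codons allowed in a frame."""
    return {CODON_TABLE["".join(codon)] for codon in product(*FRAMES[frame])}


if __name__ == "__main__":
    for name in FRAMES:
        encoded = sorted(amino_acids(name))
        print(name, len(encoded), "".join(encoded))
```

Under these assumptions, the (CG)NG frame is the only one whose eight possible codons each specify a different amino acid, which is consistent with the eight amino acids mentioned above.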
The analysis of the ChIP-chip data for DCC binding with a novel strategy based on oligonucleotide profiling allowed us to detect a set of DNA sequences that are at the same time X-specific and included in the DCC binding regions. These sequences share a common short, internally repetitive motif which may contribute to establishing the wild-type localization of the dosage compensation regulators along the X chromosome. Further experimental confirmation of the role of the motif in DCC binding and/or local spreading is required. These results open fascinating new possibilities. Among them, the clearest is that it is now possible to devise experiments to test more precisely the factors governing the binding or spreading of the DCC. Also, the relationships among genetic changes and epigenetic modifications leading to the wild-type pattern of DCC binding may be explored in more detail. Finally, we may also gain a deeper understanding of the similarities and differences in the evolution of the dosage compensation systems of Drosophila, Caenorhabditis, in which a similar situation of simple, short motifs acting as X chromosome recognition/spreading marks has been described [69–71], and mammals, in which several types of repetitive DNA sequences may influence X chromosome inactivation [72, 73].
The authors would like to thank Esther Betrán and Peter Becker for their comments on an early version of this paper and Jennifer Gage for proofreading the manuscript. This work was supported by Ministerio de Ciencia e Innovación, Spain [grant number BIO2008-05067].
- Mukherjee AS, Beermann W: Synthesis of ribonucleic acid by the X-chromosomes of Drosophila melanogaster and the problem of dosage compensation. Nature. 1965, 207: 785-786. 10.1038/207785a0.
- Hamada FN, Park PJ, Gordadze PR, Kuroda MI: Global regulation of X chromosomal genes by the MSL complex in Drosophila melanogaster. Genes Dev. 2005, 19: 2289-2294. 10.1101/gad.1343705.
- Straub T, Gilfillan GD, Maier VK, Becker PB: The Drosophila MSL complex activates the transcription of target genes. Genes Dev. 2005, 19: 2284-2288. 10.1101/gad.1343105.
- Belote JM, Lucchesi JC: Control of X chromosome transcription by the maleless gene in Drosophila. Nature. 1980, 285: 573-575. 10.1038/285573a0.
- Belote JM, Lucchesi JC: Male-specific lethal mutations of Drosophila melanogaster. Genetics. 1980, 96: 165-186.
- Uchida S, Uenoyama T, Oishi K: Studies on the sex-specific lethals of Drosophila melanogaster. III. A third chromosome male-specific lethal mutant. Jpn J Genet. 1981, 56: 523-527. 10.1266/jjg.56.523.
- Lucchesi JC, Skripsky T, Tax FE: A new male-specific mutation in Drosophila melanogaster. Genetics. 1982, 100: s42.
- Hilfiker A, Hilfiker-Kleiner D, Pannuti A, Lucchesi JC: mof, a putative acetyl transferase gene related to the Tip60 and MOZ human genes and to the SAS genes of yeast, is required for dosage compensation in Drosophila. EMBO J. 1997, 2054-2060. 10.1093/emboj/16.8.2054.
- Meller VH, Wu KH, Roman G, Kuroda MI, Davis RL: roX1 RNA paints the X chromosome of male Drosophila and is regulated by the dosage compensation system. Cell. 1997, 88: 445-457. 10.1016/S0092-8674(00)81885-1.
- Amrein H, Axel R: Genes expressed in neurons of adult male Drosophila. Cell. 1997, 88: 459-469. 10.1016/S0092-8674(00)81886-3.
- Franke A, Baker BS: The rox1 and rox2 RNAs are essential components of the compensasome, which mediates dosage compensation in Drosophila. Mol Cell. 1999, 4: 117-122. 10.1016/S1097-2765(00)80193-8.
- Turner BM, Birley AJ, Lavender J: Histone H4 isoforms acetylated at specific lysine residues define individual chromosomes and chromatin domains in Drosophila polytene nuclei. Cell. 1992, 69: 375-384. 10.1016/0092-8674(92)90417-B.
- Bone JR, Lavender J, Richman R, Palmer MI, Turner BM, Kuroda MI: Acetylated histone H4 on the male X chromosome is associated with dosage compensation in Drosophila. Genes Dev. 1994, 8: 96-104. 10.1101/gad.8.1.96.
- Kuroda MI, Kernan MJ, Kreber R, Ganetzky B, Baker BS: The maleless protein associates with the X chromosome to regulate dosage compensation in Drosophila. Cell. 1991, 66: 935-947. 10.1016/0092-8674(91)90439-6.
- Palmer MJ, Mergner VA, Richman R, Manning JE, Kuroda MI, Lucchesi JC: The male-specific lethal-one (msl-1) gene of Drosophila melanogaster encodes a novel protein that associates with the X chromosome in males. Genetics. 1993, 134: 545-557.
- Gorman M, Franke A, Baker BS: Molecular characterization of the male-specific lethal-3 gene and investigations of the regulation of dosage compensation in Drosophila. Development. 1995, 121: 463-475.
- Kelley RL, Solovyeva I, Lyman LM, Richman R, Solovyev V, Kuroda MI: Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell. 1995, 81: 867-877. 10.1016/0092-8674(95)90007-1.
- Bashaw GJ, Baker BS: The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development. 1995, 121: 3245-3258.
- Zhou S, Yang Y, Scott MJ, Pannuti A, Fehr KC, Eisen A, Koonin EV, Fouts DL, Wrightsman R, Manning JE, Lucchesi JC: Male-specific lethal 2, a dosage compensation gene of Drosophila, undergoes sex-specific regulation and encodes a protein with a RING finger and a metallothionein-like cysteine cluster. EMBO J. 1995, 14: 2884-2895.
- Gu W, Szauter P, Lucchesi JC: Targeting of MOF, a putative histone acetyl transferase, to the X chromosome of Drosophila melanogaster. Dev Genet. 1998, 22: 56-64. 10.1002/(SICI)1520-6408(1998)22:1<56::AID-DVG6>3.0.CO;2-6.
- Alekseyenko AA, Larschan E, Lai WR, Park PJ, Kuroda MI: High-resolution ChIP-chip analysis reveals that the Drosophila MSL complex selectively identifies active genes on the male X chromosome. Genes Dev. 2006, 20: 848-857. 10.1101/gad.1400206.
- Gilfillan GD, Straub T, de Wit E, Greil F, Lamm R, van Steensel B, Becker PB: Chromosome-wide gene-specific targeting of the Drosophila dosage compensation complex. Genes Dev. 2006, 20: 858-870. 10.1101/gad.1399406.
- Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, Bertone P, Akhtar A: Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell. 2008, 133: 813-828. 10.1016/j.cell.2008.04.036.
- Dahlsveen IK, Gilfillan GD, Shelest VI, Lamm R, Becker PB: Targeting determinants of dosage compensation in Drosophila. PLoS Genet. 2006, 2: e5. 10.1371/journal.pgen.0020005.
- Gilfillan GD, König C, Dahlsveen IK, Prakoura N, Straub T, Lamm R, Fauth T, Becker PB: Cumulative contributions of weak DNA determinants to targeting the Drosophila dosage compensation complex. Nucleic Acids Res. 2007, 35: 3561-3572. 10.1093/nar/gkm282.
- Kelley RL, Meller VH, Gordadze PR, Roman G, Davis RL, Kuroda MI: Epigenetic spreading of the Drosophila dosage compensation complex from roX RNA genes into flanking chromatin. Cell. 1999, 98: 513-522. 10.1016/S0092-8674(00)81979-0.
- Demakova OV, Kotlikova IV, Gorgadze PR, Alekseyenko AA, Kuroda MI, Zhimulev IF: The MSL complex levels are critical for its correct targeting to the chromosomes of Drosophila melanogaster. Chromosoma. 2003, 112: 103-115. 10.1007/s00412-003-0249-1.
- Fagegaltier D, Baker BS: X chromosome sites autonomously recruit the dosage compensation complex in Drosophila males. PLoS Biol. 2004, 2: e341. 10.1371/journal.pbio.0020341.
- Straub T, Grimaud C, Gilfillan GD, Mitterweger A, Becker PB: The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet. 2008, 4: e1000302. 10.1371/journal.pgen.1000302.
- Alekseyenko AA, Peng S, Larschan E, Gorchakov AA, Lee OK, Kharchenko P, McGrath SD, Wang CI, Mardis ER, Park PJ, Kuroda MI: A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome. Cell. 2008, 134: 599-609. 10.1016/j.cell.2008.06.033.
- Sural TH, Peng S, Li B, Workman JL, Park PJ, Kuroda MI: The MSL3 chromodomain directs a key targeting step for dosage compensation of the Drosophila melanogaster X chromosome. Nat Struct Mol Biol. 2008, 15: 1318-1325. 10.1038/nsmb.1520.
- Sun X, Birchler JA: Studies on the short range spreading of the male specific lethal (MSL) complex on the X chromosome in Drosophila. Cytogenet Genome Res. 2009, 124: 158-169. 10.1159/000207524.
- Kind J, Akhtar A: Cotranscriptional recruitment of the dosage compensation complex to X-linked target genes. Genes Dev. 2007, 21: 2030-2040. 10.1101/gad.430807.
- Larschan E, Alekseyenko AA, Gortchakov AA, Peng S, Li B, Yang P, Workman JL, Park PJ, Kuroda MI: MSL complex is attracted to genes marked by H3K36 trimethylation using a sequence-independent mechanism. Mol Cell. 2007, 28: 121-133. 10.1016/j.molcel.2007.08.011.
- Bell O, Conrad T, Kind J, Wirbelauer C, Akhtar A, Schübeler D: Transcription-coupled methylation of histone H3 at lysine 36 regulates dosage compensation by enhancing recruitment of the MSL complex in Drosophila melanogaster. Mol Cell Biol. 2008, 28: 3401-3409. 10.1128/MCB.00006-08.
- Gorchakov AA, Alekseyenko AA, Kharchenko P, Park PJ, Kuroda MI: Long-range spreading of dosage compensation in Drosophila captures transcribed autosomal genes inserted on X. Genes Dev. 2009, 23: 2266-2271. 10.1101/gad.1840409.
- Straub T, Becker PB: Dosage compensation: the beginning and end of generalization. Nat Rev Genet. 2007, 8: 47-57. 10.1038/nrg2013.
- Straub T, Becker PB: DNA sequence and the organization of chromosomal domains. Curr Opin Genet Dev. 2008, 18: 175-180. 10.1016/j.gde.2008.01.001.
- Gallach M, Arnau V, Marín I: Global patterns of sequence evolution in Drosophila. BMC Genomics. 2007, 8: 408. 10.1186/1471-2164-8-408.
- Arnau V, Gallach M, Marín I: Fast comparison of DNA sequences by oligonucleotide profiling. BMC Res Notes. 2008, 1: 5. 10.1186/1756-0500-1-5.
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 12: 1599-1610. 10.1101/gr.403602.
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology; Menlo Park, California. 1996, Edited by AAAI Press, 28-36.
- Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol. 2000, 296: 1205-1214. 10.1006/jmbi.2000.3519.
- Pavesi G, Mauri G, Pesole G: An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics. 2001, 17 (Suppl 1): S207-S214.
- Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouze P, Moreau Y: A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics. 2001, 17: 1113-1122. 10.1093/bioinformatics/17.12.1113.
- Frith MC, Saunders NF, Kobe B, Bailey TL: Discovering sequence motifs with arbitrary insertions and deletions. PLoS Comput Biol. 2008, 4: e1000071. 10.1371/journal.pcbi.1000071.
- Hu J, Li B, Kihara D: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res. 2005, 33: 4899-4913. 10.1093/nar/gki791.
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z: Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005, 23: 137-144. 10.1038/nbt1053.
- Das MK, Dai HK: A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007, 8 (Suppl 7): S21. 10.1186/1471-2105-8-S7-S21.
- Rice P, Longden I, Bleasby A: EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.
- Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD, Benos PV: enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res. 2005, 33 (Web Server issue): W389-392. 10.1093/nar/gki439.
- Flybase: D melanogaster 4.3 genome release. [ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r4.3_20060303/fasta/]
- Gallach M: Análisis de patrones globales de evolución genómica. PhD Thesis. 2008, Universidad de Valencia.
- Marín I, Franke A, Bashaw GJ, Baker BS: The dosage compensation system of Drosophila is co-opted by newly evolved X chromosomes. Nature. 1996, 383: 160-163. 10.1038/383160a0.
- Drosophila 12 Genomes Consortium: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450: 203-218. 10.1038/nature06341.
- Tamura K, Subramanian S, Kumar S: Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks. Mol Biol Evol. 2004, 21: 36-44. 10.1093/molbev/msg236.
- Vicario S, Moriyama EN, Powell JR: Codon usage in twelve species of Drosophila. BMC Evol Biol. 2007, 7: 226. 10.1186/1471-2148-7-226.
- Singh ND, Davis JC, Petrov DA: X-linked genes evolve higher codon bias in Drosophila and Caenorhabditis. Genetics. 2005, 171: 145-155. 10.1534/genetics.105.043497.
- Kageyama Y, Mengus G, Gilfillan G, Kennedy HG, Stuckenholz C, Kelley RL, Becker PB, Kuroda MI: Association and spreading of the Drosophila dosage compensation complex from a discrete roX1 chromatin entry site. EMBO J. 2001, 20: 2236-2245. 10.1093/emboj/20.9.2236.
- Park Y, Mengus G, Bai X, Kageyama Y, Meller VH, Becker PB, Kuroda MI: Sequence-specific targeting of Drosophila roX genes by the MSL dosage compensation complex. Mol Cell. 2003, 11: 977-986. 10.1016/S1097-2765(03)00147-3.
- Oh H, Bone JR, Kuroda MI: Multiple classes of MSL binding sites target dosage compensation to the X chromosome of Drosophila. Curr Biol. 2004, 14: 481-487. 10.1016/j.cub.2004.03.004.
- Park Y, Kelley RL, Oh H, Kuroda MI, Meller VH: Extent of chromatin spreading determined by roX RNA recruitment of MSL proteins. Science. 2002, 298: 1620-1623. 10.1126/science.1076686.
- Lowenhaupt K, Rich A, Pardue ML: Nonrandom distribution of long mono- and dinucleotide repeats in Drosophila chromosomes: correlations with dosage compensation, heterochromatin, and recombination. Mol Cell Biol. 1989, 9: 1173-1182.
- Baker BS, Gorman M, Marín I: Dosage compensation in Drosophila. Annu Rev Genet. 491-521.
- Marín I, Siegal ML, Baker BS: The evolution of dosage-compensation mechanisms. Bioessays. 2000, 22: 1106-1114. 10.1002/1521-1878(200012)22:12<1106::AID-BIES8>3.0.CO;2-W.
- Bachtrog D: Positive selection at the binding sites of the male-specific lethal complex involved in dosage compensation in Drosophila. Genetics. 2008, 180: 1123-1129. 10.1534/genetics.107.084244.
- Levine MT, Holloway AK, Arshad U, Begun DJ: Pervasive and largely lineage-specific adaptive protein evolution in the dosage compensation complex of Drosophila melanogaster. Genetics. 2007, 177: 1959-1962. 10.1534/genetics.107.079459.
- Rodriguez MA, Vermaak D, Bayes JJ, Malik HS: Species-specific positive selection of the male-specific lethal complex that participates in dosage compensation in Drosophila. Proc Natl Acad Sci USA. 2007, 104: 15412-15417. 10.1073/pnas.0707445104.
- McDonel P, Jans J, Peterson BK, Meyer BJ: Clustered DNA motifs mark X chromosomes for repression by a dosage compensation complex. Nature. 2006, 444: 614-618. 10.1038/nature05338.
- Ercan S, Giresi PG, Whittle CM, Zhang X, Green RD, Lieb JD: X chromosome repression by localization of the C. elegans dosage compensation machinery to sites of transcription initiation. Nat Genet. 2007, 39: 403-408. 10.1038/ng1983.
- Blauwkamp TA, Csankovszki G: Two classes of DCC binding elements facilitate binding and spreading of the DCC along C. elegans X-chromosomes. Mol Cell Biol. 2009, 29: 2023-2031. 10.1128/MCB.01448-08.
- Carrel L, Park C, Tyekucheva S, Dunn J, Chiaromonte F, Makova KD: Genomic environment predicts expression patterns on the human inactive X chromosome. PLoS Genet. 2006, 2: e151. 10.1371/journal.pgen.0020151.
- Wang Z, Willard HF, Mukherjee S, Furey T: Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput Biol. 2006, 2: e113. 10.1371/journal.pcbi.0020113.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.