Drosophila melanogaster males are heterogametic (XY), while females are homogametic (XX). The Y chromosome has gradually lost genes and degenerated, resulting in an increasingly aneuploid condition in males and the evolution of systems that compensate for between-sex differences in doses of genes located on X chromosomes [1–4]. The dosage-compensation system equalizes X-linked gene expression between males and females, thus maintaining an appropriate balance between the expression of genes on X chromosome(s) and the autosomes [5, 6].
Transcript levels from the single X chromosome of male Drosophila individuals are boosted about two-fold relative to levels from each of the two X chromosomes in females, thereby roughly equalizing overall X chromosome gene expression between the sexes. This dosage compensation is critical, and loss of the required proteins leads to male-specific lethality [8, 9]. These proteins include MSL-1 (male-specific lethal 1), MSL-2, MSL-3, MOF (males absent on the first) and MLE (maleless), which form an X chromosome-specific MSL complex, or dosage compensation complex (DCC), with two functionally redundant long non-coding RNAs, RNA on the X 1 and 2 (roX1 and roX2, respectively) [10–14]. The selective activation of X chromosomal genes is at least partly due to the hyperacetylation of histone H4 at lysine 16 (H4K16) by the histone acetyltransferase (HAT) MOF, an integral subunit of the MSL complex [15, 16].
The binding pattern of MSL proteins on the X chromosome has been identified in diverse cell lines, embryos and third instar larvae using various genome-wide techniques, such as chromatin immunoprecipitation coupled with microarray technology (ChIP-on-chip) or deep sequencing (ChIP-seq) [17–22]. Transcript levels of genes in RNAi-mediated depletion backgrounds and msl gene mutants have also been examined in diverse cell lines, embryos and larvae by hybridizing transcript populations to gene expression microarrays or by real-time PCR [20, 23–25]. These studies have revealed that the MSL complex preferentially binds to gene coding regions, particularly the 3' end of genes; that the binding pattern does not change dramatically between stages of development; and that loss of MSL complex functionality reduces expression of X-linked genes only to about 80% of wild-type levels. In addition, results of a recent analysis indicate that the MSL complex mediates dosage compensation of X chromosomal genes by enhancing transcriptional elongation, in accordance with the observed 3' bias.
Two main models have been proposed to explain the distribution of MSL complexes along the X chromosome. One suggests that the complex initially targets a relatively small number of X chromosome-specific primary recruitment sites, or chromosomal "entry" sites (CES), and then "spreads" along the chromosome from these sites in cis [27, 28]. The other, based on data gathered from X chromosomal translocation studies, postulates that large numbers of specific sites of varying affinities are present [29, 30].
In situ hybridization analyses of polytene chromosomes have shown that the Drosophila X chromosome is enriched in (dC - dA)n/(dG - dT)n sequences, and that in every Drosophila species examined to date dosage-compensated chromosomes have higher than average CA/TG, CT/AG and C/G frequencies. Subsequent computational whole-genome sequence analysis showed that throughout the Drosophila genus X chromosomes can be distinguished from other chromosomes by their A, T, C/An and G/Tn repeat sequences [33, 34]. Recent MSL protein-binding region analyses have also detected X chromosomal enrichment of low-complexity sequence elements, such as GA- and CA-based dinucleotide repeats and runs of adenines [19, 22, 29, 35]. In addition, GA-rich or TC-rich motifs have been identified in high affinity binding sites (HAS) for MSL proteins on the X chromosome using genome-wide techniques [18, 22]. A repetitive sequence motif, [G(CG)N]4, was also recently discovered in low affinity sites targeted by MSL proteins. However, although enrichment of simple sequence elements has been detected on the X chromosome, it is still unclear whether primary DNA sequences are involved in targeting the MSL complex to and within individual genes.
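To make the notion of CA/TG- or CT/AG-type dinucleotide frequencies concrete, the following is a minimal Python sketch of how strand-symmetric dinucleotide frequencies could be computed for a DNA sequence. It is an illustration only, not the method used in the cited studies: the function names and the toy sequences are hypothetical, and each dinucleotide is simply pooled with its reverse complement (so CA is counted together with TG, and CT with AG).

```python
from collections import Counter

# Watson-Crick complements for unambiguous bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dinucleotide_frequencies(seq):
    """Return strand-symmetric dinucleotide frequencies for `seq`.

    Each dinucleotide is pooled with its reverse complement, giving keys
    such as "CA/TG" or "AG/CT"; self-complementary dinucleotides (AT, TA,
    CG, GC) keep a single-name key. Windows containing characters other
    than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        di = seq[i:i + 2]
        if not set(di) <= set("ACGT"):
            continue  # skip ambiguous bases such as N
        rc = reverse_complement(di)
        key = di if di == rc else "/".join(sorted((di, rc)))
        counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy sequences, not real Drosophila data.
    ca_rich = "CACACACATGTGTGCA"
    mixed = "ATGCGTACGTTAGCAT"
    for name, s in (("ca_rich", ca_rich), ("mixed", mixed)):
        freqs = dinucleotide_frequencies(s)
        print(name, round(freqs.get("CA/TG", 0.0), 3))
```

In practice, frequencies of this kind would be computed per chromosome or per gene feature and then compared between the X chromosome and the autosomes; keys are sorted alphabetically, so the CT/AG class appears as "AG/CT".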
Here we present an extensive analysis of X chromosome sequence variation, and its potential involvement in dosage compensation, in which we used multivariate modeling and previously published data to explore relationships between MSL complex distributions, transcription patterns and five gene features -- promoters, 5' UTRs, coding sequences (CDS), introns and 3' UTRs -- and intergenic sequences (hereafter also classed as gene features, for convenience). Our results show that: the X chromosome has a distinct sequence composition within all six types of features examined; some of this variation correlates with genes targeted by the MSL complex; the insulator protein BEAF-32 binds preferentially upstream of MSL-bound genes; BEAF-32 and MOF co-localize in promoters; and bound genes have a distinct sequence composition that shows a 3' bias within coding sequences.