Immunity related genes in dipterans share common enrichment of AT-rich motifs in their 5' regulatory regions that are potentially involved in nucleosome formation
BMC Genomics volume 9, Article number: 326 (2008)
Understanding the transcriptional regulation mechanisms in response to environmental challenges is of fundamental importance in biology. Transcription factors associated with response elements and the chromatin structure have proven to play important roles in gene expression regulation. We have analyzed promoter regions of dipteran genes induced in response to immune challenge, in search of particular sequence patterns involved in their transcriptional regulation.
5' upstream regions of D. melanogaster and A. gambiae immunity-induced genes and their corresponding orthologous genes in 11 non-melanogaster drosophilid species and Ae. aegypti share enrichment in AT-rich short motifs. AT-rich motifs are associated with nucleosome formation as predicted by two different algorithms. In A. gambiae and D. melanogaster, many immunity genes 5' upstream sequences also showed NFκB response elements, located within 500 bp from the transcription start site. In A. gambiae, the frequency of ATAA motif near the NFκB response elements was increased, suggesting a functional link between nucleosome formation/remodelling and NFκB regulation of transcription.
AT-rich motif enrichment in 5' upstream sequences of immunity genes in A. gambiae, Ae. aegypti and the Drosophila genus suggests a particular pattern of nucleosome formation/chromatin organization. The co-occurrence of such motifs with the NFκB response elements suggests that these sequence signatures may be functionally involved in transcriptional activation during the dipteran immune response. AT-rich motif enrichment in regulatory regions in this group of co-regulated genes could represent an evolutionarily constrained signature in dipterans and perhaps other distantly related species.
Organismal complexity is dependent on the network that regulates gene expression, rather than the number of genes in its genome [1–3]. Thus, one of the biggest challenges in postgenomic research is to understand the regulatory mechanisms controlling the location, timing and intensity of gene expression.
Organisms are permanently sensing changes in their environment. Environmental agents activate cellular signaling pathways that lead to the rapid expression of specific genes in response to those changes. These pathways transmit their signal to specific transcription factors (TFs), which gain access to response elements (REs) located in promoter and enhancer regions of the corresponding gene, resulting in transcriptional activation. In eukaryotes these protein-DNA interactions occur in the context of a chromatin template within the cell nucleus. The fundamental unit of chromatin is the nucleosome, composed of a segment of 146 base pairs of double-stranded DNA wrapped around a core of histone proteins. Initially, nucleosomes were regarded as structures required for the packing of long DNA molecules into the cell nucleus, but it is now clear that chromatin structure plays a central role in the regulation of gene expression [6–9]. At least three mechanisms have been proposed for the active role of chromatin in transcriptional regulation. First, nucleosomes may prevent a TF from binding to its cognate RE, as revealed by the pioneering studies of PHO5 gene expression in response to phosphate starvation. Second, wrapping DNA in nucleosomes may promote transcription by allowing closely adjacent REs access to their cognate TFs [11, 12]. Third, nucleosomes may bring distant regulatory elements closer together, as occurs in the alcohol dehydrogenase (Adh) promoter region of Drosophila.
Nucleosomes are located in preferred positions with respect to DNA sequence [14–21]. It has been shown that, on a statistical level, groups of experimentally obtained nucleosomal sequences display periodicity in the occurrence of dinucleotides such as GG, TA, TG, and TT [14, 15, 20] or trinucleotides such as VWG ([G/C/A][A/T]G). This periodicity tends to occur approximately every 10 bp, coinciding with one turn of the DNA helix, and confers the bending properties required for wrapping DNA around the histone core. However, this periodicity is difficult to identify in individual nucleosomal sequences because of a low signal-to-noise ratio. The non-random distribution of nucleosomes suggests that some DNA sequences are more likely to form stable nucleosomes, and therefore that nucleosome-forming sequences could be predicted using computational methods based on the sequence features identified so far [20, 22].
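The roughly 10 bp periodicity described above can be illustrated with a simple distance histogram. The following is a minimal sketch and not the method used in the cited studies: the function names and the toy sequence are hypothetical, and real nucleosomal data would require pooling many aligned sequences before any peak became visible.

```python
from collections import Counter

def dinucleotide_positions(seq, dinuc):
    """Return 0-based start positions of every occurrence of dinuc in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]

def distance_histogram(seqs, dinuc, max_dist=30):
    """Count pairs of dinuc occurrences lying d bp apart, for 1 <= d <= max_dist."""
    hist = Counter()
    for seq in seqs:
        pos = dinucleotide_positions(seq, dinuc)
        for i, p in enumerate(pos):
            for q in pos[i + 1:]:
                d = q - p
                if d > max_dist:
                    break  # positions are sorted, so later ones are farther away
                hist[d] += 1
    return hist

# Toy input (hypothetical): "TA" planted every 10 bp in a G/C-only background.
toy = ("TA" + "GCGCGCGC") * 6
hist = distance_histogram([toy], "TA")
for d in sorted(hist):
    print(d, hist[d])
```

In pooled nucleosomal sequences, peaks at multiples of about 10 bp in such a histogram would be the signature of the periodicity; in a single sequence the signal is usually too weak to see.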
Immune responses are inducible phenomena resulting from a close relationship between the environment, pathogen signal detection systems and the gene expression machinery. Upon pathogen recognition, several transduction pathways are activated, leading to the activation of TFs that induce gene expression. In Drosophila melanogaster, the Toll and Imd pathways converge in the activation of the NFκB/Rel-related TFs, Dif and Relish, respectively, which bind to NFκB REs located in the 5' upstream regions of antimicrobial peptide genes, thus promoting their transcription.
Understanding the transcriptional regulation mechanisms during the insect immune response is of fundamental interest in biology, and could also provide a rational basis for developing strategies to control vector-borne diseases. In this work, we describe that immunity genes induced upon immune challenge in D. melanogaster and Anopheles gambiae, the main African malaria vector, share an enrichment of AT-rich motifs in their 5' regulatory regions. Enrichment of AT-rich motifs was also observed in immunity orthologs of 11 additional non-melanogaster Drosophila species and Aedes aegypti. These motifs differ from REs in terms of statistical frequency and length. Their occurrence correlates with predicted nucleosomal positions [18, 20, 22], suggesting that AT-rich motifs may be involved in chromatinization and transcriptional regulation of immunity-related transcriptional gene modules in these insects.
Regulatory regions of immunity-related co-expressed genes of Anopheles gambiae and Drosophila melanogaster induced upon immune challenge are enriched in AT-rich specific DNA motifs
We used publicly available and author-provided microarray databases [26, 27], coupled with bioinformatics analysis tools for regulatory sequences, to identify sequence patterns potentially involved in transcriptional regulation operating during the immune response in D. melanogaster and A. gambiae.
Microarray data describing the temporal transcriptional profile for 13,196 D. melanogaster genes and 2,883 A. gambiae genes in response to various immune challenges were used to select genes with the following expression profiles: immunity induced, repressed and non-modified (Figure 1). The Drosophila microarray data were obtained using OregonR adult males challenged with Escherichia coli and Micrococcus luteus. The Anopheles microarray data were obtained using the A. gambiae cell line 4a-3B challenged with several bacterial species or microbial products. Gene groups were selected according to 1) gene ontology and Interpro assignments as well as induction after an immune challenge, 2) genes that were not modified upon immune challenges, and 3) genes down-regulated upon immune challenges. Two additional groups were added as controls: 4) randomly selected genes and 5) randomly computer-generated sequences (artificial). Table 1 presents the number of analyzed genes in each group and Tables 2, 3, 4, 5, 6, 7 list the Ensembl or AnoEST ID, gene description, and chromosomal location for the genes used in the analysis. The expression profiles of the three gene groups (immunity induced, repressed and non-modified) of both species are shown in Figure 1.
To investigate whether the 5' regulatory regions of immunity-related genes shared common DNA motifs, 2500 bp 5' upstream sequences (5'-US) were retrieved using BioMart from Ensembl and analyzed for statistically overrepresented motifs of 2 to 8 nucleotides in length using Oligo-Analysis, which is based on the binomial distribution. The background oligonucleotide frequencies were estimated by calculating the relative frequencies of all possible oligonucleotides (ranging from 2 to 8 bp) within the 2500 bp 5'-US of 13,166 A. gambiae or 13,172 D. melanogaster genes. Oligonucleotide occurrences were counted for each group of 5'-US and their statistical significance was estimated on the basis of the background frequencies. The significance index (sigocc) reflects the degree of overrepresentation of each motif on a logarithmic scale.
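The binomial test behind this kind of analysis can be sketched as follows. This is an illustration under stated assumptions rather than the Oligo-Analysis implementation: the function names are hypothetical, each sequence position is treated as an independent trial, and the significance index is assumed to be the negative log10 of the P-value multiplied by the number of patterns tested.

```python
import math

def count_occurrences(seqs, motif):
    """Count (overlapping) occurrences of motif on the given strand of each sequence."""
    k = len(motif)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - k + 1)
        if s[i:i + k] == motif
    )

def log_binomial_upper_tail(n_obs, n_trials, p):
    """Natural log of P(X >= n_obs) for X ~ Binomial(n_trials, p), with 0 < p < 1."""
    if n_obs <= 0:
        return 0.0  # P(X >= 0) = 1
    if n_obs > n_trials:
        return float("-inf")  # impossible outcome
    log_terms = [
        math.lgamma(n_trials + 1) - math.lgamma(k + 1) - math.lgamma(n_trials - k + 1)
        + k * math.log(p) + (n_trials - k) * math.log1p(-p)
        for k in range(n_obs, n_trials + 1)
    ]
    top = max(log_terms)  # log-sum-exp keeps tiny tail probabilities from underflowing
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def significance_index(n_obs, n_trials, p_background, n_patterns):
    """Assumed definition: -log10(P-value * number of patterns tested)."""
    log10_p = log_binomial_upper_tail(n_obs, n_trials, p_background) / math.log(10)
    return -(log10_p + math.log10(n_patterns))

# Hypothetical numbers: a 4-mer with background frequency 0.01, seen 30 times
# in 1000 positions, with all 4**4 possible 4-mers tested.
print(round(significance_index(30, 1000, 0.01, 4 ** 4), 2))
```

Under this assumed definition, an index of s means that the observed count would be expected by chance about once in 10^s random sequence sets of the same size.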
Analysis of the 5'-US of A. gambiae immunity genes showed that the main statistically overrepresented motifs were those of 2, 3 and 4 nucleotides in length (Table 8). Similarly, the D. melanogaster 5'-US of immunity genes showed an enrichment of 2, 3 and 4 letter motifs (Table 8 and Tables 9, 10, 11, 12, 13, 14). Motifs of these lengths in the 5'-US of immunity genes showed higher sigocc values than those obtained in the 5'-US of down-regulated and non-modified genes in both insects. For A. gambiae immunity genes, the highest score for 4 bp motifs was 11.26 (ATAA), versus 2.59 (AAAA) and 2.46 (CGAC) in down-regulated and non-modified genes, respectively (Table 9). Motifs of 3 bp and 2 bp in length also presented the highest sigocc in immunity-induced A. gambiae genes: 6.11 for the AAA motif in immunity genes, versus 0.77 for the same motif in down-regulated genes, with no overrepresented 3 bp motif in non-modified genes (Table 11); and 20.99 for the AA motif in immunity genes, versus 0.72 and 1.73 for CG in down-regulated and non-modified genes, respectively (Table 13). Similar results for 2, 3 and 4 bp motifs were obtained in the D. melanogaster 5'-US (Tables 10, 12 and 14). These observations underline the high overrepresentation of certain motifs in the 5'-US of these insect immunity genes. Interestingly, the motif sequences with the highest scores in the 5'-US of immunity genes were the same in both organisms: TA, AA, AT, AAA and ATAA (Table 8).
The sigocc for the ATAA motif in the 2500 bp 5'-US of A. gambiae immunity genes was 11.26, indicating that this four-base-pair motif would be expected to reach this level of overrepresentation by chance in only one of 10^11.26 groups of random sequences with the same number and length of sequences. In comparison, the ATAA motif was not overrepresented in non-modified and down-regulated genes of A. gambiae (Table 15 and Table 9). The ATAA motif in the 5'-US of D. melanogaster immunity genes had a sigocc of 14.21, versus 2.2 in down-regulated genes, and was absent in non-modified genes (Table 15 and Table 10). Similar results were obtained for the conserved motifs AAA, AA, AT and TA. These results indicate that statistical overrepresentation of these motifs is specific to the 5'-US of immunity genes (Table 15).
In agreement with the Oligo-Analysis results, bootstrapping analysis using POBO confirmed that the average occurrence of the conserved motifs TA, AA, AT, AAA and ATAA was significantly higher (p < 0.0001) in immunity promoters than in the whole genome, the non-modified genes, the down-regulated genes and the random sequence sets for both insects (Figure 2).
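The idea of a bootstrap comparison of motif occurrence can be sketched as below. This is a simplified illustration, not POBO's procedure: the function names, the fixed random seed and the empirical p-value with a +1 correction are all assumptions made for the example.

```python
import random

def motif_count(seq, motif):
    """Count (overlapping) occurrences of motif in a single sequence."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def bootstrap_means(background_seqs, motif, n_draws, n_rounds=1000, seed=0):
    """Mean motif count per sequence in n_rounds samples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = [motif_count(s, motif) for s in background_seqs]
    return [sum(rng.choices(counts, k=n_draws)) / n_draws for _ in range(n_rounds)]

def empirical_p(observed_mean, background_means):
    """Share of bootstrap means at least as large as the observed mean (+1 corrected)."""
    hits = sum(1 for m in background_means if m >= observed_mean)
    return (hits + 1) / (len(background_means) + 1)

# Hypothetical toy data: a background pool without the motif and a test group with it.
background = ["GGGGCCCC", "CCCCGGGG", "GCGCGCGC"]
test_group = ["ATAAGGATAA", "GGATAAGG"]
observed = sum(motif_count(s, "ATAA") for s in test_group) / len(test_group)
means = bootstrap_means(background, "ATAA", n_draws=len(test_group))
print(observed, empirical_p(observed, means))
```

Drawing each bootstrap sample with the same number of sequences as the test group keeps the two means comparable.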
As a consequence of AT-rich motif over-representation, a slight increase in AT content was observed in the 5'-US of immunity genes. The average AT content for D. melanogaster immunity genes was 60.4%, versus 57.6% and 57.2% for non-modified and down-regulated genes, respectively. These differences were not significant (p > 0.001), indicating that AT-rich motif over-representation was not due to a significant increase in AT content that could lead to random enrichment of AT-rich motifs. Similar results were obtained for A. gambiae (Figure 3).
5' upstream regions of immunity-related orthologous genes in the genus Drosophila and in Aedes aegypti are also enriched with AT-rich motifs
In order to determine whether other related dipteran species share the motifs identified in A. gambiae and D. melanogaster, the 5'-US of orthologous immunity genes of other Drosophila species and of another member of the Culicidae family, Aedes aegypti, were analyzed regardless of their transcriptional profile. Orthologs of the genes initially selected from the D. melanogaster expression profiles were retrieved from the eleven recently sequenced Drosophila species [32, 33]. Tables 16 and 17 list the orthologous genes present in each Drosophila species and Ae. aegypti. The 5'-US of these genes were screened with Oligo-Analysis. Figure 4 shows the results obtained for the 4 bp motifs TATA, AAAA, ATAA, AAAT and TTAA, which were statistically over-represented in the majority of the 5'-US of immunity orthologous genes of 12 Drosophila species, A. gambiae and Ae. aegypti. We observed phylogenetic correlations for some motifs. The most prevalent motif, and the one with the highest sigocc scores, was TATA, which was among the best ranked for most Drosophila species (9/12) but absent in Anopheles and Aedes. The AAAA motif was also highly ranked among some Drosophila species but not in mosquitoes. The ATAA motif was highly ranked in D. melanogaster, D. simulans and A. gambiae, but absent in the other Drosophila species and Ae. aegypti. Finally, motifs such as TTAA and AAAT were highly ranked in mosquitoes only. Intriguingly, D. persimilis did not show any enrichment of AT-rich motifs at all, despite its close genetic distance to species displaying clear AT-rich motif enrichment. It should be noted that sigocc values for the non-melanogaster sequences and Ae. aegypti were considerably lower than those observed in A. gambiae and D. melanogaster (Table 18. See additional file 1).
AT-rich tetrads are associated with high nucleosomal potential
Having observed that AT-rich motif enrichment was a general feature of the 5'-US of immunity genes in several dipteran species, we evaluated the association of some of these motifs with predicted nucleosomal sites. Experimentally stable nucleosomes in mouse have AT-rich motifs, including the AA, TA [14, 20], TATA and ATAA motifs. The ATAA motif includes the three 2 bp motifs statistically over-represented in immunity genes (AA, TA and AT) of A. gambiae and D. melanogaster. Taking into account the highly conserved nucleosomal structure, and given that the ATAA motif was enriched in both D. melanogaster and A. gambiae, as well as in other Drosophila species, we hypothesized that the ATAA motif could also participate in nucleosome formation in dipteran immune response genes. Several algorithms have been developed to predict chromatin structure from sequence [20, 22, 34]. The RECON algorithm uses experimentally determined nucleosomal sequences coupled with Monte Carlo methods and discriminant analysis of dinucleotide frequencies. It searches for a partition of the nucleosomal sequences into non-overlapping regions that provides the maximal value of the Mahalanobis distance discriminating between nucleosomal and non-nucleosomal sequences. In this way, RECON determines the probability that a sequence forms nucleosomes and assigns a nucleosomal potential value to each nucleotide according to the sequence context in which the nucleotide is embedded. Positive values of nucleosomal potential correspond to reliable predictions of nucleosome formation sites with a confidence level of p < 0.05 (α = 0.05); a nucleosomal potential of +1 corresponds to the best predictions.
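For reference, the Mahalanobis distance between two groups in discriminant analysis is commonly written as below. The symbols are assumptions introduced here for illustration and are not taken from the RECON publication: the two vectors are the mean dinucleotide-frequency vectors of the nucleosomal and non-nucleosomal sequence sets, and S is their pooled covariance matrix.

```latex
D^{2} = \left(\bar{\mathbf{x}}_{\mathrm{nuc}} - \bar{\mathbf{x}}_{\mathrm{non}}\right)^{\top}
        S^{-1}
        \left(\bar{\mathbf{x}}_{\mathrm{nuc}} - \bar{\mathbf{x}}_{\mathrm{non}}\right)
```

A larger value of D^2 means the two sequence classes are better separated by the chosen features, which is why a partition that maximizes it is sought.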
Using RECON, ATAA motifs were preferentially associated with positive values of nucleosomal potential in all the biological groups, both in A. gambiae and D. melanogaster (Figure 5), supporting a possible role for this motif in nucleosome formation. From Figure 5 it is evident that ATAA is associated with positive nucleosomal potential values independently of the group of biological 5'-US analyzed. As expected, the immunity, down-regulated, non-modified and randomly selected 5'-US of A. gambiae and D. melanogaster all contain ATAA motifs; however, the 5'-US of immunity genes have a significantly increased number of ATAA motifs. The frequency of ATAA associated with positive nucleosomal potential in immunity 5'-US is higher than in non-modified (p = 0.001 in both A. gambiae and D. melanogaster), down-regulated (p = 0.006 in A. gambiae and p = 0.012 in D. melanogaster), random (p = 0.005 in A. gambiae and p < 0.001 in D. melanogaster) and non-biological (artificial) sequences (p < 0.001 in A. gambiae and D. melanogaster). Analyzing the ATAA distribution per group of genes, more than 70% and 80% of all ATAA motifs in A. gambiae and D. melanogaster, respectively, were located within regions of positive nucleosomal potential (p < 0.001), indicating a possible role of this motif in nucleosome formation.
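Relating motif positions to a per-nucleotide score such as the nucleosomal potential can be sketched as follows. This is a hypothetical illustration: the function names and the toy profile are invented, and assigning each motif the mean score over its four bases is an assumption, since the text does not state how a single value was attached to each occurrence.

```python
def motif_starts(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

def classify_by_potential(seq, potential, motif="ATAA"):
    """Tally motif occurrences by the sign of the mean score across the motif's bases."""
    if len(potential) != len(seq):
        raise ValueError("potential must hold one value per nucleotide")
    tally = {"positive": 0, "negative": 0, "zero": 0}
    k = len(motif)
    for start in motif_starts(seq, motif):
        mean_score = sum(potential[start:start + k]) / k  # assumed summary rule
        if mean_score > 0:
            tally["positive"] += 1
        elif mean_score < 0:
            tally["negative"] += 1
        else:
            tally["zero"] += 1
    return tally

# Toy profile (hypothetical): first ATAA in a positive region, second in a negative one.
seq = "ATAAGGGGATAA"
potential = [0.5] * 4 + [0.0] * 4 + [-0.2] * 4
print(classify_by_potential(seq, potential))
```

Dividing the "positive" tally by the total number of occurrences gives proportions of the kind reported for each gene group.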
Figure 5 also shows that combining RECON with the Oligo-Analysis results allows detection of a property inherent to biological sequences. The ATAA distribution with respect to nucleosomal potential values was markedly different between biological and non-biological (artificial) sequences for both insects. First, biological sequences had more ATAA motifs than non-biological sequences (1065 ATAA motifs in 30 random biological sequences versus 613 in 30 random non-biological sequences). Second, the majority of the biological ATAA motifs were associated with positive nucleosomal potential (872/1065 or 81.9% of motifs in 30 random biological sequences versus 208/613 or 33.9% in 30 random non-biological sequences). Additionally, the non-biological sequences presented an inverse distribution of ATAA motifs, with more ATAA motifs associated with negative values of nucleosomal potential (289/613 or 47.2% ATAA negatives versus 208/613 or 33.9% ATAA positives in 30 random non-biological sequences).
We also evaluated the association of ATAA and TATA motifs in D. ananassae, representing the non-melanogaster subgroup; D. pseudoobscura, of the obscura group; and D. grimshawi, the most phylogenetically distant species, belonging to the Hawaiian Drosophila, as well as the TTAA and AAAT motifs in Ae. aegypti (a non-drosophilid dipteran), with the nucleosomal potential calculated by the RECON algorithm. As shown in Figure 6, both motifs in the drosophilid species analyzed are clearly associated with positive nucleosomal potential values (p < 0.001; Figure 6A–B). Similarly, both motifs analyzed in Ae. aegypti were associated with high nucleosomal potential values, and the difference between AT-rich motifs with positive nucleosomal potential and AT-rich motifs with other values was statistically significant (p < 0.001) (Figure 6C).
ATAA motifs correlate with high Nucleosomal Occupancy p values (pNO)
Segal and colleagues recently reported an algorithm to predict nucleosome positions that takes into account sequence composition and thermodynamic properties. Using a collection of nucleosome-bound DNA sequences from yeast, chicken or human, they constructed probabilistic models that represent the DNA sequence preferences for nucleosome formation and assign a p value to each nucleotide of the analyzed sequence; this value indicates the probability that the position is occupied by a nucleosome (p of Nucleosomal Occupancy, pNO).
Applying the three models to all the groups of sequences analyzed from A. gambiae and D. melanogaster, we found that the ATAA motifs were associated with high pNO values when the sequences were analyzed using the yeast model (Figure 7). The distribution of ATAA was very similar to that obtained using RECON, showing agreement between two independent methods in predicting the association of ATAA with nucleosomal positions. All the biological groups of analyzed sequences presented ATAA motifs associated largely with pNO > 0.8; however, immunity genes had a significantly increased number of ATAA motifs associated with pNO > 0.8 relative to other biological sequences (p < 0.05, except between immunity and down-regulated genes of D. melanogaster, where p = 0.155).
More than 85% of all ATAA motifs found in the 5'-US of A. gambiae and D. melanogaster were associated with pNO > 0.8 (p < 0.001). For A. gambiae immunity genes, 88% (476/541) of ATAA motifs were associated with pNO > 0.8, and a similar distribution was obtained for the other A. gambiae gene groups (Figure 7A). The difference between ATAA associated with pNO > 0.8 and ATAA associated with pNO < 0.5 or undefined values was statistically significant (p < 0.001), showing a clear correlation between ATAA and high values of pNO. Similarly, 85.9% (1340/1560) of ATAA motifs in the 5'-US of D. melanogaster immunity genes had pNO > 0.8, with a significant difference with regard to ATAA with pNO < 0.5 or undefined values (p < 0.001). The distribution of ATAA in the other D. melanogaster groups of genes was also statistically significant (p < 0.001) (Figure 7B).
The combination of Oligo-Analysis and pNO results also revealed a difference between biological and non-biological sequences. Thirty randomly selected biological sequences of D. melanogaster had 90% (959/1065) of ATAA motifs associated with pNO > 0.8 versus 55.1% (338/613) of ATAA motifs in 30 non-biological sequences, showing again a non-random distribution of biological ATAA motif and tagging it as part of a potential nucleosomal code (Figure 7B).
Surprisingly, no ATAA motifs were found with pNO values between 0.8 and 0.5 (values that define the "medium p value" range, see Methods) in any of the other gene groups analyzed in A. gambiae and D. melanogaster (Figure 7).
We evaluated the association with the probability of nucleosomal occupancy (pNO) of ATAA and TATA motifs in D. ananassae, D. pseudoobscura and D. grimshawi, and of the TTAA and AAAT motifs in Ae. aegypti. As shown in Figure 8, both motifs in the drosophilid species analyzed are clearly associated with pNO > 0.8 values (p < 0.001, Figures 8A–B). Similarly, both motifs analyzed in Ae. aegypti were associated with pNO > 0.8 values (p < 0.001, Figure 8C).
Taken together, we found a consistent tendency, demonstrated by two independent methods, showing that the AT-rich motif enrichment within a specific sequence context might favour nucleosome formation in immune genes of a wide variety of dipteran species.
A. gambiae and D. melanogaster 5'-US of immunity genes have NFκB response elements located in the first 500 bp of their 5'-US
NFκB transcription factors in both insects and vertebrates are involved in immune gene expression regulation [23, 25]. Using MEME, NFκB REs were identified in the 5'-US of immunity genes, but not in the 5'-US of down-regulated, non-modified and random genes. Moreover, the NFκB REs were enriched within the first 500 bp of the 5'-US of immunity genes, both in A. gambiae and D. melanogaster (Figure 9). In order to investigate whether there is a functional and physical association between ATAA motifs and NFκB motifs, the distribution of ATAA motifs was analysed with respect to the transcription initiation site (TIS). Although ATAA motifs were distributed along the whole sequence in all gene groups, the highest frequencies were found within the -251 to -500 interval in immunity genes in both insects (Figure 10). We further quantified ATAA frequency within ± 200 bp of the NFκB site in immunity, non-modified (NM) and down-regulated (DR) genes in both insects. In A. gambiae, the frequency of ATAA motifs around NFκB REs was significantly higher in immunity genes than in both non-modified (p < 0.01) and down-regulated genes (p = 0.048). However, in D. melanogaster we did not find significant differences in ATAA frequency in relation to the NFκB RE among the gene groups (p > 0.1), although the tendency was the same as in A. gambiae (Figure 11). The association of NFκB REs with AT-rich motifs, and possibly with nucleosomes, in immunity genes of A. gambiae and D. melanogaster may function as a specific link between the chromatin structure and the remodelling machinery needed for the expression of immune response genes.
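The two positional summaries used above, motif counts in 250 bp intervals upstream of the TIS and motif counts within ± 200 bp of a response element, can be sketched as follows. The function names and toy sequence are hypothetical, and the sketch assumes that each upstream sequence is written 5' to 3' with its last base at position -1 relative to the TIS.

```python
def count_near_site(seq, motif, site_pos, window=200):
    """Count motif starts within +/- window bp of site_pos (0-based index into seq)."""
    k = len(motif)
    lo = max(0, site_pos - window)
    hi = min(len(seq) - k, site_pos + window)
    return sum(1 for i in range(lo, hi + 1) if seq[i:i + k] == motif)

def bin_by_distance_to_tis(seq, motif, bin_size=250):
    """Histogram of motif starts keyed by (nearest, farthest) upstream coordinates."""
    n, k = len(seq), len(motif)
    bins = {}
    for i in range(n - k + 1):
        if seq[i:i + k] == motif:
            dist = n - i  # a start at index i sits at position -(n - i)
            b = (dist - 1) // bin_size
            label = (-(b * bin_size + 1), -((b + 1) * bin_size))
            bins[label] = bins.get(label, 0) + 1
    return bins

# Toy 500 bp upstream region (hypothetical) with ATAA at positions -500 and -200.
upstream = "ATAA" + "G" * 296 + "ATAA" + "G" * 196
print(bin_by_distance_to_tis(upstream, "ATAA"))
print(count_near_site(upstream, "ATAA", site_pos=350))
```

With a bin size of 250, the bins are labelled (-1, -250), (-251, -500) and so on, matching the interval notation used in the text.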
In this work, we have documented that AT-rich motifs are over-represented in the 5' upstream regions of immunity genes of mosquitoes and drosophilids. We have also documented that the position of the AT-rich motifs is associated with nucleosomes as predicted by two different algorithms for nucleosome positioning, pointing to a possible role of these motifs in the transcriptional regulation of these functionally related genes through modification of chromatin structure involving nucleosome positioning.
Previous reports have found that sequences that form extremely stable nucleosomes are enriched in AT motifs referred to as TATA boxes, which in many cases included the ATAA motif. When we correlated the positions of this motif with the output of two different algorithms that predict nucleosome positions [20, 22], we found that this motif correlates almost exclusively with positions with a high probability of forming nucleosomes, suggesting that ATAA motif enrichment is a DNA sequence pattern associated with nucleosome formation in these functionally related immunity genes. The conservation of AT-rich motif enrichment in the 5'-US of immunity-related genes of Drosophilidae and Culicidae, which diverged 250 million years ago, suggests that this common feature may be the result of an evolutionarily constrained epigenetic mechanism of transcriptional regulation in immune-responsive genes in dipterans. More studies are required to define whether this could be part of a more general mechanism of regulation in metazoans. The case of D. persimilis represents a caveat for our attempt to generalize the implications of our findings; however, we cannot exclude that the current status of the annotation of such genomes may affect the results.
There are two conflicting views about nucleosome formation. One holds that nucleosomes can potentially be formed anywhere in the genome regardless of the sequence and that, therefore, it is not possible to predict sites of nucleosome formation. The other proposes that nucleosomes are associated with certain DNA sequences or sequence patterns that affect the bending properties of DNA during nucleosome formation [14, 18]. The latter point of view has been gaining support in recent years owing to the documentation of great variability in the bending potential of DNA sequences [14, 15, 37] and therefore in their capacity to form nucleosomes [18, 38, 39].
Two of the three two-letter motifs statistically over-represented in immunity promoters of A. gambiae and D. melanogaster, TA and AA, have previously been associated with nucleosome formation in human, yeast, chicken and mouse [14, 15, 20, 21]. Additionally, the ATAA motif, which is associated with nucleosome positions and includes the three two-letter motifs with the highest scores in both organisms (AA, TA and AT), has been found in sequences that form stable nucleosomes. Thus, AT-rich motifs in the 5'-US regions of mosquito and drosophilid immunity genes could participate in the transcriptional regulation of genes induced by immune challenges in a way that differs from that of typical response elements. In contrast to response elements, which can be functional singly or in pairs in a promoter region, the AT-rich motifs are statistically enriched, with several copies distributed in a diffuse pattern through the promoter regions, suggesting their involvement in nucleosome formation. This diffuse sequence pattern of AT-rich motifs, unlike the discrete patterns displayed by response elements, represents a new insight into the role of DNA sequence context in transcriptional regulation.
Several reports have documented that genes with similar functions share similar nucleosomal occupancy patterns. Levitsky and colleagues, using the RECON algorithm to analyze distinct functional types of human promoters, found that tissue-specific gene promoters present higher nucleosomal potential than genes commonly expressed in many tissues (housekeeping genes). Segal and colleagues, using the Nucleosome Position Prediction algorithm to analyze different kinds of genomic sequences and biologically related gene sets, found that nucleosome occupancy varies depending on the type of genomic location analyzed, and that groups of functionally related genes can be classified on the basis of their profiles of nucleosome occupancy in the open reading frames and intergenic regions. Recently, Lee and colleagues, using Hidden Markov Models to analyze experimentally obtained nucleosomes, also found a correlation between function and nucleosome occupancy. Each of these reports used a different method to analyze sequence data, and all found that nucleosomal sequences follow a distinctive pattern associated with the function of the genes.
In relation to RECON and Nucleosome Positioning Prediction, it is important to note that neither of these programs searches for motifs defined a priori; the input for both programs is a set of biological nucleosomal sequences from which information is extracted.
It has been shown that gene expression co-regulation is highly conserved in eukaryotes. For example, Saccharomyces cerevisiae and Caenorhabditis elegans, which diverged 1500 million years ago, still share a group of co-regulated genes, so it is plausible that Drosophilidae and Culicidae, which diverged only 250 million years ago, also share groups of functionally related co-regulated genes. The enrichment of AT-rich motifs in groups of co-regulated genes involved in the immune response could provide the basis for developing new tools for the identification of different functional gene modules based on the compositional context of non-coding regulatory DNA. However, the high sigocc observed in manually curated datasets compared to the low sigocc observed in automatically annotated datasets highlights the importance of accurate TIS annotation for regulatory region analysis.
Insect immunity relies on innate defense mechanisms to combat pathogens. In D. melanogaster, the Imd and Toll pathways lead to the activation of Rel/NFκB transcription factors that control a substantial proportion of the genes transcriptionally modified in response to pathogen infection. Many components of these pathways are conserved in A. gambiae and Ae. aegypti and are also remarkably conserved in innate immunity signaling pathways in mammals (TLR and TNF-R signaling pathways, respectively) [33, 44–47]. The sets of induced genes described here for both insects belong to the same functional group, and many of them have NFκB response elements within 500 bp upstream of the predicted transcription start site, the same location where functionally important NFκB REs have been found in these and other insects [48–52]. Our findings indicate that, besides being regulated by NFκB, these genes are enriched in the ATAA motif, which constitutes a particular pattern of chromatin structure involved in their transcriptional regulation.
Interestingly, NFκB transcription factors bind to their response elements even if they are packaged in a nucleosome. Once bound to their response elements, NFκB transcription factors can recruit chromatin remodeling complexes to expose other response elements and allow the formation of the initiation complex. In the vertebrate immune system, chromatin structure is critical to establish Th1-Th2 differentiation through the action of specific transcription factors such as GATA-3 and T-bet, and several cytokine genes possess nucleosomes in their promoters that need to be removed to allow gene expression [56–59]. Thus, epigenetic phenomena such as histone modification (altered nucleosome conformation) or remodeling of chromatin (change of nucleosome position) are commonly a required step to achieve gene expression in response to external stimuli.
Based on the results obtained and previously reported information, we propose a model in which a subgroup of insect immunity genes remains silent in the absence of an immune challenge because of nucleosome formation in their 5'-US regions. The presence of these nucleosomes occludes the access of transcription factors to REs involved in gene expression. After an immune challenge, the Toll and/or Imd pathways are activated, which in turn leads to activation of Rel/NFκB transcription factors. These are translocated to the nucleus, bind to their NFκB REs and recruit chromatin modifying/remodeling factors that release DNA from nucleosomes, allowing its interaction with the transcriptional machinery.
Functionally related genes could harbor in their regulatory regions a regulatory code represented by the combination of REs plus, in some cases, particular short motifs associated with chromatin structure. This regulatory code functions like a lock: genes that need to be co-expressed will share the same lock, represented by REs organized in a similar way, or by specific REs associated with motifs that confer a distinctive chromatin structure. Cells continuously sense their environment and respond in order to adapt. The regulatory state of the cell, defined by the presence and state of activity of transcription factors, also changes continuously; this regulatory state represents the "key" needed to open the proposed lock. The active transcription factors present at a given time in the cell determine the form of the "key" for the lock and, therefore, the class of promoters that will be open or closed. In the case of the immune genes studied here, we have identified evidence that is compatible with a potential regulatory unit involving chromatin structure (associated with ATAA), Rel/NFκB transcription factors and NFκB response elements. Other regulatory codes could exist involving any one of these components, in addition to others.
The role of chromatin structure in gene expression regulation during the immune response of insects remains poorly explored. This work provides a first insight into this complex regulatory mechanism potentially shared by immune genes of Drosophilidae and Culicidae.
Immunity genes of A. gambiae, Ae. aegypti, D. melanogaster and many other drosophilid species share a common enrichment of AT-rich motifs in their 5'-US regions. AT-rich motifs are frequently associated with bioinformatic nucleosome positioning predictions, suggesting their participation in a particular nucleosome organization involved in transcriptional regulation of an immunity co-regulated module. Many of these regulatory regions also have NFκB response elements within the first 500 bp 5' of the transcription start site. These two features suggest that the mechanism of transcriptional regulation of immune response genes in dipterans is conserved and might occur through modifications in chromatin structure of promoter regions mediated by NFκB-dependent recruitment of remodeling factors. Our findings suggest that AT-rich motif enrichment in regulatory regions in this group of co-regulated genes could represent an evolutionarily constrained signature in dipterans and perhaps other species, despite their evolutionary distance.
Gene selection criteria
Microarray data for A. gambiae immune response was kindly provided by the author . Microarray data from D. melanogaster immune response  was downloaded from . Analysis of expression profiles was conducted using the TMEV version 3.1 module of TM4 microarray software suite .
Three gene clusters were selected for each species by hierarchical clustering from the microarray databases. For A. gambiae, gene selection criteria were as follows: Induced immunity-related genes: genes associated with immunity based on the protein structural features, and with expression values of log2(f2/f1) > 0 in at least 7/10 immunological challenges and 3/6 time points for each challenge. Non-modified genes: genes with an expression mean of log2(f2/f1) ± 0.10 and a standard deviation of ± 0.15 in 9/10 immunological challenges. Down-regulated genes: genes with an expression level of log2(f2/f1) ≤ -0.5855 (a repression level of at least 1.5 times with respect to control cells) in at least 4/10 immune challenges and in 4/6 time points for each challenge, and with a maximal expression level of log2(f2/f1) < 0.6785 in only one time point per challenge (a maximal expression level less than 1.6 times with respect to control cells in only one point). For D. melanogaster, gene definitions were as follows: Immunity-related genes: genes associated with immunity based on Gene Ontology classification (GO:0006952, defense response, biological process), and with expression values of log2(f2/f1) > 0 in 6/6 time points of bacterial challenge. Non-modified genes: genes with an expression mean of log2(f2/f1) ± 0.1 and a standard deviation of ± 0.07 in 6/6 time points of bacterial challenge. Down-regulated genes: genes with an expression level of log2(f2/f1) ≤ -0.5855 (a repression level of at least 1.5 times with respect to control) in 6/6 time points of the bacterial challenge. Two additional groups were included in the analysis: Random genes: two groups of 20 and 30 genes were randomly selected from the A. gambiae and D. melanogaster genomes, respectively, using the "Random Gene Selection" tool of RSA-Tools.
Random sequences (artificial): two groups of 20 and 30 random non-biological sequences were generated using the "Random DNA sequence" tool of Sequence Manipulation Suite, version 2. With this tool we generated random sequences with equal proportions of each nucleotide (~0.25).
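The D. melanogaster group definitions above can be illustrated with a minimal sketch. The function name, the `is_defense_gene` flag (standing in for the GO:0006952 annotation), the use of the sample standard deviation and the order in which the criteria are checked are all assumptions made for illustration; they are not taken from the original analysis.

```python
from statistics import mean, stdev

# log2(1/1.5): a repression of at least 1.5-fold relative to control
DOWN_THRESHOLD = -0.5855


def classify_profile(log2_ratios, is_defense_gene):
    """Classify one gene from its log2(f2/f1) values at the 6 time points.

    is_defense_gene is a hypothetical flag for the GO:0006952 annotation.
    """
    if is_defense_gene and all(v > 0 for v in log2_ratios):
        return "induced"
    if all(v <= DOWN_THRESHOLD for v in log2_ratios):
        return "down-regulated"
    # Sample standard deviation is an assumption; the text does not specify it.
    if abs(mean(log2_ratios)) <= 0.1 and stdev(log2_ratios) <= 0.07:
        return "non-modified"
    return "unclassified"
```

The A. gambiae definitions would need an extra layer, since they are expressed as fractions of challenges and of time points per challenge rather than as a single six-point series.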
Using these definitions, for A. gambiae, we first selected the expression profiles and the associated gene was then identified. To verify the annotated transcription initiation site, each gene prediction was manually curated by two approaches. The first was to align the corresponding EST clusters obtained from AnoEST and UNIGENE to the A. gambiae genome (AgamP3, Ensembl release 45, Jun 2007) using BLAST in the Ensembl genome browser. The second was based on manual verification of the presence of a TATA-box, Initiator sequence (Inr) or downstream promoter element (DPE). For Drosophila, the gene ID was included in the microarray database.
Once the gene associated with each profile was identified, the 5' regulatory regions were recovered for D. melanogaster (BDGP4.3) and A. gambiae (AgamP3, Ensembl release 45, Jun 2007) genes using Ensembl's data mining tool Biomart.
An additional set of 5' upstream 2000 bp sequences from the 12 Drosophila species derived from the 12 Drosophila genome project retrieved from , release R1.1 for D. virilis (23 sequences), R1.2 for D. ananassae (23 sequences), D. erecta (27 sequences), D. grimshawi (21 sequences), D. mojavensis (21 sequences), D. persimilis (20 sequences), D. sechellia (28 sequences), D. simulans (29 sequences), D. yakuba (27 sequences) and D. willistoni (24 sequences), R2.2 for D. pseudoobscura (27 sequences) and R5.7 for D. melanogaster (35 sequences) and 15 Aedes aegypti 2500 bp sequences retrieved from Biomart (AAEGL1) were also included for motif over-representation analysis. Selection of the Drosophila sequences was based on orthology to the D. melanogaster immunity gene data set described in Table 16, according to FlyBase annotations, but not on expression data. For Ae. aegypti, genes were also selected based on one-to-one orthology to immunity genes in A. gambiae according to Table 17.
Statistically overrepresented DNA motifs
To identify statistically overrepresented DNA motifs in 5' DNA regulatory regions of selected genes, we used the Oligo-Analysis program, searching for DNA motifs of 2 to 8 nucleotides in length in 5' upstream regions of 2500 or 2000 nucleotides. For the analysis, we created our own expected frequency tables for each motif length, using 5' upstream regions of 2500 nucleotides corresponding to 13,172 genes of D. melanogaster, 13,166 genes of A. gambiae and 16,691 genes of Ae. aegypti. A similar approach was used for the 11 additional Drosophila species using pre-computed 2000 bp upstream 5' sequences. The expected frequency tables were used to estimate the expected number of occurrences for each oligonucleotide in the induced, down-regulated, non-modified, random biological and random non-biological sets of sequences. The analyzed sequences were aligned to detect and avoid duplication between sequences, and duplicated regions larger than 40 nucleotides inside a sequence were removed. Also, to prevent a bias due to self-overlapping, a non-overlapping mode was adopted. The detection of overrepresented oligonucleotides was based on an estimation of the significance of the observed occurrences (Oocc). For each oligonucleotide, the p value (Pocc) was calculated on the basis of the binomial distribution. Because the analysis comprises multiple tests (256 in the case of tetranucleotides), it is possible that even low p values appear by chance. To correct for such a multitesting effect, the p values were multiplied by the number of oligonucleotides. This correction results in an expected value (Eocc). The significance index [sigocc = -log(Eocc)] reflects the degree of overrepresentation of each oligonucleotide on a logarithmic scale.
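The chain from observed occurrences to the significance index can be sketched as follows. This is an illustrative reimplementation of the quantities named above (Pocc, Eocc, sigocc), not the Oligo-Analysis code itself. The function names are hypothetical, the logarithm is assumed to be base 10, and the number of scanned positions is assumed to be supplied by the caller.

```python
import math


def count_nonoverlapping(sequence, oligo):
    """Count occurrences of oligo, skipping matches that overlap a previous one."""
    return sequence.upper().count(oligo.upper())


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def significance(observed, n_positions, expected_freq, n_tests):
    """Return (Pocc, Eocc, sigocc) for one oligonucleotide.

    n_tests is the number of oligonucleotides tested (256 for tetranucleotides).
    Base-10 logarithm is an assumption; the text only writes -log(Eocc).
    """
    p_occ = binomial_upper_tail(observed, n_positions, expected_freq)
    e_occ = p_occ * n_tests
    sig_occ = -math.log10(e_occ) if e_occ > 0 else math.inf
    return p_occ, e_occ, sig_occ
```

Under the base-10 assumption, a positive sigocc corresponds to an Eocc below 1, that is, fewer than one oligonucleotide expected to reach that p value by chance across all tests.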
The motif overrepresentation identified with Oligo-Analysis was verified using POBO, which uses bootstrapping to verify the statistical overrepresentation of a given motif.
Nucleosome positioning prediction
To predict regions of nucleosomal occupancy in the sequences of the distinct groups of A. gambiae and D. melanogaster 5'-US, we used two programs: RECON, which assigns a nucleosomal potential value to each position in a sequence using sliding windows of 160 bp based on the statistical distribution of dinucleotide frequencies, and Nucleosome Position Prediction, which predicts nucleosomal positioning using probabilistic and thermodynamic models, assigning a p value of nucleosomal occupancy (pNO) to each position of a sequence. For this last program, we analyzed 5'-US using yeast, chicken and human models, and both published and working versions of the program. The length of the 5'-US analyzed was 2500 bp for A. gambiae, D. melanogaster and Ae. aegypti and 2000 bp for non-melanogaster Drosophila species. For RECON, we used 2660 bp comprising the 2500 bp promoter flanked by 80 bp on each side, in order to recover nucleosomal potential values for all promoter positions.
Analysis of motifs position regarding predicted nucleosomal regions
To associate the results obtained with the program Oligo-Analysis with those obtained with RECON and Nucleosome Position Prediction, Perl scripts were written to automatically associate motif coordinates (obtained with DNA-Pattern) with tables containing "Nucleosomal potential" values (obtained with RECON) or "p of nucleosomal occupancy (pNO)" values (obtained with Nucleosome Position Prediction).
To determine whether there was a correlation between the findings of Oligo-Analysis and RECON, Perl scripts were used to assign the corresponding value of nucleosomal potential obtained with RECON to each position of the ATAA motif, giving four values per motif. On the basis of these four values, each ATAA motif was classified as: 1) positive, if the four positions of ATAA were positive; 2) negative, if the four positions were negative; 3) mixed, if at least one position was of opposite sign to the others; and 4) undefined, if at least one position was an N. Once classified, the distribution of ATAA motifs among these four categories was statistically evaluated.
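The four-way classification can be sketched in a few lines. This is written in Python for illustration and is not the Perl code used in the study. It assumes that an "N" position is one for which no nucleosomal potential value is available (represented here as `None`), that motif occurrences are located without overlaps, and that a value of exactly zero falls into the mixed class; all names are hypothetical.

```python
import re
from collections import Counter


def classify_by_potential(values):
    """Classify one motif from the per-position nucleosomal potential values."""
    if any(v is None for v in values):
        return "undefined"
    if all(v > 0 for v in values):
        return "positive"
    if all(v < 0 for v in values):
        return "negative"
    return "mixed"


def tally_motif_classes(sequence, potentials, motif="ATAA"):
    """Count motif occurrences per class; potentials[i] belongs to sequence[i]."""
    classes = Counter()
    for match in re.finditer(re.escape(motif), sequence.upper()):
        values = potentials[match.start():match.end()]
        classes[classify_by_potential(values)] += 1
    return classes
```

The resulting counts per class are the quantities whose distribution would then be compared between gene groups.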
Similarly, to determine whether there was a correlation between the findings of Oligo-Analysis and Nucleosome Position Prediction, Perl scripts were used to assign the corresponding pNO value obtained with the algorithm "Nucleosome Position Prediction" to each position of the ATAA motif, again giving four values per motif, one for each position. Based on these four values, each ATAA motif was classified as: 1) high p value motif, if the four positions had occupancy p values higher than 0.8; 2) medium p value motif, if the four positions had occupancy p values between 0.8 and 0.5; 3) low p value motif, if the four positions had occupancy p values below 0.5; and 4) undefined, if the four values did not all fall within the same range. Additionally, given that this software uses yeast, chicken and human models, and has a working and a published version, data generated with each model and each version were analyzed. This program does not accept sequences with Ns; therefore, in some cases the number of sequences analyzed per group of 5' upstream sequences was slightly smaller. For A. gambiae: 16 immunity, 12 down-regulated, 17 random genes and 17 non-modified. For D. melanogaster, only the group of non-modified genes changed, from 32 to 31 sequences. For the other Drosophila species and Ae. aegypti, the number of analysed sequences was the same for both programs.
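The pNO-based classes follow the same pattern. The sketch below is illustrative only; the function name is hypothetical, and treating the boundaries 0.5 and 0.8 as belonging to the medium class is an assumption, since the text does not state how boundary values were handled.

```python
def classify_by_pno(values, high=0.8, low=0.5):
    """Classify one motif from its per-position occupancy p values (pNO)."""
    if all(v > high for v in values):
        return "high"
    # Inclusive boundaries for the medium class are an assumption.
    if all(low <= v <= high for v in values):
        return "medium"
    if all(v < low for v in values):
        return "low"
    return "undefined"
```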
Analysis of 5'-US with alignment matrices
Using alignment matrices constructed on NFκB REs identified in immunity 5'-US of A. gambiae and D. melanogaster by MEME (Figure 12), a matrix-based search was carried out in the different groups of genes using the PATSER algorithm, searching for NFκB REs. A lower threshold estimation of 5.0 was assigned.
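A matrix-based search of this kind can be illustrated with a generic log-odds scan. The sketch below is not PATSER and does not claim to reproduce its scoring or options; the pseudocount scheme, the uniform background of 0.25, the forward-strand-only scan and the toy count matrix in the usage note are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(count_columns, background=0.25, pseudocount=1.0):
    """Turn per-column base counts into natural-log odds weights."""
    matrix = []
    for column in count_columns:
        total = sum(column.get(base, 0) for base in BASES)
        weights = {}
        for base in BASES:
            freq = (column.get(base, 0) + pseudocount * background) / (total + pseudocount)
            weights[base] = math.log(freq / background)
        matrix.append(weights)
    return matrix


def scan(sequence, matrix, threshold=5.0):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For example, with a hypothetical four-column matrix built from `[{"G": 10}, {"G": 10}, {"A": 10}, {"A": 10}]`, scanning `"TTGGAATT"` reports a single window starting at index 2, because any window with one mismatch falls well below the 5.0 threshold.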
In order to compare the distribution of AT-rich motifs associated with positive nucleosomal potential values among groups of genes, we fitted ordinary least squares regression models with robust standard errors, with the number of positive AT-rich motifs as the dependent variable and dummy variables for the corresponding group of genes as predictors, for all dipteran species. We fitted similar models to compare, among groups of genes, the distribution of AT-rich motifs associated with pNO > 0.8 and the AT% difference between groups of genes. Additionally, we compared the number of AT-rich motifs within all types of nucleotide sequences, either associated with positive vs. negative values of nucleosomal potential or with pNO > 0.8 vs. pNO < 0.5, by use of paired Student's t tests.
To evaluate the distribution of NFκB REs throughout the promoter regions, comparisons were done using Poisson regression with the count of NFκB REs as the response variable and the group of genes as the independent variable. Model adequacy was checked with the χ2 goodness-of-fit test; when the model was not supported, a Kruskal-Wallis non-parametric test was performed instead. Ordinary least squares regression analysis was performed to compare the counts of ATAA in the vicinity (± 200 bp) of NFκB motifs by type of gene group (immunity, non-modified and down-regulated), in both A. gambiae and D. melanogaster.
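The counts of ATAA in the vicinity of NFκB motifs, which serve as the response in the last regression, could be obtained along the following lines. This is a minimal sketch: the function name is hypothetical, and measuring distance between the start coordinates of the motif and of the NFκB site is an assumption, since the text does not state which coordinates were compared.

```python
import re


def count_motif_near_sites(sequence, site_positions, motif="ATAA", window=200):
    """For each NFκB site start, count motif starts within +/- window bp."""
    motif_starts = [m.start() for m in re.finditer(re.escape(motif), sequence.upper())]
    return [
        sum(1 for start in motif_starts if abs(start - site) <= window)
        for site in site_positions
    ]
```

Each returned count corresponds to one NFκB site; grouping these counts by gene class (immunity, non-modified, down-regulated) gives the data to which a regression model would then be fitted.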
Carroll SB: Genetics and the making of Homo sapiens. Nature. 2003, 422: 849-857. 10.1038/nature01495.
Levine M, Tjian R: Transcription regulation and animal diversity. Nature. 2003, 424: 147-151. 10.1038/nature01763.
Davidson EH: The regulatory genome. 2006, Canada: Academic Press
Khorasanizadeh S: The nucleosome: from genomic organization to genomic regulation. Cell. 2004, 116: 259-272. 10.1016/S0092-8674(04)00044-3.
Kornberg RD: Chromatin structure: a repeating unit of histones and DNA. Science. 1974, 184: 868-871. 10.1126/science.184.4139.868.
Wu C: Chromatin remodelling and the control of gene expression. J Biol Chem. 1997, 272: 28171-28174. 10.1074/jbc.272.45.28171.
Narlikar GJ, Fan HY, Kingston RE: Cooperation between complexes that regulate chromatin structure and transcription. Cell. 2002, 108: 475-487. 10.1016/S0092-8674(02)00654-2.
Smale ST, Fisher AG: Chromatin structure and gene regulation in the immune system. Ann Rev Immunol. 2002, 20: 427-462. 10.1146/annurev.immunol.20.100301.064739.
Schulze SR, Wallrath LL: Gene regulation by chromatin structure: paradigms established in Drosophila melanogaster. Annu Rev Entomol. 2007, 52: 171-192. 10.1146/annurev.ento.51.110104.151007.
Schmid A, Fascher KD, Hörz W: Nucleosome disruption at the yeast PHO5 promoter upon PHO5 induction occurs in the absence of DNA replication. Cell. 1992, 71: 853-864. 10.1016/0092-8674(92)90560-Y.
Lewin B: Chromatin and gene expression: constant questions, but changing answers. Cell. 1994, 79: 397-406. 10.1016/0092-8674(94)90249-6.
Beato M, Eisfeld K: Transcription factor access to chromatin. Nucleic Acids Res. 1997, 25: 3559-3563. 10.1093/nar/25.18.3559.
Jackson JR, Benyajati C: DNA-histone interactions are sufficient to position a single nucleosome juxtaposing Drosophila Adh adult enhancer and distal promoter. Nucleic Acids Res. 1993, 21: 957-967. 10.1093/nar/21.4.957.
Trifonov EN, Sussman JL: The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc Natl Acad Sci. 1980, 77: 3816-3820. 10.1073/pnas.77.7.3816.
Satchwell SC, Drew HR, Travers AA: Sequence periodicities in chicken nucleosome core DNA. J Mol Biol. 1986, 191: 659-675. 10.1016/0022-2836(86)90452-3.
Shrader TE, Crothers DM: Artificial nucleosome positioning sequences. Proc Natl Acad Sci. 1989, 86: 7418-7422. 10.1073/pnas.86.19.7418.
Ioshikhes I, Bolshoy A, Derenshteyn K, Borodovsky M, Trifonov EN: Nucleosome DNA sequence pattern revealed by multiple alignment of experimentally mapped sequences. J Mol Biol. 1996, 262: 129-139. 10.1006/jmbi.1996.0503.
Widlund HR, Cao H, Simonsson S, Magnusson E, Simonsson T, Nielsen PE, Kahn JD, Crothers DM, Kubista DM: Identification and characterization of genomic nucleosome-positioning sequences. J Mol Biol. 1997, 267: 807-817. 10.1006/jmbi.1997.0916.
Stein A, Bina M: A signal encoded in vertebrate DNA that influences nucleosome positioning and alignment. Nucleic Acids Res. 1999, 27: 848-853. 10.1093/nar/27.3.848.
Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JZ, Widom J: A genomic code for nucleosome positioning. Nature. 2006, 442: 772-778. 10.1038/nature04979.
Johnson SM, Tan FJ, McCullough HL, Riordan DP, Fire AZ: Flexibility and constraint in the nucleosome core landscape of Caenorhabditis elegans chromatin. Genome Res. 2006, 16: 1505-1516. 10.1101/gr.5560806.
Levitsky VG: RECON: a program for prediction of nucleosome formation potential. Nucleic Acids Res. 2004, 32: W346-W349. 10.1093/nar/gkh482.
Kimbrell DA, Beutler B: The evolution and genetics of innate immunity. Nat Rev Genet. 2001, 2: 256-267. 10.1038/35066006.
Hernández-Romano J, Martínez-Barnetche J, Rodríguez-López MH: Transcriptional regulation of immune related genes in insects: Insights into the genomics of Anopheles immune response. Genes, Genomes and Genomics. Edited by: Thangadurai D, Tang W, Pullaiah T. 2006, New Delhi: Regency Publications, 1: 98-130.
Ferrandon D, Imler JL, Hetru C, Hoffmann JA: The Drosophila systemic immune response: sensing and signalling during bacterial and fungal infections. Nat Rev Immunol. 2007, 7: 862-874. 10.1038/nri2194.
Dimopoulos G, Christophides GK, Meister S, Schultz J, White KP, Barillas-Mury C, Kafatos FC: Genome expression analysis of Anopheles gambiae : responses to injury, bacterial challenge, and malaria infection. Proc Natl Acad Sci. 2002, 99: 8814-8819. 10.1073/pnas.092274999.
De Gregorio E, Spellman PT, Rubin GM, Lemaitre B: Genome-wide analysis of the Drosophila immune response by using oligonucleotide microarrays. Proc Natl Acad Sci. 2001, 98: 12590-12595. 10.1073/pnas.221458698.
Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Graf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Hubbard TJ: Ensembl 2006. Nucleic Acids Res. 2006, 34: D556-561. 10.1093/nar/gkj133.
van Helden J, André B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998, 281: 827-842. 10.1006/jmbi.1998.1947.
Hulzink RJ, Weerdesteyn H, Croes AF, Gerats T, van Herpen MM, van Helden J: In silico identification of putative regulatory sequence elements in the 5'-untranslated region of genes that are expressed during male gametogenesis. Plant Physiol. 2003, 132: 75-83. 10.1104/pp.102.014894.
Kankainen M, Holm L: POBO, transcription binding site verification with bootstrapping. Nucleic Acids Res. 2004, 32: W222-W229. 10.1093/nar/gkh463.
Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S, Jeck WR, Johnson J, Jones CD, Jordan WC, Karpen GH, Kataoka E, Keightley PD, Kheradpour P, Kirkness EF, Koerich LB, Kristiansen K, Kudrna D, Kulathinal RJ, Kumar S, Kwok R, Lander E, Langley CH, Lapoint R, Lazzaro BP, Lee SJ, Levesque L, Li R, Lin CF, Lin MF, Lindblad-Toh K, Llopart A, Long M, Low L, Lozovsky E, Lu J, Luo M, Machado CA, Makalowski W, Marzo M, Matsuda M, Matzkin L, McAllister B, McBride CS, McKernan B, McKernan K, Mendez-Lago M, Minx P, Mollenhauer MU, Montooth K, Mount SM, Mu X, Myers E, Negre B, Newfeld S, Nielsen R, Noor MA, O'Grady P, Pachter L, Papaceit M, Parisi MJ, Parisi M, Parts L, Pedersen JS, Pesole G, Phillippy AM, Ponting CP, Pop M, Porcelli D, Powell JR, Prohaska S, Pruitt K, Puig M, Quesneville H, Ram KR, Rand D, Rasmussen MD, Reed LK, Reenan R, Reily A, Remington KA, Rieger TT, Ritchie MG, Robin C, 
Rogers YH, Rohde C, Rozas J, Rubenfield MJ, Ruiz A, Russo S, Salzberg SL, Sanchez-Gracia A, Saranga DJ, Sato H, Schaeffer SW, Schatz MC, Schlenke T, Schwartz R, Segarra C, Singh RS, Sirot L, Sirota M, Sisneros NB, Smith CD, Smith TF, Spieth J, Stage DE, Stark A, Stephan W, Strausberg RL, Strempel S, Sturgill D, Sutton G, Sutton GG, Tao W, Teichmann S, Tobari YN, Tomimura Y, Tsolas JM, Valente VL, Venter E, Venter JC, Vicario S, Vieira FG, Vilella AJ, Villasante A, Walenz B, Wang J, Wasserman M, Watts T, Wilson D, Wilson RK, Wing RA, Wolfner MF, Wong A, Wong GK, Wu CI, Wu G, Yamamoto D, Yang HP, Yang SP, Yorke JA, Yoshida K, Zdobnov E, Zhang P, Zhang Y, Zimin AV, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Blye J, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, 
Raghuraman S, Rege F, Reyes R, Rise C, Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, Grabherr M, Kleber M, Mauceli E, MacCallum I: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450: 203-218.
Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, Thornton K, Hubisz MJ, Chen R, Meisel RP, Couronne O, Hua S, Smith MA, Zhang P, Liu J, Bussemaker HJ, van Batenburg MF, Howells SL, Scherer SE, Sodergren E, Matthews BB, Crosby MA, Schroeder AJ, Ortiz-Barrientos D, Rives CM, Metzker ML, Muzny DM, Scott G, Steffen D, Wheeler DA, Worley KC, Havlak P, Durbin KJ, Egan A, Gill R, Hume J, Morgan MB, Miner G, Hamilton C, Huang Y, Waldron L, Verduzco D, Clerc-Blankenburg KP, Dubchak I, Noor MA, Anderson W, White KP, Clark AG, Schaeffer SW, Gelbart W, Weinstock GM, Gibbs RA: Comparative genome sequencing of Drosophila pseudoobscura : chromosomal, gene, and cis-element evolution. Genome Res. 2005, 15: 1-18. 10.1101/gr.3059305.
Levitsky VG, Podkolodnaya OA, Kolchanov NA, Podkolodny NL: Nucleosome formation potential of eukaryotic DNA: calculation and promoters analysis. Bioinformatics. 2001, 17: 998-1010. 10.1093/bioinformatics/17.11.998.
Bailey TL, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proceedings of the third international conference on intelligent systems for molecular biology. 1995, AAAI Press, Menlo Park, CA, 21-29.
Zdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM, Mueller HM, Dimopoulos G, Law JH, Wells MA, Birney E, Charlab R, Halpern AL, Kokoza E, Kraft CL, Lai Z, Lewis S, Louis C, Barillas-Mury C, Nusskern D, Rubin GM, Salzberg SL, Sutton GG, Topalis P, Wides R, Wincker P, Yandell M, Collins FH, Ribeiro J, Gelbart WM, Kafatos FC, Bork P: Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002, 298: 149-159. 10.1126/science.1077061.
Widom J: Role of DNA sequence in nucleosome stability and dynamics. Q Rev Biophys. 2001, 34: 269-324.
Anderson JD, Widom J: Poly(dA-dT) promoter elements increase the equilibrium accessibility of nucleosomal DNA target sites. Mol Cell Biol. 2001, 21: 3830-3839. 10.1128/MCB.21.11.3830-3839.2001.
Sekinger EA, Moqtaderi Z, Struhl K: Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast. Mol Cell. 2005, 18: 735-748. 10.1016/j.molcel.2005.05.003.
Lee W, Tillo D, Bray N, Morse RH, Davis RW, Hughes TR, Nislow C: A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet. 2007, 39: 1235-1244. 10.1038/ng2117.
Snel B, van Noort V, Huynen MA: Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucleic Acids Res. 2004, 32: 4725-4731. 10.1093/nar/gkh815.
De Gregorio E, Spellman PT, Tzou P, Rubin GM, Lemaitre B: The Toll and Imd pathways are the major regulators of the immune system response in Drosophila. EMBO J. 2002, 21: 2568-2579. 10.1093/emboj/21.11.2568.
Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, Barillas-Mury C, Bian G, Blandin S, Christensen BM, Dong Y, Jiang H, Kanost MR, Koutsos AC, Levashina EA, Li J, Ligoxygakis P, Maccallum RM, Mayhew GF, Mendes A, Michel K, Osta MA, Paskewitz S, Shin SW, Vlachou D, Wang L, Wei W, Zheng L, Zou Z, Severson DW, Raikhel AS, Kafatos FC, Dimopoulos G, Zdobnov EM, Christophides GK: Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science. 2007, 316: 1738-1743. 10.1126/science.1139862.
Hoffmann JA, Kafatos FC, Janeway CA, Ezekowitz RAB: Phylogenetic perspectives in innate immunity. Science. 1999, 284: 1313-1318. 10.1126/science.284.5418.1313.
Hoffmann JA, Reichhart JM: Drosophila innate immunity: an evolutionary perspective. Nat Immunol. 2002, 3: 121-126. 10.1038/ni0202-121.
Zheng L, Zhang L, Lin H, McIntosh MT, Malacrida AR: Toll-like receptors in invertebrate innate immunity. ISJ. 2005, 2: 105-113.
Dziarski R, Gupta D: Mammalian PGRPs: novel antibacterial proteins. Cell Microbiol. 2006, 8: 1059-1069. 10.1111/j.1462-5822.2006.00726.x.
Sun SC, Lindstrom I, Lee JY, Faye I: Structure and expression of the attacin genes in Hyalophora cecropia. Eur J Biochem. 1991, 196: 247-254. 10.1111/j.1432-1033.1991.tb15811.x.
Kadalayil L, Petersen U, Engström Y: Adjacent GATA and κB-like motifs regulate the expression of a Drosophila immune gene. Nucleic Acids Res. 1997, 25: 1233-1239. 10.1093/nar/25.6.1233.
Eggleston P, Lu W, Zhao Y: Genomic organization and immune regulation of the defensin gene from the mosquito, Anopheles gambiae. Insect Mol Biol. 2000, 9: 481-490. 10.1046/j.1365-2583.2000.00212.x.
Zheng XL, Zheng AL: Genomic organization and regulation of three cecropin genes in Anopheles gambiae. Insect Mol Biol. 2002, 11: 517-525. 10.1046/j.1365-2583.2002.00360.x.
Senger K, Armstrong GW, Rowell WJ, Kwan JM, Markstein M, Levine M: Immunity regulatory DNAs share common organizational features in Drosophila. Mol Cell. 2004, 13: 19-32. 10.1016/S1097-2765(03)00500-8.
Angelov D, Lenouvel F, Hans F, Müller CW, Bouvet P, Bednar J, Moudrianakis EN, Cadet J, Dimitrov S: The histone octamer is invisible when NF-kappaB binds to the nucleosome. J Biol Chem. 2004, 279: 42374-42382. 10.1074/jbc.M407235200.
Hoberg JE, Yeung F, Mayo MW: SMRT derepression by the IkappaB kinase alpha: a prerequisite to NF-kappaB transcription and survival. Mol Cell. 2004, 16: 245-255. 10.1016/j.molcel.2004.10.010.
Murphy KM, Reiner SL: The lineage decisions of helper T cells. Nat Rev Immunol. 2002, 2: 933-944. 10.1038/nri954.
Weinmann AS, Plevy SE, Smale ST: Rapid and selective remodelling of a positioned nucleosome during the induction of IL-12 p40 transcription. Immunity. 1999, 11: 665-675. 10.1016/S1074-7613(00)80141-7.
Agalioti T, Lomvardas S, Parekh B, Yie J, Maniatis T, Thanos D: Ordered recruitment of chromatin modifying and general transcription factors to the IFN-β promoter. Cell. 2000, 103: 667-678. 10.1016/S0092-8674(00)00169-0.
Attema JL, Reeves R, Murray V, Levichkin I, Temple MD, Tremethick DJ, Shannon MF: The human IL-2 promoter can assemble a positioned nucleosome that becomes remodeled upon T cell activation. J Immunol. 2002, 169: 2466-2476.
Holloway AF, Rao S, Chen X, Shannon MF: Changes in chromatin accessibility across the GM-CSF promoter upon T cell activation are dependent on nuclear factor kappaB proteins. J Exp Med. 2003, 197: 413-23. 10.1084/jem.20021039.
Strahl BD, Allis CD: The language of covalent histone modifications. Nature. 2000, 403: 41-45. 10.1038/47412.
Saha A, Wittmeyer J, Cairns BR: Chromatin remodelling: the industrial revolution of DNA around histones. Nature Rev Mol Cell Biol. 2006, 7: 437-447. 10.1038/nrm1945.
Genome-wide Gene Expression Patterns of Drosophila in Response to Immune Challenge. [http://www.fruitfly.org/expression/immunity/data.shtml]
Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003, 34: 374-378.
Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
Regulatory Sequence Analysis Tools. [http://rsat.scmbb.ulb.ac.be/rsat/]
Sequence Manipulation Suite. [http://bioinformatics.org/sms2/random_dna.html]
Kriventseva EV, Koutsos AC, Blass C, Kafatos FC, Christophides GK, Zdobnov EM: AnoEST: toward A. gambiae functional genomics. Genome Res. 2005, 15: 893-899. 10.1101/gr.3756405.
Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L: Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003, 31: 28-33. 10.1093/nar/gkg033.
Burke TW, Kadonaga JT: The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev. 1997, 11: 3020-3031. 10.1101/gad.11.22.3020.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005, 21: 3439-3440. 10.1093/bioinformatics/bti525.
Flybase: a database of Drosophila genes and genomes. [http://www.flybase.org]
Nene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, Loftus B, Xi Z, Megy K, Grabherr M, Ren Q, Zdobnov EM, Lobo NF, Campbell KS, Brown SE, Bonaldo MF, Zhu J, Sinkins SP, Hogenkamp DG, Amedeo P, Arensburger P, Atkinson PW, Bidwell S, Biedler J, Birney E, Bruggner RV, Costas J, Coy MR, Crabtree J, Crawford M, Debruyn B, Decaprio D, Eiglmeier K, Eisenstadt E, El-Dorry H, Gelbart WM, Gomes SL, Hammond M, Hannick LI, Hogan JR, Holmes MH, Jaffe D, Johnston JS, Kennedy RC, Koo H, Kravitz S, Kriventseva EV, Kulp D, Labutti K, Lee E, Li S, Lovin DD, Mao C, Mauceli E, Menck CF, Miller JR, Montgomery P, Mori A, Nascimento AL, Naveira HF, Nusbaum C, O'leary S, Orvis J, Pertea M, Quesneville H, Reidenbach KR, Rogers YH, Roth CW, Schneider JR, Schatz M, Shumway M, Stanke M, Stinson EO, Tubio JM, Vanzee JP, Verjovski-Almeida S, Werner D, White O, Wyder S, Zeng Q, Zhao Q, Zhao Y, Hill CA, Raikhel AS, Soares MB, Knudson DL, Lee NH, Galagan J, Salzberg SL, Paulsen IT, Dimopoulos G, Collins FH, Birren B, Fraser-Liggett CM, Severson DW: Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 2007, 316: 1718-1723. 10.1126/science.1138878.
Hertz GZ, Stormo GD: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999, 15: 563-577. 10.1093/bioinformatics/15.7.563.
This work was supported by Conacyt PhD scholarship 159376 to JHR and SEP-Conacyt P44797 Young Researcher award to JMB. We thank Maritza Solano for technical assistance, and Mara Tellez Rojo, Julio Collado Vides, Mario Zurita, Jacques van Helden and the two reviewers for helpful discussions and suggestions.
JH–R participated in the design, data acquisition, data analysis and interpretation, and wrote the manuscript. HS and FJC–R wrote Perl scripts for data acquisition. HL–F performed the statistical analysis. VV–G participated in data interpretation and revised the manuscript. MHR participated in conception, data analysis and interpretation, and revised the manuscript. JM–B participated in conception and design, data acquisition, analysis and interpretation, and drafted the manuscript.
Cite this article
Hernandez-Romano, J., Carlos-Rivera, F.J., Salgado, H. et al. Immunity related genes in dipterans share common enrichment of AT-rich motifs in their 5' regulatory regions that are potentially involved in nucleosome formation. BMC Genomics 9, 326 (2008). https://doi.org/10.1186/1471-2164-9-326