In this study, we used a random sampling procedure as a general method for obtaining reliable control datasets in the analysis of high-throughput genomic assays. We found that datasets of more than 100 individual values can be used without decreasing the robustness of the statistical analysis, and that independently generated random subsets of data have statistically indistinguishable global properties. Thus, subsampling provides a convenient way to display and compare noise and signal in experimental and control datasets of the same size.
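As an illustration of the subsampling idea, the following minimal sketch draws several independent control subsets of the same size as an experimental group and reports one simple global property (the median) of each subset. The function names, the seeded generator and the use of the median are illustrative assumptions, not the exact procedure used in this study.

```python
import random
import statistics


def draw_control_subsets(population, size, n_subsets, seed=None):
    """Draw `n_subsets` independent random subsets of `size` values each.

    Each subset is sampled without replacement from `population`, so every
    control set matches the size of the experimental group it is compared to.
    """
    if size > len(population):
        raise ValueError("subset size exceeds population size")
    rng = random.Random(seed)  # seeded generator for reproducible controls
    return [rng.sample(population, size) for _ in range(n_subsets)]


def subset_medians(subsets):
    """Median of each subset, used here as one simple 'global property'."""
    return [statistics.median(s) for s in subsets]


if __name__ == "__main__":
    # Hypothetical genome-wide values, e.g. expression levels of all genes.
    gen = random.Random(0)
    genome = [gen.lognormvariate(0.0, 1.0) for _ in range(20000)]
    controls = draw_control_subsets(genome, size=150, n_subsets=5, seed=1)
    print([round(m, 2) for m in subset_medians(controls)])
```

With subsets of more than about 100 values, the medians of independently drawn controls would be expected to lie close to one another, which is what makes any one of them a usable reference for an experimental group of the same size.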
First, we showed that NFI preferentially binds predicted sites located upstream of transcription start sites (TSS) (Figure 1). Several interpretations may account for the preferential association of NFI with binding sites near the TSS rather than with other genomic locations. NFI is known to occupy the promoters of many genes, where it may bind synergistically with other transcription factors such as hepatocyte nuclear factor 1 alpha, the estrogen receptor, and Brg-associated factor [31–33]. Thus, the preferred occupancy of TSS-proximal sites may at least partly reflect the synergistic association of NFI with other factors.
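Classifying a site as "upstream" of a TSS requires taking the gene's strand into account. The sketch below is a hypothetical illustration, assuming all sites and TSS lie on a single chromosome with integer coordinates; it computes a strand-aware signed distance to the nearest TSS and the fraction of sites lying within a chosen upstream window (5 kb by default, an assumed value). Comparing this fraction between bound and unbound predicted sites would be one way to quantify the preference described above.

```python
from typing import List, Sequence, Tuple


def signed_distance(site: int, tss: int, strand: str) -> int:
    """Distance from a TSS to a site, oriented along transcription.

    Negative values mean the site lies upstream of the TSS;
    positive values mean it lies downstream (inside the gene).
    """
    if strand == "+":
        return site - tss
    if strand == "-":
        return tss - site  # on the minus strand, upstream is at higher coordinates
    raise ValueError(f"unknown strand: {strand!r}")


def nearest_tss_distance(site: int, tss_list: List[Tuple[int, str]]) -> int:
    """Signed distance to the TSS that is closest in absolute terms."""
    return min((signed_distance(site, t, s) for t, s in tss_list), key=abs)


def fraction_upstream(sites: Sequence[int], tss_list: List[Tuple[int, str]],
                      window: int = 5000) -> float:
    """Fraction of sites lying upstream of their nearest TSS, within `window` bp."""
    if not sites:
        return 0.0
    hits = sum(1 for site in sites
               if -window <= nearest_tss_distance(site, tss_list) < 0)
    return hits / len(sites)
```

For genomes with many chromosomes, the same logic would simply be applied per chromosome; sorting TSS and using binary search would be the natural optimization for genome-scale inputs.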
We also found that NFI occupies the promoters or upstream regions of a group of genes that are significantly more highly expressed than representative, randomly selected control groups. Since correlation does not necessarily imply causation, this observation alone does not establish that NFI-family members actually activate the expression of these genes. For instance, NFI might bind highly expressed genes to partially suppress their expression while still leaving relatively high transcription levels. However, taken together with previous observations that NFI activates the expression of many genes in higher eukaryotes [20, 21, 32, 34–37], we conclude that the observed correlation more likely originates from direct up-regulation of gene expression by NFI, at least for a significant proportion of its target genes.
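One way to express "significantly more expressed than randomly selected control groups" is an empirical p-value: the fraction of same-size random control groups whose median expression reaches that of the target group. The following is a minimal sketch under that assumption; the function name, the choice of the median as the test statistic and the add-one correction are illustrative choices, not necessarily the test used here.

```python
import random
import statistics


def empirical_p_value(target, background, n_controls=1000, seed=None):
    """One-sided empirical p-value that `target` has a higher median
    than random control groups of the same size drawn from `background`."""
    rng = random.Random(seed)
    observed = statistics.median(target)
    k = len(target)
    exceed = 0
    for _ in range(n_controls):
        control = rng.sample(background, k)  # same size as the target group
        if statistics.median(control) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_controls + 1)
```

The smallest attainable p-value is 1 / (n_controls + 1), so the number of control draws bounds the significance that can be reported.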
The hypothesis that NFI family members may directly activate genes appears to hold for at least one member of the family (NFI-C), as mRNA profiling of wild-type and NFI-C knock-out cells revealed that NFI-C acts more potently as a gene activator than as a repressor. The 1000 genes most up-regulated by NFI-C showed significantly larger changes in expression than the 1000 most down-regulated genes. In addition, up-regulated genes showed significantly higher expression levels than representative control gene samples selected from the total gene population, again implying that this factor is a potent activator of gene expression. Because the selected in vivo NFI binding sites lie up to 5 kb from their TSS, a relatively large distance, NFI might also act through remote regulatory mechanisms, for instance by establishing a chromatin domain boundary that prevents the propagation of a silencing chromatin structure towards the promoter [27, 38].
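The comparison between the most up-regulated and most down-regulated genes can be sketched as follows. This assumes a hypothetical mapping from gene to log2(WT / KO) expression ratio, so that positive values denote genes up-regulated by NFI-C, and uses a one-sided permutation test on the mean absolute change; the study's actual statistical test is not specified here, so the permutation test is a swapped-in, illustrative choice.

```python
import random
import statistics


def top_changes(log2fc, n):
    """Return |log2 fold change| of the n most up- and n most down-regulated genes.

    `log2fc` maps gene -> log2(WT / KO); positive values mean the gene is
    more highly expressed when NFI-C is present (up-regulated by NFI-C).
    """
    ranked = sorted(log2fc.values())
    down = [abs(v) for v in ranked[:n] if v < 0]
    up = [v for v in ranked[::-1][:n] if v > 0]
    return up, down


def permutation_test(a, b, n_perm=2000, seed=None):
    """One-sided permutation p-value for mean(a) > mean(b)."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if diff >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

A call such as `permutation_test(*top_changes(log2fc, 1000))` would then ask whether up-regulated genes change more, on average, than down-regulated ones.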
Histone H3 methylation marks such as H3K4me3 and H3K36me3 were found to be enriched around the TSS of NFI-occupied genes when compared with control gene groups. This finding is consistent with the model that NFI acts predominantly as a transcriptional activator, since H3K4me3 and H3K36me3, but not H3K27me3, have been proposed as markers of active gene transcription [4, 6, 39]. This suggests that NFI binding to upstream regions may contribute to the recruitment of the specific enzymes responsible for the H3K4me3 and H3K36me3 modifications. H3K27me3 was also enriched genome-wide around TSS located close to NFI-bound sites; however, this enrichment was indistinguishable from that of the control group of genes. This indicates that the correlation reflects a general enrichment of H3K27me3 around at least some TSS, and that NFI is not involved in recruiting the enzymes mediating this modification. Thus, the apparent enrichment of H3K27me3 over NFI-bound genes represents a false-positive genome-wide correlation. We also found the H3K9me3 modification to be slightly enriched in the group of NFI-bound genes. Although H3K9me3 has been associated with closed chromatin, this enrichment suggests that NFI may be involved in recruiting the enzymes that mediate this modification. Interestingly, this modification was recently associated with a chromatin domain boundary effect at telomeric regions in human cells. In that study, NFI was shown to prevent the propagation of a silencing chromatin structure from the telomere, and the expressed genes protected from telomeric silencing by NFI had elevated H3K9me3 marks at specific telomeric positions. Taken together, these studies suggest that H3K9me3 enrichment may be a hallmark of gene expression activation by NFI.
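The distinction between a genuine and a false-positive enrichment rests on comparing NFI-bound genes with control genes rather than with the genome average. A minimal sketch, assuming a hypothetical per-position signal track for one chromosome stored as a list and TSS given as (position, strand) pairs: it builds a strand-oriented average profile around each set of TSS and takes the per-position log2 ratio of target over control. The function names, the pseudocount and the window layout are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def metaprofile(signal: Sequence[float], tss_list: List[Tuple[int, str]],
                flank: int) -> List[float]:
    """Average signal in a +/- `flank` window around each TSS.

    Windows are oriented along transcription: index 0 is the most upstream
    position and index `flank` is the TSS. TSS whose window would extend
    beyond either end of `signal` are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for tss, strand in tss_list:
        start, end = tss - flank, tss + flank + 1
        if start < 0 or end > len(signal):
            continue
        window = list(signal[start:end])
        if strand == "-":
            window.reverse()  # orient minus-strand genes 5' to 3'
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no TSS window fits within the signal")
    return [t / used for t in totals]


def log2_ratio(target: List[float], control: List[float],
               pseudo: float = 1e-3) -> List[float]:
    """Per-position log2(target / control), with a pseudocount to avoid log(0)."""
    if len(target) != len(control):
        raise ValueError("profiles must have the same length")
    return [math.log2((t + pseudo) / (c + pseudo)) for t, c in zip(target, control)]
```

Under this framing, a mark that is enriched at TSS in general, as described for H3K27me3, would give ratios near zero, whereas a mark specifically associated with NFI-bound genes would give consistently positive ratios around the TSS.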