A gene-rich, transcriptionally active environment and the pre-deposition of repressive marks are predictive of susceptibility to KRAB/KAP1-mediated silencing

Background KRAB-ZFPs (Krüppel-associated box domain-zinc finger proteins) are vertebrate-restricted transcriptional repressors encoded in the hundreds by the mouse and human genomes. They act via an essential cofactor, KAP1, which recruits effectors responsible for the formation of facultative heterochromatin. We have recently shown that KRAB/KAP1 can mediate long-range transcriptional repression through heterochromatin spreading, but also demonstrated that this process is at times countered by endogenous influences. Method To investigate this issue further we used an ectopic KRAB-based repressor. This system allowed us to tether KRAB/KAP1 to hundreds of euchromatic sites within genes, and to record its impact on gene expression. We then correlated this KRAB/KAP1-mediated transcriptional effect to pre-existing genomic and chromatin structures to identify specific characteristics making a gene susceptible to repression. Results We found that genes that were susceptible to KRAB/KAP1-mediated silencing carried higher levels of repressive histone marks both at the promoter and over the transcribed region than genes that were insensitive. In parallel, we found a high enrichment in euchromatic marks within both the close and more distant environment of these genes. Conclusion Together, these data indicate that high levels of gene activity in the genomic environment and the pre-deposition of repressive histone marks within a gene increase its susceptibility to KRAB/KAP1-mediated repression.


Background
Gene expression is modulated through the alteration of chromatin states by epigenetic regulators. Krüppel-associated box zinc finger proteins (KRAB-ZFPs), which together constitute the single largest group of transcriptional repressors encoded by the human genome, partake in this process [1][2][3]. The KRAB-ZFP family is evolutionary recent and has expanded and diverged through multiple rounds of gene and segment duplications, to give rise to more than three hundred and fifty annotated members in humans [4][5][6][7]. Despite their abundance, KRAB-ZFPs and their transcriptional targets remain largely uncharacterized except for a few [8][9][10]. KRAB-ZFPs carry a C-terminal array of two to forty C2H2 zinc finger motifs, each potentially capable of recognizing a triplet of nucleotides in a sequence-specific manner [1], while their N-terminal KRAB domain 1 School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland Full list of author information is available at the end of the article recruits the KAP1 (KRAB associated protein 1) corepressor [11][12][13][14]. KAP1 (also named TIF1β, KRIP-1 or TRIM28) binds KRAB and homotrimerizes through its N-terminal RBCC (Ring finger/B box/Coiled-Coil) domain, while its C-terminus acts as a scaffold for various heterochromatin-inducing factors, such as heterochromatin protein 1 (HP1), the histone methyltransferase ESET (also known as SetDB1), the nucleosome-remodeling and histone deacetylation (NuRD) complex, the nuclear receptor corepressor complex 1 (N-CoR1) and, at least during early embryonic development, de novo DNA methyltransferases [15][16][17][18][19][20][21][22]. This results in local loss of histone acetylation, enrichment in histone 3 lysine 9 trimethylation (H3K9me3) and increased chromatin compaction [23,24].
Using chromatin immunoprecipitation (ChIP) and a tiling array, KAP1 has been documented to bind more than 7000 sites in a human testicular embryonal carcinoma cell line [25]. A more recent publication additionally revealed that KAP1 chromatin targeting falls into different categories, only a subset of which is dependent on its RBCC domain and consequently on its association with KRAB-ZFPs [26]. KAP1 is dynamically associated with both heterochromatin and euchromatin. It is thought to organize constitutive heterochromatin and to stimulate its propagation, as evidenced by its co-localization with HP1 in pericentromeric heterochromatin domains [16,27]. Using a combination of gene trapping and a drug-controllable KRAB-containing repressor, we recently demonstrated that KRAB/KAP1 can induce long-range repression through HP1-dependent heterochromatin spreading [28]. However, while some promoters located tens of kilobases (kb) from KAP1 docking sites were silenced by this mechanism, others were resistant. Here, we investigated the basis for this differential behavior by comparing the genomic context and the pre-existing levels of specific chromatin marks at repressed and non-repressed genes. This analysis revealed that genes most susceptible to KRAB/KAP1induced silencing were in genomic regions of high gene activity. More specifically, repression was most efficient at sites with increased levels of pre-existing repressive histone marks at promoters and gene bodies, embedded within gene-rich regions with high levels of transcription.

Characterization of thousands of KRAB/KAP1-targeted gene traps
To study the impact of specific genomic features on KRAB/KAP1-induced silencing, we used the recently described trapping/silencing (TrapSil) system [28]. Here, retrovirally-trapped cellular promoters are exposed to a drug-regulated KRAB-containing repressor. The tTRKRAB protein contains the KRAB domain of the human KOX1 ZFP fused to the E. coli tetracycline repressor (tTR), and binds to Tet operator sequences (TetO) in a doxycycline (Dox)-controlled manner [29,30] ( Figure 1A). We engineered retroviral-based gene trap vectors carrying tandem TetO repeats and a promoterless GFP-puromycin resistance fusion reporter. This design predicts that i) reporter expression occurs from the promoters of active genes targeted by the integrants ("trapping"), and ii) Dox withdrawal results in tTRKRAB binding to the TetO sites present in the provirus, thus exposing the trapped promoters to KRAB/KAP1mediated silencing ("silencing") ( Figure 1A). Using this experimental setup, we previously observed that while KRAB/KAP1 can act over long distances it is generally more effective when bound 20 kb or less from a promoter [28]. To study which other parameters might affect KRAB/KAP1-induced repression, we infected tTRKRABexpressing HeLa cells with low doses of retroviral-based TrapSil vectors to ensure that only one integrant per cell was present. We made use of a combination of murine leukemia viral (MLV)-and lentiviral (LV)-based TrapSil vectors to obtain a greater diversity of targeted genes, since MLV tends to integrate close to active transcriptional start sites (TSS), while LV hits genes further downstream in their transcribed region [31,32].
Since we were interested in elucidating differences between KRAB/KAP1 repressible and non-repressible promoters and genes, we reasoned that "all or none" phenotypes would facilitate subsequent analyses. Therefore, we selected cells in which trapped promoters were highly active at baseline, and either strongly repressed ("repressed clones" containing a "repressing integrant") or almost completely resistant to this process ("non-repressed clones" containing a "non-repressing integrant") when the trans-repressor was allowed to bind its target ( Figure 1B). More specifically, we isolated trapped integrants from a population of cells by puromycin selection in the presence of Dox, which impairs tTRKRAB binding and silencing. Then trapped integrants were subjected to subsequent rounds of cell sorting to isolate cells harboring gene traps with repressible promoters and reporter genes. These rounds first included the isolation of GFP negative cells when tTRKRAB was allowed to bind (Dox-), followed by the sorting out of GFP positive cells when its recruitment was inhibited (Dox+) ( Figure 1B). Isolation of non-repressible genes was achieved by a similar approach. However, trapped cell populations were cultured in the presence of tTRKRAB binding (Dox-) and GFP positive cells, which did not silence reporter expression, were directly isolated after TrapSil vector infections ( Figure 1B).
After the isolation of cell populations with differential silencing phenotypes, we mapped proviral integration sites, in order to identify the trapped genes. For this, we combined linker-mediated PCR (LM-PCR) of proviralgenomic junctions with massive parallel DNA pyrosequencing [31,33,34]. The amplified sites were mapped to the human genome with the FetchGWI software [35], and the UCSC known gene annotation was used to subsequently identify the trapped promoters ( Figure 1C). We previously described that about 1 in 15 promoters trapped by MLV-TrapSil vectors were nonrepressed by tTRKRAB, compared with approximately 1 in 5 for those captured by LV-based vectors [28]. Therefore, we isolated over 7000 integration sites, with an intentional bias for non-repressed clones to obtain integrant numbers comparable to their repressible counterparts. 69% of the promoter-trapping LV integrants mapped within annotated genes, whereas only 54% of their MLV counterparts did ( Figure 1C, Additional File 1). This observation is in agreement with previous data indicating that parental MLV as well as MLV-based gene traps integrate in promoter proximal regions, which are less well annotated than gene bodies, which in turn are the preferential integration sites of LV and LV-based traps [36,37]. Consistently, we mapped 6135 LV-TrapSil integrants to the genome, 4219 of which were located within genes. In contrast, we only found 787 intragenic MLV-TrapSil integrants.
Prior to further analysis, we validated our experimental approach by deriving clones from each population. All of the 32 clones analyzed exhibited the expected silencing profile in flow cytometry measurements. Moreover, the clones comprised 10 non-repressed (LI I-X) and 8 repressed (LR I-VIII) LV-TrapSil clones, in addition to 8 non-repressed (MI I-VIII) and 6 repressed (MR I-VI) MLV-TrapSil clones, (Additional File 2). We also used ChIP analysis to verify that non-repressed genes properly recruited KAP1 and downstream effectors to their tTRKRAB docking site, in a doxycyclinedependent manner (Additional File 3). After this validation, we continued with the characterization of the genomic context of our KRAB/KAP1 repressible or nonrepressible genes to find patterns correlating with silencing efficiency.

Genomic environment of repressing and non-repressing gene trap integrants
We characterized the genomic environment of the integrants segregated according to their phenotype by using ROC (Receiver Operator Characteristic) curve analysis [38]. This type of analysis was previously used to identify the genomic features enriched around retroviral integration sites. This study confirmed that both MLV and LV preferentially integrate within transcriptionally active regions, and that this effect is augmented when integrants enabling reporter expression are selected [38]. In addition, this analysis also revealed that the effects of different genomic features on integration can change depending on the size of genomic segments in question [38]. Therefore, we included genomic intervals ranging from 0.1 kb to 10 Mb in our analyses.
In order to characterize the genomic features surrounding the integrants in our different TrapSil groups, we made use of the same approach. We first calculated the area under the ROC curve, which is a common measure of a predictor variable's ability to discriminate between two classes of events. In our case we compared the average enrichment of a given feature at a set of genomic sites (such as integration sites in our case) relative to that of a set of random matched control sites. The read-out of this comparison is illustrated in colorcoded heatmaps, where each rectangle represents the specific enrichment of a feature within the indicated intervals of distance. The relative enrichment between the integration and control site group is scored on a scale from 0 to 1. 1 is scored when a specific feature is enriched in the experimental integrants when compared to matched controls, 0 is scored when the opposite is true. A value of 0.5 indicates no difference between the two groups. The patterns of the genomic features surrounding the TrapSil integrant groups largely reflected the preferential genomic environment associated with either LV or MLV integrations (for values see Additional File 4). It included a preference for both retroviruses for active genes, in addition to their differential targeting to gene bodies and promoters, respectively. This is reflected by an increased enrichment of CpG islands and DNase I sites at short intervals around MLV integrants when compared to LV integrants. This difference is lost when larger intervals are included in the ROC area calculation ( Figure 2). We then compared the ROC values, which are proportional to the levels of genomic features at these sites, between TrapSil groups harboring differential susceptibilities to KRAB/KAP1-silencing. We did this by making relative comparisons between a chosen reference and other gene groups. The reference gene groups are indicated by the symbol "-" within the whole results section. Using this approach we compared the levels of specific genomic features between the respective REP gene group and their corresponding NREP counterpart. When statistical differences were assessed, we found that LV-TrapSil repressing integrants were located within gene-denser genomic regions than non-repressible integrants ( Figure  2). Furthermore, the environment of repressing LV integrants was enriched in CpG and DNase I sites, as well as in highly expressed genes (based on publicly available microarray data), compared to that of non-repressing LV integrants. While all the described parameters were statistically significantly different between repressible and non-repressible LV traps, comparisons of their MLV-TrapSil counterparts did not reach significant differences, although it showed similar trends ( Figure 2). Therefore, a positive correlation between gene activity in the environment of the targeted transcriptional unit and efficient KRAB/KAP1-mediated silencing is established. The lack of significance between the MLV repressible and nonrepressible TrapSil groups could be due to smaller integrant numbers or could reflect the presence of other uncharacterized features affecting KRAB/KAP1 recruitment, including the on average closer proximity of MLV integrants to promoters.

Genomic features of matched repressed and nonrepressed transcriptional units
Repressing integrants were on average closer to the transcriptional start site of their targets, compared to This method serves to calculate the relative abundance of a given genomic feature around the integrants of a group for specific intervals. The resulting values are depicted in color-coded heat maps. 1 indicates that the specific feature is enriched in integrants, 0 means that it is depleted. Relative abundance scores of repressing and non-repressing integrants were compared for each trapping vector subtype and the statistical method used included the non-central chi-square test (** p < 0.01; *** p < 0.001). The different genomic feature categories tested were: "gene density", with all of the Refseq annotated genes; the "highly expressed genes" and "expressed genes" group, including genes expressed in the top 1/16 th , or the top 1/2 of all genes measured in a transcriptional profiling analysis; "start/end", including the distance to the nearest transcriptional start (TSS) or stop site; "gene start", including the distance to the nearest TSS; "gene size" was the average size of the targeted genes and was only analyzed for intragenic integrants; "GC content", included the density of GC nucleotides, which are more abundant in gene-rich regions; "CpG density", contained the frequency of CpG dinucleotides, mostly present at promoters; "DNAse HS sites", included the number of DNAse I hypersensitive sites, frequently associated with gene regulatory regions.
non-repressible integrants ( Figure 2). This finding is consistent with results from our previous analyses, which revealed that KRAB/KAP1-induced repression was more likely if gene traps were located closer to trapped promoters [28]. Therefore, the described integrant-centered analysis may suffer from potential biases linked to this spatial factor. We thus repeated our analyses focusing on genes that had a minimal size of 20 kb, a single known TSS, and were frequently targeted by our trapping vectors, that is, hit at least three times in our series. We then classified these genes into three subgroups according to their susceptibility to KRAB/ KAP1-induced silencing expressed as a function of the distance between KRAB-docking integrant and trapped promoter. This led to the identification of 70 genes that supported long-range repression, that is, for which most integrants located within 20 kb of the TSS were repressing (group 1); 77 genes supporting limited range repression, with silencing occurring mainly when integrants were located 10 kb or less from the TSS (group 2); and 80 genes resistant to repression, where no significant silencing occurred irrespective of the distance between the TSS and the KRAB/KAP1-docking sites (group 3) ( Figure 3A). Of note, there was no difference between the expression patterns of these genes in different tissues, indicating that these gene groups did not differ in being essential or not for cellular maintenance (data not shown).
The genomic context of the three gene groups was reminiscent of observations made in the integrant-centered analysis (Figure 2), with genes from group 1 being in gene-richer and transcriptionally more active environments, and surrounded by a higher density of DNase I hypersensitivity sites ( Figure 3B). These associations, however, did not reach statistical significance. Importantly, no difference in distance between repressor binding site and the trapped promoter was apparent when comparing the three groups, eliminating concerns about this potential bias for subsequent analyses of these genes ( Figure 3B). When we examined the expression levels of the different gene groups, we found that genes supporting long-range repression (group 1) were on average more highly expressed than genes that did not enable KRAB/KAP1-mediated repression (group 3) ( Figure 3C). Therefore, KRAB/KAP1-mediated silencing seems to be more effective in regions of high gene activity. To further consolidate this result, we assessed the levels of different chromatin features, correlating with transcriptional activation or repression in our different gene groups.

Chromatin features of matched repressed and nonrepressed transcriptional units
We first assessed the levels of putative barrier elements such as CTCF, H3.3/H2Az or chromatin modifiers in the different groups [39][40][41]. This was achieved by utilizing published datasets, which were used to calculate the relative abundance of these features by ROC curve analysis and by comparing these values between the groups. There was no differential association with either one of the three gene groups for the intervals tested (Additional File 5).
We then assessed the abundance of a series of histone modifications present at an interval of 1, 10 or 100 kb or 1 Mb around our promoters of interest. We first measured the levels of posttranslational histone modifications correlating with active gene expression, including histone H3 lysine 27 acetylation (H3K27ac), H2BK5 monomethylation (H2BK5me1), H3K4 mono-and trimethylation (H3K4me1, me3), H3K36me3 and H4K20me1. For this we generated genome-wide histone modification maps using a ChIP coupled to deep sequencing (ChIPseq) approach in HeLa cells. Then we used ROC curve-based heatmaps to obtain relative enrichment values for the three groups. When the longrange repressing group 1 was compared to the nonrepressing group 3 most of the active histone modifications were enriched in group 1 (Figure 4). This was the case for smaller (10 kb) and larger intervals (100 kb, 1 Mb), consistent with the idea that KRAB/KAP1-repressible genes reside in regions of very active chromatin both on a local and a more global scale ( Figure 4A).
We then measured the levels of histone modifications normally present at silent genes, such as H3K9me2/ me3, H3K27me3 and H4K20me3. For this we generated histone modification maps by ChIPseq in HeLa cells or relied on a published dataset for the distribution of H3K27me3 in this cell line [39]. Furthermore, group 1 genes were comparatively less depleted in H3K9me3 at the TSS and enriched in H4K20me3 at the promoter and within a 10 kb distance from the TSS, compared with genes from groups 2 and 3 ( Figure 5). Therefore, promoters sensitive to KRAB/KAP1-mediated repression harbor increased levels of some silent histone marks, which are embedded within a domain of very high gene activity.

Discussion
Previous analyses on the mechanisms of KRAB/KAP1mediated gene regulation have mostly examined the impact of this system on the expression of transfected promoter-reporter units. Here, we investigated KRAB/ KAP1-induced changes within the context of endogenous genes. Using a combination of promoter trapping and drug-controllable KRAB/KAP1 recruitment, we previously observed that this complex, when docked to the bodies of transcriptionally active genes, could induce silencing over distances of several tens of kilobases [28]. However, we had also noted that repression was more Matched gene groups with differential KRAB/KAP1-silencing phenotypes. The cumulative histograms illustrate the distribution of repressible (blue) and non-repressible (red) LV-TrapSil and MLV-TrapSil integrants in the transcribed region of genes trapped multiple times. Three groups were distinguished based on the pattern of repressible and non-repressible integrants over 20 kb. Group 1 genes harbored mainly repressing integrants ("long-range repression"), while group 3 contained mostly non-repressible integrants ("absence of repression"). Group 2 genes exhibited an intermediate phenotype, with repressing integrants clustered over the first 10 kb of their transcribed region ("limited repression"). (B) The three gene groups (group 1: "long-range repression", group 2: "limited repression" and group 3: "absence of repression") were analyzed by ROC curves as described in Figure 2 for genomic features over various DNA stretches. (C) Comparison of mean expression levels of the three gene groups based on publicly available microarray data. Group 1: "long-range repression", group 2: "limited repression" and group 3: "absence of repression". Statistical comparisons were made with a non-parametric Wilcoxon test. P-Value Legend: ** p < 0.01. efficient if the distance between the effector and the promoter was less than 20 kb. Furthermore, a significant fraction of trapped promoters/KRAB docking loci escaped these rules, suggesting other counteracting influences. The present large-scale comparison of the genomic features of KRAB/KAP1-responsive and KRAB/ KAP1-resistant transcriptional units identified by our gene trap system reveals a positive correlation between efficient KRAB/KAP1-mediated repression of trapped promoters and i) a gene-richer and transcriptionally more active genomic context, ii) a more euchromatic environment, and iii) the pre-existence of some repressive marks at and around the promoter.
Comparing KRAB/KAP1-repressed and nonrepressed genes gave no indication for a role of putative obstacles to the spread of heterochromatin, such as CTCF binding, accumulation of H3.3/H2Az or recruitment of HATs (reviewed in [42,43]). This is consistent with the observation that CTCF recruitment to the HS4 region of the chicken β-globin locus can be prevented without abrogating the barrier function of this DNA sequence [44]. However, it is at odds with a recent study presenting CTCF as a marker of transition between euchromatic and heterochromatic regions [39]. A model reconciling these findings would be that CTCF acts as an H3K27me3 heterochromatin-specific barrier yet has no effect on H3K9me3-based heterochromatin propagation. However, it should be emphasized that our analysis was limited to the transcribed region of genes owing to our gene trap-based The levels of specific posttranslational histone modifications around the promoters of the different gene groups were calculated over different DNA intervals ranging from 1 kb to 1 Mb. The gene groups included group 1: "long-range repression", group 2: "limited repression" and group 3: "absence of repression" and ROC curve analyses was employed. Briefly, the illustrated heat maps, contain squares giving the relative abundance of the studied histone mark. 1 is scored when the modification is enriched in the gene group when compared to a control group, while 0 indicates depletion of the mark. 0.5 is scored when there is no difference. Non-central chi-square statistical analysis compared differences between the repressible group 1 and the non-repressible group 3. P-Value Legend: * p < 0.05; ** p < 0.01. The histone modifications in the analysis included H3K27ac, H2BK5me1, H3K4me1, H3K4me3, H3K36me3, H4K20me1, which are mostly found within active chromatin. (B) Relevant histograms representing the relative H3K4me3 and H3K36me3 values over 100 kb and 1 Mb, which reached statistically relevant differences when all three gene groups were compared. approach, precluding overly general conclusions on the possible role of barrier elements.
Although both repressed and non-repressed genes were situated within euchromatic regions, as expected from the promoter-trapping approach used for their selection, we observed significant differences in both their local and broader chromatin environments. Repressed genes were in regions containing generally higher levels of major euchromatin-associated marks and higher levels of transcription compared with nonrepressed genes. Therefore, there is a positive correlation between efficient KRAB/KAP1-silencing and high gene activity. This is suggestive of a model whereby genes situated in more heterochromatic environments can only be highly expressed if endowed with an intrinsic ability to resist repressive influences, while genes located in more euchromatic environments do not need such protective mechanisms [45]. Consistently, in our analysis KRAB/KAP1-resistant units were on average closer to telomeres than their KRAB/KAP1-susceptible counterparts, although this difference did not reach statistical significance (data not shown).
Genes repressed by the TrapSil system also carried higher levels of the repressive marks H4K20me3 and H9K9me3 at baseline at and around their promoters, compared with their repression-resistant counterparts. Noteworthy, these contrasting chromatin configurations were not only observed when comparing a selected set of multiply hit repressed and non-repressed genes (Figure 5), but were also present in the complete pools of repressing and non-repressing integrants (Additional File 6). Interestingly, a recent analysis of the chromatin structure of zinc finger genes found that high levels of both H3K36me3 and H3K9me3 co-localized at the 3' exons of these genes [46]. Since KRAB-ZFP genes, which belong to this gene family, are endogenous targets of KRAB/KAP1-repression [25,47], we performed the same analysis in our HeLa cell system and reproduced the same result (Additional Files 7 and 8). Therefore, the high levels of both H3K9me3 and H3K36me3 at KRAB-ZFP gene bodies may be necessary for efficient KRAB/KAP1-induced heterochromatin spreading. The finding that the repressive H3K9me3 and the activating H3K36me3 marks are not co-regulated further supports this hypothesis [46], since high levels of H3K36me3, which positively correlate with active transcription, may independently enhance the spread of H3K9me3 at KRAB-ZFP genes. This model is reminiscent of results The levels of histone marks associated with silent chromatin were analyzed at specific intervals around the promoters of the three gene groups (group 1: "long-range repression", group 2: "limited repression" and group 3: "absence of repression"). The calculations were based on ROC curve analysis as described in Figure 4. Non-central chi-square statistical analysis compared differences between group 1 and group 2, and group 1 and group 3. The p-Value Legend is * p < 0.05; ** p < 0.01. The histone modifications in the analysis included H3K9me2, H3K9me3, H3K27me3 and H4K20me3, mainly associated with silent chromatin. (B) Relevant histograms representing relative H3K9me3 and H4K20me3 values over 1 kb and 10 kb, which reached statistically relevant differences when the three groups were compared. obtained from the TrapSil analysis, where high levels of both active and repressive histone marks can be seen in genes that accommodate KRAB/KAP1-mediated heterochromatin spreading and silencing.
A difference between genes targeted by our TrapSil system and endogenous KRAB-ZFP genes lays in the finding that the latter do not seem susceptible to KRAB/KAP1-mediated long-range repression [28,46,47]. This may be due to the use of our ectopic repressor system. Alternatively, certain endogenous promoters may be resistant to KRAB/KAP1-induced heterochromatin spreading. A possible factor in this process is the H3K9me1/2 demethylase PHF8 [48]. Active H3K9 demethylation may prevent the heterochromatization of KRAB-ZFP promoters and subsequent transcriptional silencing. This idea is consistent with recent PHF8 genome-wide binding data that showed it locating to the promoter regions of zinc finger-encoding genes [49].
Other mechanisms potentially involved in conferring resistance to KRAB/KAP1-mediated silencing are suggested by the analysis of genes that were hotspots of proviral TrapSil targeting and carried both repressible and non-repressible integrants (Additional File 9). In this subgroup, the repressible integrants generally clustered closer to the promoter than their non-repressible counterparts, consistent with the overall observation that silencing is most efficient when KRAB/KAP1 is recruited in the proximity of the affected promoter. In some cases, however, the distributions of repressible and non-repressible integrants overlapped within the same gene. This could reflect the differential susceptibilities of the two alleles of a gene to KRAB/KAP1-mediated repression, somewhat reminiscent of what is observed with imprinting, a process that involves a KRAB-ZFP [9,50]. Additionally, cells within a population may be heterogeneous for the chromatin status of specific loci, which in turn might impact on the consequences of KRAB/KAP1 recruitment. Such a phenomenon would be comparable to variegation, where particular genes are differentially expressed amongst cells of an otherwise apparently homogeneous population [51].

Conclusions
In summary, the present work indicates that the impact of KRAB-mediated docking of KAP1 on the expression of targeted genes is more variable than previously suspected. It further reveals reciprocal influences between the functional outcome of KRAB/KAP1 recruitment to DNA and the chromatin features of the involved loci. More broadly, the approach described in the present study, which combined an analysis of the functional consequences of exogenously introduced cis-acting KRAB/KAP1-recruiting sequences with an examination of the transcriptional activity, genomic context and chromatin features of targeted loci, could be fruitfully applied to the study other epigenetic regulators.

Methods
Vectors pLV-tTR-KRAB-Red was previously described [52]. pLtTR-KRAB-NG95 was cloned through ligation of a BamHI/XhoI digested MLV-based pNG95 [53] with a compatible tTR-KRAB amplicon with BamHI/XhoI sites added by PCR (primer sequences see Additional File 10). To construct LV-and MLV-based TrapSil vectors, published gene trap vectors [37] were modified by PCRbased mutagenesis (Stratagene mutagenesis kit). A BlpI restriction site was introduced into the MLV U3 region of 3'LTRs (MLV: BlpI Primers MLV Trap F/R -Additional File 10), whereas a SpeI site was introduced in the LV U3 region of 3'LTR (Primers HIV Trap F/R -Additional File 10), these new sites were then used to insert 7 repeats of TetO. LV-and MLV-based particles were produced and titered as described elsewhere http://tcf. epfl.ch/page-6764-en.html. The WPRE of LV-TRAPSIL, the GAG remnant of MLV-TRAPSIL, and the Albumin gene served for proviral and cellular genome quantification by Taqman.

Cell culture and Fluorescence activated cell sorting (FACS)
HeLa cells were grown under standard conditions. Doxycycline (Sigma-Aldrich) was used at a concentration of 1 μg/mL. Clonal tTRKRAB-expressing HeLa cell lines dsRK4 (pLV-tTR-KRAB-Red, LV-backbone) and KiN1.25 (pLTetR-KRAB-NG95, MLV-backbone) were derived after infection with pLV-tTR-KRAB-Red or pTetR-KRAB-NG95, respectively. The LV based HeLa dsRK4 clone contains approx. 15 vector copies as titrated by Taqman and was used for MLV-TRAPSIL assays while the MLV-based KiN1.25 clone contains 10 vector copies and was used for all LV-TRAPSIL assays. In view of this mapping strategy, 2 × 10 8 dsRK4 or KiN1.25 HeLa cells were infected with 1.6 × 10 6 MLV-TrapSil or LV-TrapSil infectious particles, respectively, with a multiplicity of infection of 0.04. Cells were sorted based for GFP expression by using the Beckton Dickinson FACSVantage SE turbo Sorter with Diva Option. Flow Cytometry analyses were performed on BD FACScan flow cytometer.

Quantitative PCR (qPCR)
qPCR reactions were carried out with a standard PCR program in ABI PRISM 7900 HT in duplicates or triplicate using either SYBR green detection 1× Power Sybr or 1× Taqman Universal Mix, No AmpErase (Applied Biosystems). Primers were used at a final concentration of 100 nM. When SYBR analysis was performed, cycling reactions were followed by a dissociation curve analysis to validate specificity of amplified products. The increase in fluorescence was analyzed with the SDS software, version 2.2.2 (Applied Biosystems). For all amplification plots the baseline data were set with the automatic cycle threshold function. Primer sequences for all qPCR reactions are listed in Additional File 10.
Linker-mediated PCR (LM-PCR), 454 pyrosequencing and data processing LM-PCR was used to map integration sites following a previously described protocol [31,33,34]. Briefly, 10 μg of genomic DNA (DNeasy, Qiagen) was digested with MseI. Fragments were ligated to a linker and were digested with DpnI and SacI (LV-TrapSil) or SpeI (MLV-TrapSil) to avoid contaminations with bacterial plasmids and to avoid cloning of internal vector fragments. Nested PCR then served to amplify TrapSil vector-gDNA junctions (Takara Advantage 2 kit). Amplicons ranging between 100 and 400 bp were purified, quantified and sent for pyrosequencing at GATC biotech (Konstanz, Germany). Raw sequences were downloaded from the GATC biotech website and converted to FASTA files. Sequences having exact pyrosequencing reaction primers (F: primer A; R: primer B, Additional File 10) were selected and others discarded. Selected sequences were then categorized according to barcode for TrapSil vector type and integrant type (barcodes: LI: TGAC/AGTC; LR: CTGA; MI: TCGA/AGCT; MR: GTAC). After classification, all primer sequences and viral vector overhangs were trimmed yielding only genomic DNA sequence. The 20 bases adjacent to primer B before trimming were used as tags for mapping the inserts to the human genome assembly hg18. The mapping was done using FetchGWI tolerating at most 2 mismatches [35].
Integration site mapping in genes: Integrant orientation was annotated as determined during sequence processing. UCSC known gene [54] were downloaded from UCSC tables with transcript start (Tsx), transcript end (Tsend) and gene orientation. Only integrants mapping with correct orientation within a gene were mapped relative to it. In a second step, a non-redundant gene list was generated (from the original UCSC Gene list) using an aggressive clustering strategy, which groups all transcripts that directly or indirectly (through other transcripts) overlap on the same strand of the same chromosome. In the non-redundant gene list we recorded the 5'most Tsx position and the 3'most Tsend position for each cluster. For analysis of the integrant distance to gene promoters, we considered only integrants falling within the transcribed region of the same gene. Files containing integration sites (sequence-mapping from Insipid, LV_LUI (LV irrepressible), LV_LUR (LV repressible), MLV_MUI (MLV irrepressible), MLV_MUR (MLV_repressible)) and gene groups (20 KB promoter classes with at least 3 integration sites: group 1: "long-range repressible), group 2: "short-range repressible", group 3: "long-range irrepressible") can be found under http://ccg.vital-it.ch/KAP1/.

Receiver Operator Characteristic (ROC) curve analysis
Data analysis was based on a "nested case control" strategy using a collection of TSS characterized by a given behavior with respect to repression along with control sites sampled from the genome to make inferences about the probability of a TSS to display a given response to repression based on genomic/epigenetic features characterizing its environment. More detailed description of statistical basis for this analysis can be found in [38]. Data were analyzed using the R language and environment for statistical computing/graphics version 2.3.0 and several contributed packages. Empirical ROC curve areas were calculated for datasets that used random genomic controls, in which case each TSS of a cluster was compared only with its matched controls to determine the proportions of controls whose values equaled or exceeded that of TSS [55]. Annotations of genomic features were obtained as described previously [38]; the chromatin features analyzed came from ChIPseq data generated in this and other studies [39][40][41].
Chromatin immunoprecipitation (ChIP) and ChIP followed by sequencing (ChIP-Seq) ChIP reactions were performed according to published protocols with minor modifications (http://www.millipore.com/userguides/tech1/mcproto407 and http:// cshprotocols.cshlp.org/cgi/content/full/2009/6/pdb. prot5237), using antibodies listed in Additional File 11, either native or pre-bound to beads. For Histone modifications, 2 × 10 7 HeLa cells were trypsinized and resuspended in MNase buffer. 1 U MNase (Roche) was added for 10 min and adding EDTA to a final of 10 mM arrested the nuclease reaction. Chromatin was sonicated with a Branson digital sonicator (model 250) on ice three times for 20 s and then dialyzed against RIPA with AEBSF protease inhibitor 0.2 mM for 1 h. The chromatin was pelleted after dialysis; glycerol was added to the supernatant to a final 5% concentration and the chromatin was stored at -80°C. 500 ul was incubated with AB-specific pre-coated beads over night (IP). Complexes were washed, eluted, purified, precipitated and resuspended in 50 ul H2O. For KAP1 ChIPs, approximately 2 × 10 7 cells were cross-linked with 1% formaldehyde for 8 min at RT, quenched by adding glycine and rinsed with PBS, before shearing by sonication with a Branson digital sonicator (model 250) on ice four times for 20 s at 30% intensity. 100 μl of sonicated chromatin was directly de-crosslinked and used as the total on both sides of the genes. See Additional File 8 for the list of KRAB-ZFP genes, which were included in the analysis.
Additional file 8: List of known and putative KRAB-ZFP genes included in the analysis.
Additional file 10: List of primers used in this study.
Additional file 11: List of antibodies used in this study.