Target gene selection from transcriptomics data
The selection of target genes was based on RNA-seq data sampled at 12 time points (0 h, 3 h, 6 h, 9 h, 12 h, 18 h, 24 h, 36 h, 48 h, 72 h, 120 h, 168 h) during transdifferentiation of human BLaER1 cells to macrophages. The RNA-seq data was quantified with GRAPE-nf (https://github.com/guigolab/grape-nf). Read mapping was performed with STAR and gene expression quantification with RSEM, using the GENCODE annotation v22. Two biological replicates were analyzed separately.
The 19,814 protein-coding genes (pc-genes) and 14,855 lncRNAs (union of the following GENCODE v22 biotypes: processed transcript, 3 prime overlapping ncRNA, sense intronic, antisense, macro lncRNA, lincRNA, non-coding and sense overlapping) were filtered for an average expression of at least 1 FPKM for pc-genes (0.1 FPKM for lncRNAs) and a fold change of at least 4× for pc-genes (2× for lncRNAs) between the highest and lowest expression values along the temporal profile. In addition, lncRNAs were required to reach an expression of at least 1 FPKM in at least one time point and not to overlap other genes within a 5 kb window on the same strand and 50 bp on the opposite strand relative to their TSS. This resulted in 4,804 pc-genes remaining for replicate 1 and 4,552 for replicate 2, and 642 lncRNAs for replicate 1 and 536 for replicate 2. Those genes were clustered separately for each replicate into 36 expression profiles for pc-genes and 16 for lncRNAs with k-means clustering in R. We focused on two types of expression profiles: “peaking profile” (genes that increase their expression level at the beginning of the transdifferentiation process and later decrease) and “upregulated profile” (genes that are upregulated throughout the process). Pooling those profiles within each replicate and then intersecting between the replicates resulted in a final list of 939 protein-coding and 174 lncRNA candidate genes.
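The expression filter described above can be sketched as follows. This is an illustrative Python sketch, not the code used in the study: the function name, the example profile and in particular the pseudocount added before computing the fold change (to avoid division by zero when the lowest value is 0 FPKM) are assumptions.

```python
from statistics import mean

def passes_filter(fpkm, min_mean, min_fold, min_peak=0.0, pseudocount=0.01):
    """Return True if a temporal FPKM profile passes all thresholds.

    pseudocount is an assumption of this sketch; the text does not state
    how profiles with a minimum of 0 FPKM were handled.
    """
    if mean(fpkm) < min_mean:
        return False
    if max(fpkm) < min_peak:
        return False
    fold = (max(fpkm) + pseudocount) / (min(fpkm) + pseudocount)
    return fold >= min_fold

# Thresholds taken from the text above.
PC_THRESHOLDS = {"min_mean": 1.0, "min_fold": 4.0}
LNC_THRESHOLDS = {"min_mean": 0.1, "min_fold": 2.0, "min_peak": 1.0}

# Hypothetical 12-point profile (0 h to 168 h) with a peaking shape.
example_profile = [0.5, 0.8, 1.2, 2.5, 4.0, 6.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.9]
print(passes_filter(example_profile, **PC_THRESHOLDS))
```

The non-overlap criterion for lncRNAs depends on the genome annotation and is not covered by this sketch.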
Paired guide RNA library design
For lncRNAs, CRISPETa was used to target genes’ TSS. For pc-genes, we developed a new version of CRISPETa to target ORFs (code available at https://github.com/Carlospq/CRISPETa_PC). In this case, we first obtained the principal isoform from the APPRIS database. The exonic sequence of this isoform was extracted from the human genome sequence version hg19, using the GENCODE annotation v22, and searched for all possible protospacers (20-mers followed by a PAM sequence of NGG). sgRNAs were scored using the RuleSet2 algorithm and paired. Pairs were ranked according to: 1) location in the ORF sequence, 2) the pair score calculated as the sum of the two individual sgRNA scores, and 3) the deletion region of the pair (prioritizing those predicted to create an out-of-frame deletion). The first coding exon was preferentially targeted. If not all designs could be placed in the first coding exon, the window was extended to the second and third exons. For lncRNAs, the region targeted around the TSS was increased stepwise from 500 to 5,000 bp in consecutive runs of CRISPETa until the required number of pgRNAs was designed. Selected pgRNAs for lncRNAs were filtered so as to not overlap pc-genes. In all cases, sgRNAs were filtered to remove possible off-targets using CRISPETa’s pre-computed database with default value [-t 0,0,0,x,x] for the first run and relaxing this cutoff for consecutive runs, as described previously. CRISPETa output parameters were adjusted to provide the sequence of the 165 nt oligonucleotide (Insert-1) needed for library cloning using the DECKO method, which includes the targeting regions of the pgRNAs separated by a cloning site (Supplementary Table S2).
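As an illustration of the protospacer search and the pair score, the following Python sketch scans a sequence for 20-mers followed by an NGG PAM and sums two sgRNA scores. It is not the CRISPETa implementation: it scans the forward strand only, the function names are hypothetical, and the individual scores are assumed to come from an external scorer such as RuleSet2.

```python
import re

# A 20-mer followed by NGG. The lookahead lets overlapping sites be reported.
PROTOSPACER = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")

def find_protospacers(seq):
    """Return (start, protospacer) tuples found on the forward strand of seq."""
    return [(m.start(), m.group(1)) for m in PROTOSPACER.finditer(seq.upper())]

def pair_score(score_a, score_b):
    """Pair score as described in the text: sum of the two sgRNA scores."""
    return score_a + score_b

# Hypothetical exonic fragment containing a single NGG site.
fragment = "TTT" + "ACGTACGTACGTACGTACGT" + "AGG" + "CC"
hits = find_protospacers(fragment)
print(hits)
```

A complete search would also scan the reverse complement of the sequence and record the strand of each hit.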
Up to ten pgRNAs were designed per target gene with a minimum distance of 50 bp between any pair of gRNAs. In total, we designed pgRNAs for 166 lncRNAs and 874 pc-genes. In addition, we designed 50 pgRNAs for each of the positive controls ratCEBPa, humanCEBPa, SPI1 and ITGAM. For negative controls, we designed pgRNAs for 100 intergenic regions, 10 pgRNAs each. We also included some pgRNAs targeting fluorophores (EGFP, mCherry and tdTomato) (see Supplementary Table S2). As a non-targeting negative control for library sorting assays, we used a pgRNA against Firefly luciferase, called “pDECKO-non targeting”.
A ssDNA library of 12,000 oligos of 165 nt (Insert-1) (Supplementary Table S2) was purchased from Twist Biosciences. The library was amplified to obtain dsDNA using emulsion PCR and cloned into the pDECKO_mCherry vector (Addgene 78534) in two cloning steps, both as described previously. ENDURA electrocompetent cells (Bionova Cientifica) were used to ensure high-efficiency transformation and avoid recombination errors. Several transformations were performed in parallel. For the first cloning step (intermediate plasmid), approximately 500,000 bacterial colonies were collected and processed together in a single maxiprep. To eliminate the remaining empty plasmid, we took advantage of the fact that Insert-1 (in the intermediate plasmid) contains unique restriction sites (EcoRI and BamHI), which are not present in the original backbone. Digesting the intermediate plasmid resulted in a linear product that could be distinguished from the circular empty backbone and purified from an agarose gel. For the second cloning step, 50 ng of BsmBI-digested intermediate plasmid was mixed with 1 μl of annealed Insert-2 (gRNA1 constant region coupled to an H1 promoter, previously assembled from four oligonucleotides and diluted 1:20) and 1 μl of T4 DNA ligase (Thermo Scientific) and incubated for 4 h at 22 °C, as described previously. Several transformations with ENDURA electrocompetent cells were done in parallel. For the second cloning step (final plasmid), more than 100,000 bacterial colonies were collected and processed together in a maxiprep. A scheme of the final plasmid can be found in Supplementary Fig. S4A. The final pooled library was deep sequenced for diversity verification (Supplementary Fig. S7). The library is available at Addgene.org (BLaER1 pgRNA CRISPR library ID 183825).
Cell culture, library infection and transdifferentiation induction
Human BLaER1 cells were kindly provided by Thomas Graf (CRG, Barcelona) and grown in RPMI medium supplemented with 10% heat-inactivated fetal bovine serum (FBS), 2 mM L-glutamine, and 100 U/ml Penicillin–Streptomycin. BLaER1 cells were first infected with a plasmid containing Cas9 fused to BFP (Addgene 78545), selected for more than 5 days with blasticidin (15 µg/ml) and sorted using a BD FACS Aria instrument at the Flow Cytometry Unit of the Centre for Genomic Regulation. These cells, stably expressing Cas9, were then infected with the pDECKO library. For lentivirus production, we performed 80 co-transfections of HEK293T virus packaging cells (at approximately 60–70% confluence on 10 cm dishes) with 3 μg of the pDECKO_mCherry plasmid library (Addgene 183825), 2.25 μg of the packaging plasmid pVsVg (Addgene 8484) and 750 ng of psPAX2 (Addgene 12260) using Lipofectamine 2000 (Invitrogen), according to the manufacturer's protocol. Transfection media was changed on the following day to RPMI. In total, 400 ml of viral supernatant was collected 48 h post transfection, filtered through a cellulose acetate filter, and used for overnight infection of 90 × 10E6 BLaER1-Cas9 cells at a density of 250,000 cells/ml in the presence of polybrene (10 μg/ml). The percentage of infection was computed as the number of mCherry-positive cells relative to the total number of cells, measured with a Fortessa cell cytometer analyser. The infection rate ranged between 2 and 4%, ensuring a low multiplicity of infection (less than 1 viral integration per cell). After 48 h of infection, the cells were double selected with blasticidin (20 μg/ml) and puromycin (2 μg/ml) for 18–19 days. Fifteen million of the library-infected BLaER1-Cas9 cells were induced for transdifferentiation into macrophages using 100 nM β-estradiol and 10 ng/ml of IL-3 and M-CSF, as described previously. After incubation for 3 days (T3) or 6 days (T6), they were collected for FACS sorting.
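The relation between infection rate, multiplicity of infection and library representation can be illustrated with a short calculation. This sketch assumes that viral integrations follow a Poisson distribution, which is a common simplification and not a calculation reported in this study; the function names are hypothetical, and the cell and oligo numbers are taken from the text.

```python
import math

def poisson_moi(infected_fraction):
    """MOI m such that 1 - exp(-m) equals the infected fraction."""
    return -math.log(1.0 - infected_fraction)

def single_integration_share(moi):
    """Among infected cells, the share carrying exactly one integration."""
    return moi * math.exp(-moi) / (1.0 - math.exp(-moi))

def cells_per_construct(n_cells, infected_fraction, n_constructs):
    """Average number of infected cells per library construct."""
    return n_cells * infected_fraction / n_constructs

for rate in (0.02, 0.04):
    moi = poisson_moi(rate)
    print(rate,
          round(moi, 4),
          round(single_integration_share(moi), 3),
          round(cells_per_construct(90e6, rate, 12_000)))
```

Under this assumption, infection rates of a few percent imply that nearly all infected cells carry a single integration, while 90 × 10E6 infected cells still provide on the order of a hundred or more cells per construct.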
Individual target validation
For paired guide RNA pDECKO-mCherry plasmid cloning we used the method described in  (sgRNA sequences are listed in Supplementary Table S1 and the cloning oligos are detailed in Supplementary Table S5). For single guide RNA pDECKO-mCherry plasmid cloning we used the method described in  (see Supplementary Table S6 for details of the oligos used). Plasmids constructed for this study can be found in Supplementary Table S7 (plasmids available at Addgene.org are indicated).
For lentivirus production, we co-transfected HEK293T virus packaging cells with 3 μg of each pDECKO_mCherry plasmid and packaging plasmids as described previously. Viral supernatant was collected 48 h post transfection and filtered through a cellulose acetate syringe filter. Polybrene (10 μg/ml) was added. We pelleted 5 × 10E5 BLaER1-Cas9 cells in two microcentrifuge tubes and resuspended each of them with 1 ml of viral supernatant. We performed spin-infection for 3 h at 1,000 g. After infection, the viral supernatant was removed and infected cells were resuspended with RPMI media supplemented with 10% heat-inactivated fetal bovine serum (FBS), 2 mM L-glutamine, and 100 U/ml Penicillin–Streptomycin. After 48 h of infection, we performed double selection with blasticidin (20 μg/ml) and puromycin (2 μg/ml) antibiotics. The selection was maintained for a minimum of 2 weeks.
BLaER1-Cas9 cells infected with the different pDECKO_mCherry plasmids were induced for transdifferentiation into macrophages at a density of 375,000 cells/ml using 100 nM β-estradiol and 10 ng/ml of IL-3 and M-CSF, as described previously. After incubation for 3 days (T3) or 6 days (T6), the cells were analyzed by flow cytometry.
For cell sorting: 30 × 10E6 cells were counted and resuspended in 300 μl PBS + 3% FBS in the presence of FcR blocking reagent. Cells were incubated for 10 min and 15 μl of the human anti-CD19 antibody conjugated with BV510 (Becton Dickinson, 562947) and 15 μl of human anti-CD11b (Mac1) antibody conjugated with PE-Cy7 (eBioscience, 25-0118-41) were added. Cells were incubated for 30 min in the dark, washed with PBS and resuspended in 2 ml of PBS + 3% FBS. Topro-3 was added as a viability marker. Cells were sorted in a BD FACS Aria instrument at the Flow Cytometry Unit of the Centre for Genomic Regulation.
For flow cytometry analysis: 1 × 10E6 cells were counted and resuspended in 100 μl PBS + 3% FBS in the presence of FcR blocking reagent. Cells were incubated for 10 min and 5 μl of each of the corresponding antibodies were added. For the CD19 knockout experiment, we used the antibody anti-CD19 conjugated with APC-Cy7 (Becton Dickinson, 557791). Cells were incubated for 30 min in the dark, washed with PBS and resuspended in 500 μl of PBS + 3% FBS. Topro-3 was added as a viability marker. Cells were measured in a BD Fortessa analyser. For the Stain Index calculation we used the formula: (mean positive − mean background) / (2 × SD background), as previously described.
Cell cytometry data is available in the FlowRepository database (https://flowrepository.org).
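The Stain Index formula can be written as a small function. This is a minimal Python sketch with made-up intensity values; the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

def stain_index(positive, background):
    """(mean positive - mean background) / (2 * SD background).

    stdev() is the sample standard deviation; this choice is an assumption.
    """
    return (mean(positive) - mean(background)) / (2 * stdev(background))

# Hypothetical fluorescence intensities for stained and background populations.
positive_events = [1000, 1100, 900]
background_events = [100, 120, 80]
print(stain_index(positive_events, background_events))
```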
Sample processing for deep sequencing
As a quality control, the pooled library was amplified in two PCR steps. For the first PCR step, 50 ng of library was amplified with Phusion polymerase (Thermo Fisher) using oligos Stag0nt_F and Stag0nt_R (Supplementary Table S8), an annealing temperature of 60 °C and 8 cycles of amplification. For the second PCR, 2 μl of purified PCR product from the previous step was amplified under the same conditions using an Illumina oligo pair (Supplementary Table S9). The final product was purified with Agencourt Ampure beads (Beckman Coulter), quantified with a Qubit fluorometer (Thermo Scientific), checked for quality in a Bioanalyzer (Agilent), and sequenced on the Illumina HiSeq 2500 at the Genomics Unit of the Centre for Genomic Regulation (125 bp paired-end sequencing).
After library infection, the genomic DNA was extracted from the FACS sorted cells with the GeneJET Genomic DNA purification kit (Thermo Scientific) and two PCR steps were performed (see Fig. 3C). A scheme of oligo binding sites is shown in Supplementary Fig. S4.
A first PCR step was performed with Phusion polymerase (Thermo Fisher) using 500 ng of genomic DNA and a staggered oligo mix (Supplementary Table S8) in the presence of 6% DMSO, with an annealing temperature of 60 °C and a total of 20 cycles of amplification. We used staggered oligos to avoid the same bases being read for the constant region during Illumina sequencing and to minimize technical issues during base calling. Up to 6 PCR reactions were combined, the amplicons were gel-purified, and 2 ng were used as a template for a second PCR.
The second PCR step was also performed with Phusion polymerase but without DMSO. We used Illumina barcoded oligos (Supplementary Table S9), an annealing temperature of 60 °C and a total of 8 cycles of amplification. Samples were purified with Agencourt Ampure beads (Beckman Coulter), quantified with a Qubit fluorometer (Thermo Scientific) and checked for quality in a Bioanalyzer (Agilent). We then pooled the libraries and sequenced them on the Illumina HiSeq 2500 at the Genomics Unit of the Centre for Genomic Regulation (150 bp paired-end sequencing) to obtain about 20 million reads per sorted subfraction. Sequencing data is available in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-10445.
Mapping and quantification of sequencing reads
For read mapping, based on the initial pgRNA library with two guides per target (Supplementary Table S2), an artificial genome was generated by concatenating the 41 bp of the two pgRNAs (gRNA1 21 bp, gRNA2 20 bp) and converted into FASTA format. STAR mapper (version 2.4.2a)  was used to index the genome, adjusting the standard settings by the following parameter for small genomes:
In the resulting genome, after removing duplicated constructs, each pgRNA pair is represented by one of 11,550 chromosomes, each with a length of 41 bp.
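The construction of the artificial genome can be sketched as follows. This Python sketch is illustrative only: the function name, the pair identifiers and the example sequences are hypothetical, and the rule of keeping the first occurrence of a duplicated construct is an assumption.

```python
def build_artificial_genome(pairs):
    """Build FASTA text with one 41 bp 'chromosome' per unique pgRNA pair.

    pairs: iterable of (pair_id, grna1, grna2), with grna1 of 21 bp and
    grna2 of 20 bp, as stated in the text.
    """
    seen = set()
    records = []
    for pair_id, grna1, grna2 in pairs:
        if len(grna1) != 21 or len(grna2) != 20:
            raise ValueError(f"unexpected guide length for {pair_id}")
        construct = (grna1 + grna2).upper()
        if construct in seen:  # keep only the first copy of a duplicate
            continue
        seen.add(construct)
        records.append(f">{pair_id}\n{construct}\n")
    return "".join(records)

# Hypothetical library entries; the third duplicates the first.
example_pairs = [
    ("pair_0001", "G" + "ACGT" * 5, "TTGCA" * 4),
    ("pair_0002", "G" + "CATG" * 5, "AACCG" * 4),
    ("pair_0003", "G" + "ACGT" * 5, "TTGCA" * 4),
]
fasta_text = build_artificial_genome(example_pairs)
print(fasta_text)
```

The resulting FASTA file would then be indexed with STAR in genome-generation mode.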
Dynamic trimming of Illumina reads was done in Perl by pattern matching the insertion site of the pgRNAs in the plasmid sequence (“ACCG” for pgRNA1 in the window of 15–55 bp of read2, “AAAC” for pgRNA2 in the window of 100–150 bp of read1). The extracted 20 bp fastq sequences for pgRNA2 were reverse-complemented and concatenated to the 21 bp fastq sequences for pgRNA1. Fusion reads shorter than 20 bp were filtered out.
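The trimming logic can be sketched in Python as shown below. The original implementation was written in Perl and is not reproduced here; this sketch makes several assumptions that the text does not confirm: window coordinates are treated as 0-based, the guide is taken to lie immediately downstream of the anchor in the read, and all function names are hypothetical. Base qualities are ignored.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_guide(read, anchor, win_start, win_end, length):
    """Return up to `length` bases after the first anchor found inside
    read[win_start:win_end], or None if the anchor is absent."""
    pos = read.find(anchor, win_start, win_end)
    if pos == -1:
        return None
    start = pos + len(anchor)
    return read[start:start + length]

def fuse_read_pair(read1, read2, min_len=20):
    """Concatenate pgRNA1 (from read2) with reverse-complemented pgRNA2 (from read1)."""
    g1 = extract_guide(read2, "ACCG", 15, 55, 21)
    g2 = extract_guide(read1, "AAAC", 100, 150, 20)
    if g1 is None or g2 is None:
        return None
    fused = g1 + revcomp(g2)
    return fused if len(fused) >= min_len else None

# Synthetic reads built for illustration only.
g1_example = "G" + "ACGT" * 5
read2_example = "T" * 20 + "ACCG" + g1_example + "T" * 10
read1_example = "G" * 105 + "AAAC" + "T" * 10 + "G" * 10 + "G" * 21
fused_example = fuse_read_pair(read1_example, read2_example)
print(fused_example)
```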
Mapping was performed with STAR version 2.4.2a with the following parameters:
STAR --runMode alignReads --runThreadN 8 --genomeDir /users/resources/genome --readFilesCommand zcat --readFilesIn pgRNA1_pgRNA2.fastq.gz --alignIntronMax 1 --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --limitBAMsortRAM 3000000000 --outFilterMultimapNmax 1 --outFilterMismatchNmax 11 --outFilterMatchNmin 30 --outFilterMatchNminOverLread 0.1 --outFilterMismatchNoverLmax 0.9 --outFilterScoreMinOverLread 0.1
Given the distance between the sequencing primer and gRNA2, the pipeline was conceived to be adjustable to a variable number of mismatches. Running the pipeline without allowing for any mismatches, we could only make use of about 25 to 30% of the reads. Hence, we increased the number of allowed mismatches in progressive steps, which resulted in a steep increase of mapped reads until a saturation point was reached between 10 and 15 mismatches, depending on the sample (Supplementary Fig. S6C). For further analysis, we allowed for a maximum of 13 mismatches to stay below 1% of multi-mapped reads for all samples of both replicates. Spearman correlation values of 0.95–1.00 between samples mapped with zero mismatches and with up to 13 mismatches justified the use of the quantification data with substantially more reads and therefore higher statistical power (Supplementary Fig. S6D). For quantification, the count for each guide pair within the mapped libraries was aggregated from the BAM files with SAMtools.
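Because each pgRNA pair is its own reference sequence, counting reduces to tallying the reference name of each mapped read. The Python sketch below illustrates this on SAM-formatted text, such as the output of `samtools view -h` on a BAM file. It is not the study's quantification script; the function name and the example records are hypothetical.

```python
from collections import Counter

def count_guide_pairs(sam_lines):
    """Count mapped reads per reference name (one reference per pgRNA pair)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":  # unmapped read
            continue
        counts[rname] += 1
    return counts

# Hypothetical SAM records: two mapped to pair_0001, one to pair_0002, one unmapped.
example_sam = [
    "@SQ\tSN:pair_0001\tLN:41",
    "r1\t0\tpair_0001\t1\t255\t41M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tpair_0001\t1\t255\t41M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tpair_0002\t1\t255\t41M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
pair_counts = count_guide_pairs(example_sam)
print(pair_counts)
```

The check for unmapped reads matters here because the mapping command keeps unmapped reads in the output file.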
Due to the low memory footprint of the artificial genome, this quantification strategy can be applied even on laptops with moderate specifications (minimum requirements: single core CPU, 4 GB RAM, 10 GB disk space). The mapped reads were clustered to check for reproducibility between replicates (data not shown).
Analysis of the read counts
The count tables generated from the BAM files were filtered for guide pairs with at least 5 counts in the initial sample at T0, to ensure a minimum representation at the beginning of the experiment. For each biological replicate, the ratio of the FACS-sorted delayed fraction over the differentiated fraction was computed at both T3 and T6. From the distribution of ratios across the 12,000 guide pairs, all guide designs above the 90th percentile were selected. We then retained targets for which at least 2 guide designs lay above this 90th percentile in both biological replicates, applying this criterion to each time point separately.
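The selection procedure can be sketched for one replicate and one time point as follows. This Python sketch is an interpretation of the text rather than the original analysis code: the pseudocount used to avoid division by zero, the percentile method (`statistics.quantiles` with its default exclusive method) and all names are assumptions.

```python
from collections import Counter
from statistics import quantiles

def select_hits(rows, min_t0=5, min_guides=2, pseudocount=1.0):
    """Return targets with at least `min_guides` guide pairs above the 90th
    percentile of the delayed/differentiated ratio.

    rows: dicts with keys 'target', 't0', 'delayed', 'differentiated'.
    """
    kept = [r for r in rows if r["t0"] >= min_t0]
    ratios = [(r["delayed"] + pseudocount) / (r["differentiated"] + pseudocount)
              for r in kept]
    cutoff = quantiles(ratios, n=10)[-1]  # last decile cut point = 90th percentile
    per_target = Counter(r["target"] for r, ratio in zip(kept, ratios)
                         if ratio > cutoff)
    return {target for target, n in per_target.items() if n >= min_guides}

# Hypothetical count table: 20 guide pairs with ratios 1..20, plus two
# poorly represented guide pairs that are removed by the T0 filter.
example_rows = [
    {"target": "A" if i >= 18 else f"T{i // 2}", "t0": 10,
     "delayed": 10 * i + 9, "differentiated": 9}
    for i in range(20)
]
example_rows += [{"target": "Z", "t0": 3, "delayed": 1000, "differentiated": 0}] * 2

hits_rep1 = select_hits(example_rows)
hits_rep2 = select_hits(example_rows)
print(hits_rep1 & hits_rep2)
```

Intersecting the sets obtained for the two replicates, as in the last line, corresponds to requiring the criterion in both biological replicates.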
LNA GapmeRs assay
LNA antisense oligonucleotide GapmeRs (Exiqon) complementary to human lncRNA LINC02432 (ENSG00000248810.1) (GCATGAAAGAGTTGGT) and lncRNA MIR3945HG (ENSG00000251230.1) (CTGAGAGGTGGCAAGC) were designed. An LNA oligonucleotide containing a scrambled sequence (AACACGTCTATACGC) was used as a negative control. We seeded 40,000 BLaER1 cells in a 24-well plate and the cells were grown in 1 ml complete RPMI media containing LNA GapmeRs at a final concentration between 1 and 2 μM. After 3 days of incubation, we induced transdifferentiation as described previously. Total RNA was isolated from cells after 3 days of induction.
RNA extraction, retro-transcription and quantitative PCR
RNA extractions from 1 × 10E6 cells were performed with the Quick RNA Miniprep Kit (Zymo Research). Between 140 and 500 ng of RNA were retro-transcribed with RevertAid reverse transcriptase (Thermo Scientific). Quantitative PCR (qPCR) was performed with NZY Speedy qPCR Green Master mix (NZY tech) in a LightCycler 480 Real-Time PCR System (Roche). Primer sequences are detailed in Supplementary Table S10. Quantifications were normalized to an endogenous control (Glyceraldehyde 3-phosphate dehydrogenase, GAPDH). The relative quantification value for each target gene compared with the calibrator is expressed as 2^(-ΔΔCt).
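For reference, the 2^(-ΔΔCt) calculation can be written out as a short function. This is a generic sketch of the standard method with made-up Ct values; it assumes an amplification efficiency of about 100% for both target and reference, and the function and argument names are hypothetical.

```python
def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    dCt = Ct(target) - Ct(reference, e.g. GAPDH), computed for the sample
    and for the calibrator; ddCt = dCt(sample) - dCt(calibrator).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical Ct values: the target amplifies 2 cycles later in the
# treated sample than in the calibrator, at equal GAPDH levels.
print(relative_quantity(26.0, 18.0, 24.0, 18.0))  # 0.25
```

A value of 0.25 corresponds to a four-fold reduction of the target relative to the calibrator.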
Western blot
1 × 10E6 cells were resuspended with 100 μl of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris pH 8, protease inhibitors). The cell lysate was sonicated in a Branson sonicator for 10 s (50% amplitude and power 7). Protein concentration was checked with the Pierce BCA protein assay kit (Thermo Fisher). The samples were run in a 10% SDS-PAGE gel and transferred to a nitrocellulose membrane. The membrane was blocked with blocking buffer (TBS, 0.1% Tween 20, 5% non-fat milk) overnight at 4 °C, and incubated for 1 h 30 min at room temperature with primary antibodies: anti-FURIN rabbit polyclonal antibody (Proteintech, 18413–1-AP) 1:1,000 in blocking buffer, anti-NFE2 rabbit polyclonal antibody (Proteintech, 11089–1-AP) 1:1,000 in blocking buffer, or anti-CEBPa rabbit polyclonal antibody (Santa Cruz, (14AA): sc-61) 1:1,000 in blocking buffer. After 5 washes with TBS-0.1% Tween 20, the membranes were incubated for 1 h with the secondary antibody goat anti-rabbit-HRP (Sigma, G9545) 1:10,000 in blocking buffer. After 5 washes with TBS-0.1% Tween 20, the membranes were incubated either with Amersham ECL western blotting detection reagent (GE Healthcare, RPN2209), or Super Signal West Femto Maximum Sensitivity Substrate (Thermo Fisher, 34096), and imaged in an Amersham Imager 600. As a protein loading control, the membranes were re-blotted with primary antibody rabbit anti-GAPDH-HRP polyclonal antibody (Proteintech, 10494–1-AP) 1:4,000 in blocking buffer, and incubated for 1 h at room temperature. Washes and secondary antibody incubation were performed as described above. The presence of two bands in the NFE2 western blot likely corresponds to different post-translational modifications of NFE2. We used the following protein ladders: Supersignal molecular weight protein ladder (Life Technologies, 84785) and pre-stained Spectra multicolor broad range protein ladder (Life Technologies, 26634).
In order to sequence the edited region in BLaER1-Cas9 cells, we amplified the deletion junctions by PCR using oligos outside the cut region (Supplementary Table S11). The resulting PCR products were cloned using a TA cloning kit (Invitrogen-Life Technologies) or Topo TA cloning kit (Invitrogen-Life Technologies), according to manufacturer’s instructions. We performed colony PCR and the purified product was sequenced by Sanger sequencing.
Analysis of interacting regions by ABC
Data on significant enhancer-gene interactions were retrieved from the Activity-by-Contact (ABC) study. Interactions from the following available cell lines on the lymphoid and myeloid branches were subset from the total number of cell lines: B cells, GM12878, Karpas 422, BJAB, CD19-positive B cells, CD14-positive monocytes, U-937 and THP1 cells. Enhancers closer than 100 bp were merged. Correlations of expression across time between lncRNAs and interacting pc-genes were computed on the average of the two replicates.
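The merging of nearby enhancers can be illustrated with a simple interval-merging routine. This Python sketch is not the code used in the study: the function name and coordinates are hypothetical, and it assumes that "closer than 100 bp" refers to the gap between the end of one enhancer and the start of the next on the same chromosome.

```python
def merge_enhancers(intervals, max_gap=100):
    """Merge (chrom, start, end) intervals whose gap is smaller than max_gap."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] < max_gap:
            last[2] = max(last[2], end)  # extend the previous enhancer
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

# Hypothetical enhancer coordinates.
example_enhancers = [
    ("chr1", 500, 600),
    ("chr1", 100, 200),
    ("chr1", 250, 300),   # 50 bp from the previous enhancer: merged
    ("chr2", 150, 220),
]
merged_enhancers = merge_enhancers(example_enhancers)
print(merged_enhancers)
```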