Developing a cheap, high-throughput screening strategy for CRISPR-Cas9
The preparation of barcoded DNA libraries is generally achieved either by ligating unique barcode and adaptor sequences to fragmented DNA [15], or by incorporating barcode and adaptor sequences into PCR primers, so that the barcodes and the appropriate adaptors are added to the PCR product during amplification [16]. The ligation strategy is generally used when the sample contains a complex pool of DNA fragments, or when the identity of the DNA fragments is unknown, for example in a ChIP-seq experiment. In contrast, when the region to be sequenced is small and the number of samples is large, it is practical to incorporate barcodes and adaptors into “fusion” PCR primers. However, fusion primers can become prohibitively expensive as the number of samples to be sequenced in parallel increases. Because they are often more than 50-60 nucleotides long, they require more extensive purification to ensure that the majority of the primers are full-length and contain the complete sequencing adaptor sequences. To create a more cost-effective amplicon library for multiplexing a large number of CRISPR-Cas9 clones, we used a hybrid approach, in which the DNA barcode is included in the primer, along with a target-specific sequence, while the sequencing platform-specific adaptors are ligated in a subsequent reaction (Figure 1).
Since the NHEJ repair pathway results in indels of various sizes at the CRISPR-Cas9 targeted site, we reasoned that screening primers should be designed to create an amplicon over the targeted region and cover as much of the surrounding DNA as possible. For the Ion Torrent PGM, we decided on an amplicon length of 200 bp, maximizing our ability to detect a variety of mutations, while ensuring that the majority of the reads reach both the forward and reverse barcodes. We used a row- and column-based barcoding system to reduce the number of primers required for screening. By using 12 barcoded forward primers (columns) and 8 barcoded reverse primers (rows), it is possible to create a uniquely barcoded amplicon for up to 96 clones (Figure 2). We chose to use the barcode sequences from the published IonXpress barcode set, as they have been optimized to work with the flow order of bases used by the Ion Torrent PGM. This barcoding system requires a total of only 20 primers, each approximately 30 nucleotides in length.
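The row and column scheme can be illustrated with a short Python sketch. The barcode labels (`F01`, `R1`, ...) and the `well_name` helper are hypothetical placeholders introduced here for illustration; the actual IonXpress barcode sequences are not reproduced.

```python
from itertools import product

# Hypothetical placeholder labels, NOT real IonXpress barcode sequences.
forward_barcodes = [f"F{i:02d}" for i in range(1, 13)]  # 12 forward primers (columns)
reverse_barcodes = [f"R{i}" for i in range(1, 9)]       # 8 reverse primers (rows)


def well_name(row_index, col_index):
    """Return a 96-well plate coordinate such as 'A1' or 'H12'."""
    return f"{'ABCDEFGH'[row_index]}{col_index + 1}"


# Each well receives the forward barcode of its column and the reverse
# barcode of its row, so every well carries a distinct barcode pair.
plate = {
    well_name(r, c): (forward_barcodes[c], reverse_barcodes[r])
    for r, c in product(range(8), range(12))
}

n_primers = len(forward_barcodes) + len(reverse_barcodes)
n_wells = len(plate)
n_unique_pairs = len(set(plate.values()))
```

Under these assumptions, 20 primers address all 96 wells, whereas giving each well its own dedicated barcoded primer pair would require 192 primers.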
Screening for CRISPR-Cas9 induced mutations in Evx1
In order to demonstrate the applicability of our screening strategy, we generated mutant mES cells for the homeobox gene, Evx1, which has previously been shown to be dispensable for mouse embryonic development [17]. An overview of our general screening methodology is shown in Figure 2.
After transfection, sorting and expansion of CRISPR-Cas9 targeted clones, we amplified the targeted region of Evx1 using barcoded forward and reverse primers. We validated the amplification of Evx1 in a number of clones using one set of barcoded primers (Additional file 1: Figure S1). Using the aforementioned strategy, we then amplified 67 clones, each with a unique barcoded identity, in a 96-well plate format. DNA was pooled from each of the wells in equal proportion, and quantified prior to template library preparation. We did not normalise the quantity of DNA from each PCR product, as we reasoned that the efficiency of PCR from each clone would likely be similar and that, even with discrepancies in the concentration of DNA, we would achieve sufficient sequencing coverage of the least abundant clone to identify the CRISPR-induced mutations. Sequencing adaptors (Ion Torrent A and P1) were ligated to the pooled DNA, and the library was then sequenced on the Ion Torrent PGM. To determine the mutations present in each clone, it was necessary to de-multiplex the samples. We achieved this by writing a custom script in the R programming language that makes use of the open source ‘ShortRead’ package available from the Bioconductor website (http://www.bioconductor.org/). We applied this script to our data, producing a separate FASTQ file for each individual clone, and then mapped all of the FASTQ files to mm9 using Bowtie2.
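The de-multiplexing logic can be sketched as follows. This is a minimal Python illustration and not the R/ShortRead script described above. It assumes exact barcode matches, reads oriented from the forward primer, and made-up barcode sequences supplied by the caller; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assign_read(read, fwd_barcodes, rev_barcodes):
    """Return (column, row) for a read, or None if either barcode is absent.

    Assumes the read starts with a forward barcode and ends with the
    reverse complement of a reverse barcode; mismatches are not tolerated.
    """
    col = next((name for name, bc in fwd_barcodes.items()
                if read.startswith(bc)), None)
    row = next((name for name, bc in rev_barcodes.items()
                if read.endswith(reverse_complement(bc))), None)
    if col is None or row is None:
        return None
    return (col, row)


def demultiplex(reads, fwd_barcodes, rev_barcodes):
    """Group reads by (column, row) barcode pair, dropping unassigned reads."""
    bins = {}
    for read in reads:
        key = assign_read(read, fwd_barcodes, rev_barcodes)
        if key is not None:
            bins.setdefault(key, []).append(read)
    return bins
```

A practical implementation would also need to handle reads sequenced in the opposite orientation and reads too short to reach the second barcode, and would write each bin to its own FASTQ file with quality scores preserved.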
Sufficient coverage was achieved across all 67 clones, enabling the identification of mutations in each clone. The amplicon coverage varied from 313-fold to 6591-fold, with a mean coverage of 2455-fold (Additional file 1: Figure S2). Over 95% of clones had coverage between one-fifth and five times the mean, indicating that our sequencing coverage was fairly uniform and that equalization of the individual PCR products was not required.
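The uniformity criterion used here (coverage within one-fifth to five times the mean) can be expressed as a small Python sketch. The function name is hypothetical and the coverage values in the example are illustrative only; they are not the measured per-clone coverages.

```python
def fraction_within_fold(coverages, fold=5.0):
    """Fraction of samples whose coverage lies within mean/fold .. mean*fold."""
    mean = sum(coverages) / len(coverages)
    lower, upper = mean / fold, mean * fold
    within = sum(1 for c in coverages if lower <= c <= upper)
    return within / len(coverages)


# Illustrative values only (mean = 1000, so the accepted range is 200..5000).
example_coverages = [100, 1000, 1000, 1000, 1900]
example_fraction = fraction_within_fold(example_coverages)
```

In this toy example four of the five values fall inside the accepted range, so the function returns 0.8.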
We visualized the data using the Integrative Genomics Viewer (IGV) version 2.3.34 and annotated the mutations by visual inspection. Sequencing errors could easily be distinguished from real indels because they show an extreme strand bias and typically occur at the same position in multiple samples [18, 19] (Figure 3). Bona fide mutations generally map in roughly equal proportions to both strands (Figure 3B).
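Although the mutations were annotated by visual inspection, the strand-bias criterion could in principle be automated. The following Python sketch is one possible formulation; the function names and the 10% cut-off are assumptions made for illustration and were not part of the analysis described here.

```python
def minor_strand_fraction(forward_count, reverse_count):
    """Fraction of supporting reads that map to the less represented strand."""
    total = forward_count + reverse_count
    if total == 0:
        raise ValueError("no supporting reads")
    return min(forward_count, reverse_count) / total


def looks_like_sequencing_error(forward_count, reverse_count,
                                min_minor_fraction=0.1):
    """Flag an indel as a likely artefact when one strand dominates.

    The 0.1 threshold is an arbitrary assumption, not a validated value.
    """
    return minor_strand_fraction(forward_count, reverse_count) < min_minor_fraction
```

An indel supported by 98 forward and 2 reverse reads would be flagged, whereas one supported by 520 forward and 480 reverse reads would not.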
As has been reported previously, the incidence of CRISPR-Cas9-induced mutations was very high (65 out of 67 clones showing some form of genetic insult) and all mutations were located close to the targeted sequence [7] (Figure 3A). Our technique allowed us to distinguish heterozygous mutants from homozygous mutants, as well as to identify samples that contained more than two different types of mutations, indicating a mixed clone. Overall, we identified 10 homozygous mutants, 27 compound heterozygotes, 4 heterozygotes, 2 wild-type clones, and 21 clones that showed more than two different alleles (3 clones could not be mapped).
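The genotype categories above follow directly from the number and type of distinct alleles observed in each clone. A minimal Python sketch of that decision logic is given below; it assumes that the alleles have already been called for each clone and that the wild-type allele is labelled `"WT"`, both of which are conventions introduced here for illustration.

```python
def classify_clone(alleles, wildtype="WT"):
    """Classify a clone from the distinct alleles detected in its reads."""
    distinct = set(alleles)
    if not distinct:
        return "unmapped"            # no alleles could be called
    if len(distinct) > 2:
        return "mixed"               # more than two alleles: mixed clone
    if len(distinct) == 1:
        return "wildtype" if wildtype in distinct else "homozygous mutant"
    # Exactly two distinct alleles remain at this point.
    return "heterozygote" if wildtype in distinct else "compound heterozygote"
```

For example, a clone carrying two different deletions and no wild-type allele is classified as a compound heterozygote, while a clone with three distinct alleles is classified as mixed. The sketch does not address how low-frequency alleles are separated from sequencing noise.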
We suspect that the mixed clones result either from sorting multiple cells into a single well during the FACS process or from ongoing pCas9 activity after cell division, which would produce different mutations in each daughter cell. Importantly, our screening method allowed us to identify mixed clones, which could be missed when screening by plasmid cloning and Sanger sequencing. Establishing the integrity of the derived clones is essential for downstream analysis, especially when the desired result is the complete disruption of the targeted gene. Once identified, mixed clones can either be avoided in subsequent functional assays, or clonally isolated by serial dilution.