- Methodology article
- Open Access
Forward genetic screen of human transposase genomic rearrangements
- Anton G. Henssen†1,
- Eileen Jiang†1,
- Jiali Zhuang2,
- Luca Pinello3,
- Nicholas D. Socci4,
- Richard Koche5,
- Mithat Gonen6,
- Camila M. Villasante1,
- Scott A. Armstrong5, 7,
- Daniel E. Bauer3,
- Zhiping Weng2 and
- Alex Kentsis1, 7, 8
© The Author(s). 2016
- Received: 6 May 2016
- Accepted: 5 July 2016
- Published: 4 August 2016
Numerous human genes encode potentially active DNA transposases or recombinases, but our understanding of their functions remains limited due to a shortage of methods to profile their activities on endogenous genomic substrates.
To enable functional analysis of human transposase-derived genes, we combined forward chemical genetic hypoxanthine-guanine phosphoribosyltransferase 1 (HPRT1) screening with massively parallel paired-end DNA sequencing and structural variant genome assembly and analysis. Here, we report the HPRT1 mutational spectrum induced by the human transposase PGBD5, including PGBD5-specific signal sequences (PSS) that serve as potential genomic rearrangement substrates.
The discovered PSS motifs and high-throughput forward chemical genomic screening approach should prove useful for the elucidation of endogenous genome remodeling activities of PGBD5 and other domesticated human DNA transposases and recombinases.
- Inverted Terminal Repeat
- HPRT1 Gene
- Recombination Signal Sequence
- Short Tandem Repeat Analysis
The human genome contains over 20 genes with similarity to DNA transposases. In addition, transposons are a major source of structural genetic variation in human populations. Recently, human THAP9 and PGBD5 have been found to mobilize transposons in human cells [3, 4]. This discovery raises the possibility that, similar to the RAG1 recombinase, these endogenous human transposases may catalyze human genome rearrangements during normal somatic cell development or in distinct disease states. The human genome contains thousands of genetic elements with apparent sequence similarity to transposons, but their evolutionary divergence hinders the identification of elements that may serve as substrates for endogenous human transposases in general, and PGBD5 in particular.
In classical genetics, forward chemical genetic screens have been successfully used to identify spontaneous mutations in bacteria, yeast and flies [7–10]. Such approaches select cells on the basis of chemical resistance, using positive or negative phenotypic selection, and then identify the responsible mutations by DNA sequencing. For forward genetics of mammalian and human cells, mutational analysis of the hypoxanthine-guanine phosphoribosyltransferase 1 (HPRT1) gene based on resistance to toxic purine analogues such as 8-azaguanine or 6-thioguanine (referred to as thioguanine) has been used; for an overview, see . Analysis of HPRT1 has several advantages for forward genetic screens: i) HPRT1 is located on the X chromosome and is therefore functionally hemizygous, ii) HPRT1 encodes a single-domain globular protein in which alterations of any of its nine exons are expected to affect enzymatic activity, and iii) mutations can be selected both positively and negatively, enabling the specific identification of distinct mutations, as opposed to general factors controlling cellular genomic stability. Indeed, HPRT1-based forward genetic screens have been successfully used to characterize chemical mutagens [12, 13]. In human lymphocytes, this assay has also been used to identify RAG1-mediated mutations of HPRT1 and to elucidate cryptic recombination signal sequences [14, 15].
Here, we sought to develop a forward genetic screening approach suitable for the elucidation of endogenous genomic substrates of human DNA transposases and recombinases. Depending on the cell type and the presence of endogenous co-factors, this assay should allow the detection of DNA transposition and recombination or, alternatively, of nuclease-mediated DNA rearrangements facilitated by endogenous DNA sequence substrate preferences. Using negative and positive selection for thioguanine resistance, combined with massively parallel DNA sequencing, we used HPRT1 screening to investigate the nuclease activity of PGBD5 on human genomic substrates.
The human HPRT1 gene contains 12 annotated DNA transposon copies (Additional file 1: Table S1). These transposons are only distantly related to the piggyBac transposons that are evolutionarily related to the potential substrates of PGBD5. We hypothesized that under strong selective pressure, PGBD5 may exhibit enzymatic activity on sequences in the HPRT1 gene with sufficient similarity to its endogenous substrates, be they piggyBac-related sequences or not. To test this hypothesis, we adapted the HPRT1 mutation assay in which cells containing inactivating HPRT1 mutations can be negatively or positively selected by growth in media containing hypoxanthine-aminopterin-thymidine (HAT) or thioguanine, respectively. To maximize the sensitivity of this assay, we used male BJ fibroblasts containing a single copy of the X-linked HPRT1 gene.
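The positive and negative selection logic of the HPRT1 assay can be summarized as a small truth table. This is a minimal sketch that assumes only two phenotypes (HPRT1-proficient or HPRT1-deficient) and ignores partial loss-of-function alleles and selection kinetics; the function name `survives` is hypothetical.

```python
def survives(hprt1_functional: bool, medium: str) -> bool:
    """Expected survival of a cell in a given medium (simplified model).

    HAT: aminopterin blocks de novo purine synthesis, so cells depend on
    HPRT1-mediated purine salvage; HPRT1-deficient cells are eliminated.
    6-TG: HPRT1 converts thioguanine into a toxic nucleotide, so
    HPRT1-proficient cells are eliminated.
    """
    if medium == "HAT":
        return hprt1_functional
    if medium == "6-TG":
        return not hprt1_functional
    if medium == "none":
        return True
    raise ValueError(f"unknown medium: {medium}")


# HAT selects against HPRT1 mutants; thioguanine selects for them.
for functional in (True, False):
    print(functional, survives(functional, "HAT"), survives(functional, "6-TG"))
```

Growth in HAT therefore depletes pre-existing HPRT1 mutants (negative selection), whereas subsequent growth in thioguanine enriches for newly arising ones (positive selection).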
To test this prediction, we isolated genomic DNA from thioguanine-resistant cells expressing GFP-PGBD5 or GFP, and amplified their HPRT1 loci using long-range PCR (Fig. 2a). Consistent with prior observations of HPRT1 mutations that were either subclonal or involved variants not resolvable by electrophoresis [14, 15, 21], the resulting amplicons showed no apparent differences in electrophoretic mobility between thioguanine-resistant and control cells in the presence or absence of PGBD5 (Fig. 2b). To facilitate the recovery of polyclonal populations of HPRT1 mutants, we used massively parallel paired-end Illumina DNA sequencing of the resulting genomic amplicons, achieving a depth of more than 32,000 sequence reads at 99 % of nucleotide positions. These data have been deposited to the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra/, accession number SRP068848), with the processed and annotated data available from the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.t748p).
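A coverage statement such as "more than 32,000 reads at 99 % of positions" can be checked from a per-base depth vector, for example the third column of `samtools depth -a` output. This sketch assumes the depths have already been parsed into a list of integers; the function name is hypothetical.

```python
def fraction_above_depth(depths: list[int], min_depth: int = 32_000) -> float:
    """Fraction of positions whose read depth is strictly greater than min_depth."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths)


# Hypothetical depths for four amplicon positions.
print(fraction_above_depth([40_000, 33_000, 10, 50_000]))
```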
laSV detects significantly more inactivating mutations in thioguanine-resistant cells
In all, our findings indicate that human PGBD5 can induce structural variation and genomic rearrangements of endogenous human HPRT1 loci. The identification of potential PGBD5 signal sequences in human genomes using the HPRT1 forward genetic screen represents a crucial first step in defining its endogenous genomic substrates in vertebrates and humans. Consistent with the distinct evolutionary history and developmental neuronal expression of PGBD5, the identified PSS motifs are distinct from the recombination signal sequences (RSS) described for RAG1 in lymphocytes [14, 15]. Importantly, the identified PSS motifs exhibit only limited similarity to canonical piggyBac transposons, namely a preference for terminal GGG nucleotides, in support of the distinct phylogeny of PGBD5 as compared to other piggyBac-derived genes in vertebrates. Since our analysis was limited to genomic rearrangements of human HPRT1 in BJ fibroblasts in the presence of thioguanine selection, it is possible that PGBD5 may exhibit different sequence preferences and remodeling activities in neurons and diseased cells, where it is endogenously expressed. Our analysis did not identify bona fide ‘cut-and-paste’ DNA transposition in HPRT1, and it remains to be determined whether PGBD5 catalyzes DNA transposition of endogenous human mobile elements, or simply their nuclease-mediated DNA rearrangements. The described PSS motifs now provide essential templates for future functional studies of PGBD5-induced genomic remodeling.
The recent discovery of active human THAP9 and PGBD5 DNA transposases, combined with the functional recombination activity of RAG1, suggests that other endogenous transposase-derived genes may catalyze as yet unknown cell-specific somatic or germline rearrangements in vertebrates and humans. While the identification of such genes has been substantially empowered by whole-genome sequencing, determination of their functional activities has been hindered by the lack of knowledge of their endogenous substrate sequences. We expect that the integration of forward genetic screening with massively parallel DNA sequencing, as we have done here, and structural variant genome analysis using methods such as CRISPResso/laSV should permit the determination of the genome remodeling activities of endogenous as well as engineered genome editing enzymes. While leveraging the advantages of negative and positive selection of HPRT1 forward genetic screening for specificity, this approach additionally offers improved sensitivity, enabling the identification of both simple and complex structural variants at base-pair resolution. Its sensitivity is limited only by sequencing coverage, and it does not require single-cell cloning, which may be compromised by cell fitness effects. Finally, we anticipate that the reported PGBD5 signal sequences will lead to the elucidation of its function in health and disease.
All reagents were obtained from Sigma-Aldrich, unless otherwise specified. Synthetic oligonucleotides were synthesized by Eurofins MWG Operon (Huntsville, AL, USA) and purified by high-performance liquid chromatography (HPLC).
BJ-hTERT cells were obtained from the American Type Culture Collection (ATCC, Manassas, Virginia, USA). The identity of all cell lines was verified by short tandem repeat (STR) analysis, and the absence of Mycoplasma contamination was confirmed by Genetica DNA Laboratories (Burlington, NC, USA). Cell lines were cultured in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10 % fetal bovine serum, 100 U/ml penicillin and 100 μg/ml streptomycin in a humidified atmosphere at 37 °C and 5 % CO2.
Human PGBD5 cDNA (Refseq ID: NM_024554.3) was cloned as a GFP fusion into the lentiviral vector pReceiver-Lv103-E3156 (GeneCopoeia, Rockville, MD, USA). Lentivirus packaging vectors psPAX2 and pMD2.G were obtained from Addgene. Plasmids were verified by restriction endonuclease mapping and Sanger sequencing, and deposited in Addgene (https://www.addgene.org/Alex_Kentsis/).
Lentivirus production and transduction
Lentivirus production was carried out as described previously. Briefly, HEK293T cells were transfected using TransIT-LT1 (Mirus, Madison, WI) with a 2:1:1 ratio of the pRecLV103 lentiviral vector and the psPAX2 and pMD2.G packaging plasmids, according to the manufacturer’s instructions. Virus supernatant was collected at 48 and 72 h post-transfection, pooled, filtered and stored at −80 °C. BJ-hTERT cells were transduced with virus particles at a multiplicity of infection of 5 in the presence of 8 μg/ml hexadimethrine bromide. Transduced cells were selected for 2 days with puromycin (5 μg/ml).
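Under the standard Poisson model of transduction, in which viral particles are distributed randomly among cells, the probability that a cell receives at least k particles at multiplicity of infection m follows from the Poisson distribution, so the fraction receiving at least one particle is 1 − e^(−m). This sketch assumes that idealized model (uniform susceptibility, titer measured on the same cell type) and does not account for integration or expression efficiency; the function name is hypothetical.

```python
import math


def prob_at_least(k: int, moi: float) -> float:
    """P(cell receives >= k particles) under a Poisson model with mean moi."""
    below = sum(math.exp(-moi) * moi**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# MOI of 5, as used for BJ-hTERT transduction.
print(prob_at_least(1, 5.0))  # fraction with at least one particle
print(prob_at_least(2, 5.0))  # fraction with two or more particles
```

At an MOI of 5, this model predicts that about 99.3 % of cells receive at least one particle and about 96 % receive two or more.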
To analyze protein expression by Western immunoblotting, 1 million transduced cells were suspended in 340 μl of lysis buffer (4 % sodium dodecyl sulfate, 7 % glycerol, 1.25 % beta-mercaptoethanol, 0.2 mg/ml Bromophenol Blue, 30 mM Tris–HCl, pH 6.8). Lysates were cleared by centrifugation at 16,000 × g for 10 min at 4 °C. Clarified lysates (30 μl) were resolved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and electrotransferred onto Immobilon FL PVDF membranes (Millipore, Billerica, MA, USA). Membranes were blocked using Odyssey Blocking buffer (Li-Cor) and blotted using antibodies against GFP (mouse anti-human, 1:500, clone 4B10, Cell Signaling Technology, Beverly, MA), β-actin (rabbit anti-human, 1:5000, clone 13E5, Cell Signaling Technology, Beverly, MA), HPRT1 (rabbit anti-human, 1:1000, clone ab10479, Abcam, Cambridge, MA) and β-actin (mouse anti-human, 1:5000, clone 8H10D10, Cell Signaling Technology, Beverly, MA). Blotted membranes were visualized using goat secondary antibodies conjugated to IRDye 800CW or IRDye 680RD and the Odyssey CLx fluorescence scanner, according to the manufacturer’s instructions (Li-Cor, Lincoln, Nebraska).
Hypoxanthine-aminopterin-thymidine (HAT) medium selection
HAT medium was prepared using the 50× HAT supplement (Thermo Fisher Scientific) and DMEM supplemented with 10 % fetal bovine serum, 100 U/ml penicillin and 100 μg/ml streptomycin. Medium was replaced twice weekly, and cells were grown under HAT selection for 15 doublings, corresponding to approximately 5 weeks.
Cells were cultured in the presence of 120 ng/ml of 6-thioguanine for 10 doublings, corresponding to approximately 4 weeks. Medium was replaced twice weekly.
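The stated selection schedules imply an average population doubling time, which may help when adapting them to cells that grow at different rates. The arithmetic below uses the approximate durations given above (5 weeks for HAT, 4 weeks for 6-thioguanine), so the results are rough estimates; the function names are hypothetical.

```python
def mean_doubling_time_days(duration_days: float, doublings: float) -> float:
    """Average population doubling time implied by a culture schedule."""
    return duration_days / doublings


def fold_expansion(doublings: float) -> float:
    """Theoretical fold expansion after a number of population doublings."""
    return 2 ** doublings


print(mean_doubling_time_days(35, 15))  # HAT: roughly 2.3 days per doubling
print(mean_doubling_time_days(28, 10))  # 6-thioguanine: roughly 2.8 days
print(fold_expansion(15))               # 32768-fold theoretical expansion
```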
Cell viability and colony formation assays
For cell viability assays, cells were seeded at a density of 200,000 cells per well in 6-well plates (Corning Life Sciences, Corning, NY, USA). Twenty-four hours after seeding, the medium was replaced with HAT medium. The number of viable cells was counted 2 days after treatment by Trypan Blue staining with a Neubauer hemocytometer, according to the manufacturer’s instructions (Thermo Fisher Scientific).
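For reference, one large square of a standard Neubauer chamber covers 1 mm × 1 mm at a depth of 0.1 mm (0.1 μl), so cells per ml equal the mean count per large square × dilution factor × 10^4. This sketch assumes counts from several large squares and a 1:1 dilution in Trypan Blue (dilution factor 2); both are common conventions rather than details reported here, and the function names are hypothetical.

```python
from statistics import mean


def cells_per_ml(counts_per_square: list[int], dilution_factor: float = 2.0) -> float:
    """Cell concentration from Neubauer large-square counts (0.1 ul per square)."""
    return mean(counts_per_square) * dilution_factor * 1e4


def percent_viable(unstained: int, stained: int) -> float:
    """Trypan Blue exclusion: viable cells exclude the dye."""
    total = unstained + stained
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained / total


print(cells_per_ml([50, 48, 52, 50]))  # 1,000,000 cells/ml
print(percent_viable(180, 20))         # 90 % viable
```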
For clonogenic assays, cells were seeded at a density of 10,000 cells per 10-cm dish and treated with 6-thioguanine (0–1 μg/ml) for 2 weeks. Resultant colonies were fixed with methanol, stained with Crystal Violet, and counted manually using a spatial grid.
To assess the viability of cells after treatment with 6-thioguanine, cells were seeded into 96-well plates (Thermo Fisher Scientific, Waltham, MA, USA) at a density of 1000 cells per well. Cells were treated with 6-thioguanine (0–1 μg/ml) 24 h after seeding. Cell viability was quantified using the luminescence-based CellTiter-Glo ATP content assay, according to the manufacturer’s instructions (Promega, Madison, WI, USA).
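Luminescence readings from such assays are commonly expressed relative to untreated wells. This is a minimal sketch assuming subtraction of the mean of medium-only (blank) wells and use of the untreated (0 μg/ml) wells as the 100 % reference; the text does not specify the normalization, and the function name is hypothetical.

```python
from statistics import mean


def relative_viability(treated_rlu: list[float], untreated_rlu: list[float],
                       blank_rlu: list[float]) -> list[float]:
    """Percent viability of treated wells relative to untreated wells."""
    background = mean(blank_rlu)
    reference = mean(untreated_rlu) - background
    if reference <= 0:
        raise ValueError("untreated signal does not exceed background")
    return [(x - background) / reference * 100 for x in treated_rlu]


# Hypothetical luminescence values (relative light units).
print(relative_viability([600, 350], [1100, 1100], [100, 100]))  # [50.0, 25.0]
```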
Generation of HPRT1 amplicons
Genomic DNA was extracted from 10 million cells using the PureLink Genomic DNA Mini Kit according to the manufacturer’s instructions (Thermo Fisher Scientific). To amplify the HPRT1 gene, we designed primer pairs spaced every 3–8 kb (Additional file 2: Table S2). Amplicons were generated using 50 ng of gDNA in 50 μl reaction volumes containing 0.5 μM of primers. Loci 2 to 8 were amplified using the Phusion Green High-Fidelity DNA Polymerase (Thermo Scientific, Waltham, MA, USA) with the following parameters: 98 °C for 30 s, followed by 30 cycles of 98 °C for 10 s, 65 °C for 30 s and 72 °C for 3.5 min, and a final extension of 72 °C for 3.5 min. Locus 1 was amplified using the KAPA Long Range HotStart DNA Polymerase (KAPA Biosystems, Wilmington, MA, USA) with the following parameters: 94 °C for 3 min, followed by 40 cycles of 94 °C for 25 s, 60 °C for 15 s and 72 °C for 7 min, and a final extension of 72 °C for 7 min. PCR products were purified using the PureLink PCR purification kit according to the manufacturer’s instructions (Invitrogen Corp., Carlsbad, CA, USA).
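When recovering polyclonal mutants, it can help to estimate how many HPRT1 templates enter each PCR. The estimate below assumes about 3.3 pg of DNA per haploid human genome (about 6.6 pg per diploid cell) and one X-linked HPRT1 copy per male cell; it ignores DNA quality and extraction losses. The constant and function name are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumed mass of one haploid human genome, in picograms


def hprt1_templates(gdna_ng: float, copies_per_cell: int = 1) -> float:
    """Approximate number of HPRT1 template copies in a gDNA input.

    Assumes a diploid autosomal genome per cell and copies_per_cell
    copies of the X-linked HPRT1 locus (1 for male cells).
    """
    cells = (gdna_ng * 1000.0) / (2 * HAPLOID_GENOME_PG)
    return cells * copies_per_cell


print(round(hprt1_templates(50)))  # roughly 7,600 templates per 50 ng reaction
```

Under these assumptions, each 50 ng reaction samples the genomes of roughly 7,600 cells, which bounds how rare a mutant allele can be and still be represented in a single reaction.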
Illumina library preparation and sequencing
Purified PCR amplicons were quantified by fluorometry using the Qubit instrument (Invitrogen, Carlsbad, CA), sized using the BioAnalyzer 2100 instrument (Agilent Technologies, Santa Clara, CA), and pooled in equimolar amounts. Sequencing libraries were constructed using the KAPA Hyper Prep Kit (KAPA Biosystems, Wilmington, MA) and 12 indexed Illumina adaptors obtained from IDT (Coralville, IA), according to the manufacturer’s instructions. After quantification and sizing, libraries were pooled for sequencing on a MiSeq (pooled library input at 10 pM) using a 300/300 paired-end run (Illumina, San Diego, CA). A total of 728,000–928,000 paired reads were generated per sample, and the duplication rate varied between 0.22 and 0.27 %. The data reported in this manuscript have been deposited to the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra/, accession number SRP068848) and the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.t748p).
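Equimolar pooling requires converting mass concentrations into molar concentrations, which depends on amplicon length. The sketch uses the common approximation of 660 g/mol per base pair of double-stranded DNA (so nM = ng/μl × 10^6 / (660 × length in bp), and 1 nM = 1 fmol/μl); the amplicon names, concentrations, lengths and target amount are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of dsDNA, assuming 660 g/mol per bp."""
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def pooling_volumes(amplicons: dict[str, tuple[float, int]],
                    fmol_each: float) -> dict[str, float]:
    """Volume (ul) of each amplicon that contributes fmol_each femtomoles."""
    return {name: fmol_each / ng_per_ul_to_nM(conc, length)
            for name, (conc, length) in amplicons.items()}


# Hypothetical amplicons: (concentration in ng/ul, length in bp).
example = {"locus_2": (10.0, 5000), "locus_3": (25.0, 8000)}
print(pooling_volumes(example, fmol_each=30.0))
```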
Mutational and structural variant analysis
For the analysis of single nucleotide and small indel mutations, we used the CRISPResso WGS utility from the CRISPResso software using default parameters. The analysis was performed on non-overlapping windows of 40 bp spanning the entire HPRT1 gene body. For the analysis of large structural variants, we used laSV with the following parameters: -s 30 -k 63 -p 30.
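Tiling a gene body into non-overlapping 40-bp windows can be sketched as below, for example to build a tab-separated region list for a per-region analysis. The coordinates in the example are placeholders, not the HPRT1 coordinates used in the study, and half-open, 0-based intervals (as in BED) are an assumption; the function names are hypothetical.

```python
def tile_windows(start: int, end: int, size: int = 40) -> list[tuple[int, int]]:
    """Non-overlapping half-open windows [s, e) covering [start, end).

    The final window is truncated at `end` if the span is not a multiple of size.
    """
    if size <= 0 or end <= start:
        raise ValueError("invalid interval or window size")
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def to_bed_lines(chrom: str, windows: list[tuple[int, int]]) -> list[str]:
    """Format windows as minimal three-column BED lines."""
    return [f"{chrom}\t{s}\t{e}" for s, e in windows]


# Placeholder coordinates for illustration only.
for line in to_bed_lines("chrX", tile_windows(1_000, 1_100)):
    print(line)
```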
PGBD5 signal sequence analysis
Clustal Omega with default parameters was used for multiple sequence alignment. PGBD5 signal sequences were defined using the following criteria: i) sequences flanking the 5′ and 3′ breakpoints showed at least 50 % identity less than 4 bp from the breakpoints when aligned to each other in inverted orientation, ii) aligned sequences contained no single or tandem repeats longer than 5 bp, and iii) no such alignments were identified at breakpoints of variants found in GFP-expressing control cells. Sequence motifs were identified using MEME with default parameters, with alignments oriented in the 5′ to 3′ direction and the breakpoint at the 3′ terminus.
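Criteria i) and ii) can be approximated in a few lines. This is a simplified sketch, not the authors' pipeline: it replaces the Clustal Omega alignment with an ungapped comparison anchored at the breakpoint, assumes the 5′ flank ends at the 5′ breakpoint and the 3′ flank begins at the 3′ breakpoint, treats tandem repeats as units of 1–3 bp, and models neither the less-than-4-bp distance requirement nor the GFP-control comparison of criterion iii). All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_identity(five_flank: str, three_flank: str) -> float:
    """Ungapped identity of the 5' flank vs. the reverse complement of the 3' flank.

    After reverse complementing, both sequences end at their breakpoint,
    so they are compared position by position from the 3' ends.
    """
    a, b = five_flank.upper(), revcomp(three_flank).upper()
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("empty flank")
    matches = sum(x == y for x, y in zip(reversed(a), reversed(b)))
    return matches / n


def has_long_repeat(seq: str, max_len: int = 5, max_unit: int = 3) -> bool:
    """True if seq contains a repeat of a 1..max_unit bp unit spanning > max_len bp."""
    seq = seq.upper()
    for unit in range(1, max_unit + 1):
        for start in range(len(seq) - unit + 1):
            motif = seq[start:start + unit]
            end = start + unit
            while seq[end:end + unit] == motif:  # extend by whole units
                end += unit
            if end - start > max_len:
                return True
    return False


def passes_criteria_i_ii(five_flank: str, three_flank: str,
                         min_identity: float = 0.5) -> bool:
    """Simplified check of criteria i) and ii) for one breakpoint pair."""
    return (inverted_identity(five_flank, three_flank) >= min_identity
            and not has_long_repeat(five_flank)
            and not has_long_repeat(three_flank))
```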
Mutational frequencies were calculated as described previously, according to the following formula:

$$\text{Mutational frequency} = \frac{-\ln(X_S / N_S)}{-\ln(X_0 / N_0)}$$

where N is the number of cells seeded and X is the number of colonies formed with (S) and without (0) thioguanine selection. Differences in the number of mutations between samples were compared using a Poisson test with an exact reference distribution.
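The frequency formula translates directly into code. For the count comparison, the exact implementation is not specified; the sketch below uses one common exact approach for two Poisson counts, the conditional binomial test (the approach taken by R's two-sample `poisson.test`), assuming equal exposure, such as comparable sequencing depth, unless an exposure ratio is supplied. Function names are hypothetical.

```python
import math


def mutational_frequency(x_s: float, n_s: float, x_0: float, n_0: float) -> float:
    """-ln(X_S/N_S) / -ln(X_0/N_0), as defined in the text.

    Undefined if any ratio is 0, or if X_0/N_0 equals 1.
    """
    return (-math.log(x_s / n_s)) / (-math.log(x_0 / n_0))


def exact_poisson_two_sample(k1: int, k2: int, exposure_ratio: float = 1.0) -> float:
    """Two-sided exact test of equal Poisson rates via the conditional binomial.

    Conditional on n = k1 + k2, k1 ~ Binomial(n, p) under H0, with
    p = exposure_ratio / (1 + exposure_ratio) and exposure_ratio = T1 / T2.
    """
    n = k1 + k2
    if n == 0:
        return 1.0
    p = exposure_ratio / (1.0 + exposure_ratio)
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k1]
    # Sum the probabilities of outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    pval = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, pval)


print(mutational_frequency(50, 100, 25, 100))  # 0.5 for these hypothetical counts
print(exact_poisson_two_sample(0, 10))         # 2 / 1024
```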
6-TG, 6-thioguanine; DMEM, Dulbecco’s Modified Eagle Medium; HAT, hypoxanthine-aminopterin-thymidine; HPLC, high-performance liquid chromatography; PCR, polymerase chain reaction; PSS, PGBD5-specific signal sequences; RSS, recombination signal sequences; SNV, single nucleotide variant; STR, short tandem repeat; SV, structural variant.
We thank David Pellman for suggesting forward chemical genetic screening for transposon discovery and Cedric Feschotte for helpful comments on the manuscript.
This work was supported by the University of Essen Pediatric Oncology Research Program (A.H.), NIH K08 DK093705 (D.E.B.), NIH K08 CA160660, Burroughs Wellcome Fund, Josie Robertson Investigator Program, and the Sarcoma Foundation of America (A.K.). L.P. is supported by NHGRI Career Development Award K99 HG008399. We thank the MSKCC Integrated Genomics Core Facility and Bioinformatics Core Facility for assistance with DNA sequencing and analysis (NIH P30 CA008748).
Availability of data and material
Data deposition: The data reported in this manuscript have been deposited to the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra/, accession number SRP068848) and the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.t748p).
AH and AK designed the research, AH, EJ, CV performed the research, AH, LP, JZ, ZW, DB, AK analyzed data, AH and AK wrote the manuscript in consultation with all authors. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Ethics approval was not required for any aspect of this study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999;9(6):657–63.
- Stewart C, et al. A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genet. 2011;7(8):e1002236.
- Majumdar S, Singh A, Rio DC. The human THAP9 gene encodes an active P-element DNA transposase. Science. 2013;339(6118):446–8.
- Henssen AG, et al. Genomic DNA transposition induced by human PGBD5. eLife. 2015;4:e10565.
- Hiom K, Melek M, Gellert M. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell. 1998;94(4):463–70.
- Feschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007;41:331–68.
- Grimm S. The art and design of genetic screens: mammalian culture cells. Nat Rev Genet. 2004;5(3):179–89.
- Shuman HA, Silhavy TJ. The art and design of genetic screens: Escherichia coli. Nat Rev Genet. 2003;4(6):419–31.
- Forsburg SL. The art and design of genetic screens: yeast. Nat Rev Genet. 2001;2(9):659–68.
- St Johnston D. The art and design of genetic screens: Drosophila melanogaster. Nat Rev Genet. 2002;3(3):176–88.
- Albertini RJ. HPRT mutations in humans: biomarkers for mechanistic studies. Mutat Res. 2001;489(1):1–16.
- Finette BA. Analysis of mutagenic V(D)J recombinase mediated mutations at the HPRT locus as an in vivo model for studying rearrangements with leukemogenic potential in children. DNA Repair. 2006;5(9–10):1049–64.
- Snee RD, Irr JD. Design of a statistical method for the analysis of mutagenesis at the hypoxanthine-guanine phosphoribosyl transferase locus of cultured Chinese hamster ovary cells. Mutat Res. 1981;85(2):77–93.
- Fuscoe JC, et al. V(D)J recombinase-mediated deletion of the hprt gene in T-lymphocytes from adult humans. Mutat Res. 1992;283(1):13–20.
- Fuscoe JC, et al. V(D)J recombinase-like activity mediates hprt gene deletion in human fetal T-lymphocytes. Cancer Res. 1991;51(21):6001–5.
- O’Neill JP, Hsie AW. Phenotypic expression time of mutagen-induced 6-thioguanine resistance in Chinese hamster ovary cells (CHO/HGPRT system). Mutat Res. 1979;59(1):109–18.
- Morales CP, et al. Absence of cancer-associated changes in human fibroblasts immortalized with telomerase. Nat Genet. 1999;21(1):115–8.
- Johnson GE. Mammalian cell HPRT gene mutation assay: test methods. In: Genetic Toxicology: Principles and Methods. Methods in Molecular Biology. Springer Science+Business Media; 2012;817:55–67.
- Chen T, Harrington-Brock K, Moore MM. Mutant frequency and mutational spectra in the Tk and Hprt genes of N-ethyl-N-nitrosourea-treated mouse lymphoma cells. Environ Mol Mutagen. 2002;39(4):296–305.
- Aplan PD, et al. Disruption of the human SCL locus by “illegitimate” V-(D)-J recombinase activity. Science. 1990;250(4986):1426–9.
- Albertini RJ, Nicklas JA, Skopek TR, Recio L, O’Neill JP. Genetic instability in human T-lymphocytes. Mutat Res. 1998;400(1–2):381–9.
- Pinello L, et al. CRISPResso: sequencing analysis toolbox for CRISPR-Cas9 genome editing. Nat Biotechnol. 2015; In press.
- Zhuang J, Weng Z. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes. Nucleic Acids Res. 2015;43(17):8146–56.
- Sievers F, Higgins DG. Clustal Omega. Curr Protoc Bioinformatics. 2014;48:3.13.1–3.13.16.
- Tanaka E, Bailey TL, Keich U. Improving MEME via a two-tiered significance analysis. Bioinformatics. 2014;30(14):1965–73.
- Cudre-Mauroux C, et al. Lentivector-mediated transfer of Bmi-1 and telomerase in muscle satellite cells yields a duchenne myoblast cell line with long-term genotypic and phenotypic stability. Hum Gene Ther. 2003;14(16):1525–33.
- Kentsis A, et al. Autocrine activation of the MET receptor tyrosine kinase in acute myeloid leukemia. Nat Med. 2012;18(7):1118–22.
- Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2:28–36.