- Open Access
A theoretical investigation of DNA dynamics and desolvation kinetics for zinc finger protein Zif268
- Shayoni Dutta1,
- Yoshita Agrawal1,
- Aditi Mishra1,
- Jaspreet Kaur Dhanjal1 and
- Durai Sundar†1
© Dutta et al. 2015
- Published: 9 December 2015
Transcription factors regulate the expression inventory of a cell by interacting with their target DNA through a specific recognition pattern which, if well exploited, may enable targeted genome engineering. The most widely studied transcription factors are zinc finger proteins (ZFPs), which bind their target DNA through direct and indirect recognition at the interaction interface. Exploiting the binding specificity and affinity of the interaction between zinc fingers and their target DNA can help in generating engineered zinc fingers for therapeutic applications. Experimental evidence substantiates the effect of indirect interactions, such as DNA deformation and desolvation kinetics, in enabling ZFPs to achieve partial sequence specificity that depends on the structural properties of DNA. Exploring the structure-function relationships of existing zinc finger-DNA complexes at the indirect recognition level can aid in predicting the zinc fingers likely to bind any given target DNA. Deformation energy, defined as the energy required to bend DNA from its native shape to its shape when bound to the ZFP, is an effect of the indirect recognition mechanism. Water is treated as a co-reactant in affinity studies of ZFP-DNA binding equilibria, which takes into account the unavoidable change in hydration that occurs when these two solvated surfaces come into contact.
Desolvation and DNA deformation were investigated theoretically using simulations and free energy perturbation data, which reveal a consistent correlation among affinity, specificity and stability for ZFP-DNA interactions. Greater loss of water at the interaction interface of the DNA is associated with higher binding affinity and with greater distortion of the DNA, as measured by the change in major groove width and in DNA tilt, stretch and rise.
Most prediction algorithms for ZFPs do not account for water loss at the interface, so the above findings may significantly affect these algorithms. Further, the sequence-dependent deformation of the DNA upon complexation with our prototype, together with the preference for particular bases at the 2nd and 3rd positions of the repeating triplet, provides new insight into changes in indirect interactions that have not been probed so far.
- ZFP-Zinc finger proteins
- desolvation energy
- DNA deformation
Genome engineering is at its inception, and genome editing tools need to help with the design of DNA templates of choice, the construction of designer proteins to manipulate DNA, and implementation, testing and debugging. The current pace of development points to promising applications of genome targeting tools if large-scale reengineering of genomes is carried out. The literature supports exploiting the protein fold in ZFPs, whose DNA binding affinity rests on novel recognition principles, as the key to engineering novel zinc fingers for targeted genome therapy. Fingers with different triplet specificities can be engineered by mutating the key amino acid residues, enabling specific DNA recognition across a large number of combinatorial possibilities. Further, because these modules or fingers function independently, linking them allows recognition of longer DNA stretches. Understanding how DNA molecules interact with ZFPs depends critically on their structure-function relationships. These relationships chiefly concern conformational changes in DNA and dewetting at the ZFP-DNA interaction interface, and address little-understood aspects of affinity and specificity, respectively. Characterization of binding sites is best inferred from recognition of sequence-specific contacts, usually called direct recognition or direct readout. This mechanism highlights the "recognition code" between the key amino acid residues on the alpha-helix of the ZFP and the nucleotide bases of the target DNA. Sequence dependence alone does not completely explain specificity in protein-DNA binding. Binding affinity is affected even by mutating bases that are not in direct contact with the protein residues [3, 4], implying that proteins employ modes other than direct recognition. DNA structural changes significantly affect its interactions with proteins.
Recognition of DNA structural properties is referred to as indirect recognition or indirect readout. Depending on the binding free energy of the protein-DNA interaction, some proteins bind more strongly to certain regions of the DNA than to others. Structural properties of DNA that affect indirect readout by proteins include flexibility, elasticity, bending and kinking, major and minor groove widths, and hydration [8–10]. The DNA deformation energy, the energy expended to deform DNA from its native conformation to its conformation in a protein-bound complex, is a potential recognition mechanism [11, 12]. We ran 180 ns molecular dynamics simulations and assessed the DNA contribution at the interaction interface based on RMSD and trajectory stability. The analysis was further supported by evaluating structural properties of the DNA, such as flexibility, bending and major groove width changes, across the simulation to characterize DNA bending upon binding to the ZFP.
Starting structures, models and docking studies
Sample set of eight 9 bp DNA targets.
Sample target DNA sequences: 5' GNN GNN GNN 3'
The 3DNA standalone software was used to generate the 6 GNN-GNN-GNN DNA templates. It has a directory containing repeating units for each of the 55 fiber DNA and RNA structure types [16, 17]. This feature allows the user to model a DNA structure from just its nucleotide sequence. On choosing this mode, the user is asked to input the base sequence either as a data file (complete sequence) or from the keyboard (only the repeating sequence). The options -a|-b|-c|-d|-z can be used for A-DNA, B-DNA, C-DNA, D-DNA and Z-DNA models respectively.
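As a small illustration of the target layout, the repeating-triplet sequences can be generated programmatically before being handed to a model-building tool. The sketch below is not part of 3DNA. The function names are hypothetical, and it assumes that each target repeats a single GNN triplet three times with the same base at the 2nd and 3rd positions, as in targets such as 5' GAAGAAGAA 3' discussed in the results.

```python
def repeat_target(base: str, repeats: int = 3) -> str:
    """Build a 5'->3' target such as GAAGAAGAA from one base (hypothetical helper)."""
    if len(base) != 1 or base not in "ACGT":
        raise ValueError("base must be one of A, C, G, T")
    # One triplet is G followed by the chosen base at the 2nd and 3rd positions.
    return ("G" + base * 2) * repeats


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[b] for b in reversed(seq))


# One target per choice of base at the 2nd and 3rd triplet positions.
targets = {base: repeat_target(base) for base in "GCAT"}

if __name__ == "__main__":
    for base, seq in targets.items():
        print(base, seq, reverse_complement(seq))
```

The complementary strand is included only because a duplex model needs both strands; how the sequence is passed to the modeling program is left out on purpose.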
HADDOCK is a data-driven docking approach that uses distance restraints extracted from experimental data (gathered from various possible sources, such as NMR, conservation data, etc.) to reconstruct and refine the protein-DNA complex. The DNA PDB files generated by 3DNA had to be converted to a HADDOCK-compatible format by removing the chain IDs and segIDs. Failing to do so led the software to misinterpret the PDB files, causing arbitrary loss of secondary structure. Restraint files were generated based on the interaction interface. Active residues (those involved in direct readout) and passive residues (those at neighboring off-target sites) were defined. The number of structures for rigid body docking (it0) was reduced from 1000 to 750 and for refinement (it1) from 200 to 100 (the rate-determining step). This was justified because the structure of Zif268 was extracted from its complex with its consensus DNA and was therefore assumed to be close to the conformation it would attain when docked with the new DNA. Solvated rigid body docking was not used, as the effect of solvent was determined using free energy perturbation.
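The chain ID and segID clean-up described above can be sketched as a fixed-column edit of the PDB records. This is only an illustration, not the conversion script used in the study. It assumes the conventional PDB column layout (chain ID in column 22, segment ID in columns 73-76), and the function names are hypothetical.

```python
def strip_chain_and_segid(pdb_line: str) -> str:
    """Blank the chain ID (column 22) and segID (columns 73-76) of one record."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line  # leave REMARK, TER, END, etc. untouched
    body = pdb_line.rstrip("\r\n")
    ending = pdb_line[len(body):]          # preserve the original line ending
    body = body.ljust(76)                  # pad short lines so slicing is safe
    body = body[:21] + " " + body[22:72] + " " * 4 + body[76:]
    return body.rstrip() + ending


def clean_pdb_text(text: str) -> str:
    """Apply the clean-up to every line of a PDB file held in memory."""
    return "".join(strip_chain_and_segid(line)
                   for line in text.splitlines(keepends=True))
```

Coordinates and any element symbol in columns 77-78 are left as they were; only the two identifier fields are blanked.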
Molecular Dynamics simulation procedure
The GPU-accelerated Amber molecular dynamics suite (http://ambermd.org/#Amber12) with the Amber FF03 force field was used to perform all-atom explicit-solvent molecular dynamics (MD) simulations of the protein-DNA complexes obtained from docking [20–22]. The FF03 force field, which includes the Barcelona modification (parmbsc0) for nucleic acids, mostly DNA, was combined with the Amber all-atom force field parameters for the CaDA approach and an explicit solvent model to define parameters for the docked protein-DNA complexes generated using HADDOCK. Since the biggest success of the parmbsc0 force field is its ability to drive structures from incorrect to correct conformations, its integration with the FF03 force field allows conformational transitions during minimization to reach the final refined structure. Further, the zinc finger protein-DNA complexes containing Zn atoms were minimized using the "cationic dummy atom approach" (CaDA), which uses four identical cationic dummy atoms to mimic zinc's vacant 4s4p3 orbitals, which can accommodate the lone-pair electrons of zinc's coordinating ligands, thereby simulating zinc's propensity for four-ligand coordination. The advantage of the method lies in maintaining zinc's four-ligand coordination in ZFPs without harmonic restraints that rigidify the zinc-containing active sites.
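The geometric idea behind four dummy atoms arranged around zinc can be sketched as placing four points at the vertices of a regular tetrahedron centred on the metal. The snippet below shows only that geometry and is not the CaDA parameter set. The function name is hypothetical, the zinc-to-dummy distance is a caller-supplied placeholder rather than a value from the source, and the orientation of the tetrahedron is arbitrary here.

```python
import math


def tetrahedral_dummy_sites(zn_xyz, distance):
    """Return four points at `distance` from zn_xyz on a regular tetrahedron."""
    # Alternate corners of a cube form a regular tetrahedron.
    directions = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    scale = distance / math.sqrt(3.0)   # each direction vector has length sqrt(3)
    return [tuple(c + scale * d for c, d in zip(zn_xyz, direction))
            for direction in directions]


if __name__ == "__main__":
    # 1.0 is an illustrative distance only, not a force field parameter.
    for site in tetrahedral_dummy_sites((0.0, 0.0, 0.0), 1.0):
        print(site)
```

In an actual setup the tetrahedron would be oriented towards the coordinating residues, and the dummy-atom charges would come from the published parameters.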
Protein-DNA complexes were solvated with the TIP3P water model in a cubic periodic boundary box to generate the systems required for MD simulations, and the systems were neutralized using an appropriate number of counter ions. The distance between the octahedron box wall and the protein complex was set to greater than 10 Å to avoid direct interaction with its own periodic image. Each neutralized system was then minimized, heated to 300 K and equilibrated until its pressure and energies had stabilized. Finally, the equilibrated systems were used to run a 30 ns MD simulation for each complex. During the MD simulations, RMSD and hydrogen-bond fluctuations of the DNA with the protein were calculated using VMD software. All simulation studies were performed on an HP machine with an Intel Core 2 Duo CPU @ 3 GHz and 1 GB DDR RAM, and on a DELL T3600 workstation with 8 GB DDR RAM and an NVIDIA GeForce GTX TITAN 6 GB GDDR5 graphics card.
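For readers unfamiliar with the RMSD measure used to judge trajectory stability, a minimal sketch of the calculation follows. It is not VMD's implementation. It assumes the two frames contain the same atoms in the same order and have already been superimposed, so no fitting step is performed, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized sets of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared displacement of one atom between the two frames.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Evaluating this against a fixed reference frame for every frame of a trajectory gives the kind of RMSD-versus-time curve used to compare the stability of the complexes.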
Procedure to evaluate DNA deformation upon complexation
To evaluate the DNA deformation upon binding to Zif268 for each DNA template, 3DNA software was used to identify the helical parameters of the DNA template after docking and to compare them with its conformation after stabilization by MD simulation. The changes in major groove width and tilt before and after complexation were evaluated using Perl scripts.
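The before-and-after comparison can be sketched as a per-parameter mean change. The study used Perl scripts for this step; the Python below is a stand-in written for illustration. It assumes the helical parameters have already been extracted into lists with one value per base-pair step, and the function names and any numbers fed to it are hypothetical.

```python
from statistics import mean


def mean_change(before, after):
    """Mean of (after - before) over matching base-pair steps."""
    if len(before) != len(after) or not before:
        raise ValueError("parameter lists must be non-empty and of equal length")
    return mean(a - b for a, b in zip(after, before))


def deformation_summary(docked, simulated):
    """Map each helical parameter name to its mean change after simulation.

    `docked` and `simulated` are dicts such as
    {"major_groove_width": [...], "tilt": [...]} (hypothetical layout).
    """
    return {name: mean_change(docked[name], simulated[name]) for name in docked}
```

A larger absolute mean change for a given template would indicate greater deformation of that DNA between the docked and simulated states.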
Free Energy Perturbation method
The non-bonded contribution to the intramolecular energy is also computed using the same expression for all pairs of sites separated by more than three bonds.
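For reference, and assuming the expression meant here is the standard OPLS non-bonded form from the cited OPLS work (a Coulomb term plus a Lennard-Jones term summed over pairs of interaction sites), it can be written as follows. The symbols are the usual ones and are not defined in the text above: q are partial charges, r_ij is the distance between sites i and j, and σ and ε are the Lennard-Jones parameters.

```latex
E_{ab} = \sum_{i \in a} \sum_{j \in b}
  \left[ \frac{q_i q_j e^2}{r_{ij}}
  + 4 \varepsilon_{ij}
    \left( \frac{\sigma_{ij}^{12}}{r_{ij}^{12}}
         - \frac{\sigma_{ij}^{6}}{r_{ij}^{6}} \right) \right]
```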
The docked complexes were solvated in an orthorhombic water box using a 10 Å buffer with no ions. All simulations were run with the TIP3P water model with default parameters, as implemented at our in-house Multisim facility. Since the complex contains our target DNA and the protein, and the protein is fixed, an absolute free energy calculation was performed. The protein-in-solvent and protein-in-vacuum terms were kept constant and the final desolvation energy of the DNA was calculated. All the desolvation energies obtained for the sample targets are relative values, and the method was optimized with time and computational constraints in mind.
Although literature studies show high binding affinity of zinc finger proteins for GC-rich sequences, studies of the indirect interaction dynamics, such as stability in terms of DNA deformation and desolvation energy, have not been reported so far. Our studies provide insight into these.
Binding affinity determined by docking scores and respective KD values
Literature review based on KD values show that the prototype Zif268 has a KD value 0.4 for
Free energy perturbation and docking score data for our sample of 6 GNNGNNGNN target DNA bound to Zif268 protein sequence.
Target DNA sequence 5'-3'
dG Solvation (kcal/mol)
-1742.44 ± 49.88      RER RHR RER
-1817.5 ± 48.61       RER RHR RER
-1905.5 ± 48.79       RER RHR RER
-1952.35 ± 340.75     RER RHR RER
-5150.62 ± 137.57     RER RHR RER
-5156.8 ± 137.39      RER RHR RER
-5411.88 ± 141.45     RER RHR RER
-5460.13 ± 143.252    RER RHR RER
Direct correlation between binding affinity and stability of complex determined by RMSD plots
Indirect interactions of Zif268 with DNA targets of the type 5' GNN-GNN-GNN 3' demonstrating the varying binding strength
Target DNA sequence preference by Zif268 based on hydrogen bond retention
Establishment of sequence-dependent DNA deformability
Establishment of sequence-dependent DNA desolvation
The energy required to expel water from the DNA interface upon complexation also depends on the target DNA sequence. The FEP values for G-rich or GC-rich targets, which are the strongest binders, are more negative (-5411.88 ± 141.459 kcal/mol), revealing greater solvent loss at the interface than for the AT-rich ones (-1742.44 ± 49.8897 kcal/mol), the weakest binder. The FEP data (Table 2) show that when the 2nd and 3rd base positions of the repeating triplet 5' GN (2nd) N (3rd) 3' are dominated by G, the DNA experiences the greatest solvent loss upon complexation, followed by C and A. If these base positions are dominated by T, the least solvent loss is seen at the interaction interface. Our desolvation kinetics data obtained from free energy perturbation also corroborate the theoretical assumption that the greater the loss of bulk solvent at the ZFP-DNA interaction interface, the stronger the binding affinity and stability of the complex.
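To put the two quoted FEP values side by side, the gap between the strongest and weakest binder can be worked out directly. The sketch below uses only the two values given in the text. Combining the uncertainties in quadrature assumes the two errors are independent, which is an assumption made here and not a statement from the study; the function name is hypothetical.

```python
import math

# (value, uncertainty) in kcal/mol, as quoted in the text.
strongest = (-5411.88, 141.459)
weakest = (-1742.44, 49.8897)


def difference_with_error(a, b):
    """Return a - b and the quadrature-combined uncertainty."""
    value = a[0] - b[0]
    error = math.hypot(a[1], b[1])   # sqrt(err_a**2 + err_b**2)
    return value, error


if __name__ == "__main__":
    value, error = difference_with_error(strongest, weakest)
    print(f"{value:.2f} +/- {error:.2f} kcal/mol")
```

The difference is about -3669 kcal/mol with a combined uncertainty of roughly 150 kcal/mol, so the separation between the two targets is far larger than the quoted errors.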
Both the DNA deformation and desolvation data affirm that more stable interactions involve greater deformation of the DNA along with more negative energies for expelling water at these interfaces.
Although 5' GAAGAAGAA 3' has a less negative docking score (-114.69) than 5' GTTGTTGTT 3' (-117.87) and, based on docking alone, should have been the weaker binder, the RMSD graphs generated from the simulations show 5' GAAGAAGAA 3' to be more stable than 5' GTTGTTGTT 3'. The desolvation energy follows the same preference, confirming 5' GTTGTTGTT 3' as the weakest binder. Target 5' GCCGCCGCC 3', however, does not quite obey our theoretical assumptions with regard to binding affinity and stability (Table 2), though it does follow the trends for indirect interactions such as desolvation and DNA deformation (Additional File 1: Figure S1D). This observation might imply a strong role of indirect factors in DNA-ZFP complexation.
The target DNA sequences that had strong binding affinity for Zif268 show higher stability, greater retention of hydrogen bonds, greater deformation of the DNA and higher solvent loss at the interaction interface. Conversely, the weak binders show lower stability, lower retention of hydrogen bonds, and less DNA deformation and desolvation. Binding affinity, stability, DNA deformation and desolvation are all sequence dependent. These parameters favor G at the 2nd and 3rd base positions of the repeating triplet 5' GN (2nd) N (3rd) 3', followed by C, A and T.
The role of water dynamics in the binding affinity of DNA-ZFP complexation has not been examined experimentally, and most tools that predict optimum ZFPs for a target DNA have overlooked it. This finding, with the patterns it unveils, could change how ZFPs are selected for any target DNA and improve the accuracy of many tools.
SD acknowledges the award of INSPIRE Scholarship from DST, Govt. of India. YA and AM were recipients of the Summer Undergraduate Research Award (SURA) from IIT Delhi. This study was made possible in part through the support of a grant from the Lady Tata Memorial Trust, Mumbai, DuPont Young Professor Award and the Department of Biotechnology (DBT) under the Bioscience Award Scheme to DS. Computations were performed at the Bioinformatics Centre at IIT Delhi, supported by the DBT, Govt. of India.
Funding for open access charges: IIT Delhi (IRD/RP00713 to D.S.)
This article has been published as part of BMC Genomics Volume 16 Supplement 12, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S12.
- Carr PA, Church GM: Genome engineering. Nat Biotechnol. 2009, 27 (12): 1151-1162.
- Isalan M, Klug A, Choo Y: Comprehensive DNA recognition through concerted interactions from adjacent zinc fingers. Biochemistry. 1998, 37 (35): 12026-12033.
- Szymczyna BR, Arrowsmith CH: DNA-binding specificity studies of 4 ETS proteins supports an "indirect read-out" mechanism of protein-DNA recognition. Journal of Biological Chemistry. 2000.
- Gromiha MM, Siebers JG, Selvaraj S, Kono H, Sarai A: Intermolecular and intramolecular readout mechanisms in protein-DNA recognition. Journal of Molecular Biology. 2004, 337 (2): 285-294.
- Baldi P, Lathrop R: DNA structure, protein-DNA interactions, and DNA-protein expression. Altman, R et al. 2001, 101-102.
- Aeling KA, Steffen NR, Johnson M, Wesley Hatfield G, Lathrop RH, Senear DF: DNA deformation energy as an indirect recognition mechanism in protein-DNA interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2007, 4 (1): 117-125.
- Steffen NR, Murphy S, Tolleri L, Hatfield GW, Lathrop RH: DNA sequence and structure: direct and indirect recognition in protein-DNA binding. Bioinformatics. 2002, 18 (suppl 1): S22-S30.
- Hogan M, Austin R: Importance of DNA stiffness in protein-DNA binding specificity. 1987.
- Gromiha MM: Influence of DNA stiffness in protein-DNA recognition. Journal of Biotechnology. 2005, 117 (2): 137-145.
- Harrington RE, Winicov I: New concepts in protein-DNA recognition: sequence-directed DNA bending and flexibility. Progress in Nucleic Acid Research and Molecular Biology. 1994, 47: 195-270.
- Li W, Nordenskiöld L, Zhou R, Mu Y: Conformation-dependent DNA attraction. Nanoscale. 2014, 6 (12): 7085-7092.
- Steffen NR, Murphy SD, Lathrop RH, Opel ML, Tolleri L, Hatfield GW: The role of DNA deformation energy at individual base steps for the identification of DNA-protein binding sites. Genome Informatics Series. 2002, 153-162.
- Eggers DK, Castellano BM, Dharmaraj S: Calorimetric Determination of Desolvation Energy for a Model Binding Reaction in Dilute and Crowded Solutions. Biophysical Journal. 2013, 104: 576-
- Jayaram B, Jain T: The role of water in protein-DNA recognition. Annu Rev Biophys Biomol Struct. 2004, 33: 343-361.
- Jamieson AC, Kim SH, Wells JA: In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry. 1994, 33 (19): 5689-5695.
- Chandrasekaran R, Radha A, Park HS, Arnott S: Structure of the beta-form of poly d(A).poly d(U). Journal of Biomolecular Structure & Dynamics. 1989, 6 (6): 1203-1215.
- Chandrasekaran R, Wang M, He RG, Puigjaner LC, Byler MA, Millane RP, Arnott S: A re-examination of the crystal structure of A-DNA using fiber diffraction data. Journal of Biomolecular Structure & Dynamics. 1989, 6 (6): 1189-1202.
- Lu XJ, Olson WK: 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat Protoc. 2008, 3 (7): 1213-1227.
- De Vries SJ, van Dijk M, Bonvin AM: The HADDOCK web server for data-driven biomolecular docking. Nature Protocols. 2010, 5 (5): 883-897.
- Gotz AW, Williamson MJ, Xu D, Poole D, Le Grand S, Walker RC: Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born. Journal of Chemical Theory and Computation. 2012, 8 (5): 1542-1555.
- Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C: Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins. 2006, 65 (3): 712-725.
- Case DA, Cheatham TE, Darden T, Gohlke H, Luo R, Merz KM, Onufriev A, Simmerling C, Wang B, Woods RJ: The Amber biomolecular simulation programs. Journal of Computational Chemistry. 2005, 26 (16): 1668-1688.
- Pérez A, Marchán I, Svozil D, Sponer J, Cheatham III TE, Laughton CA, Orozco M: Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of α/γ Conformers. Biophysical Journal. 2007, 92 (11): 3817-3829.
- Pang Y-P, Xu K, Yazal JE, Prendergast FG: Successful molecular dynamics simulation of the zinc-bound farnesyltransferase using the cationic dummy atom approach. Protein Science. 2000, 9 (10): 1857-1865.
- Bhattacharya D, Cheng J: 3Drefine: Consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins: Structure, Function, and Bioinformatics. 2013, 81 (1): 119-131.
- Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML: Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics. 1983, 79: 926-
- Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA: PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Research. 2007, 35 (suppl 2): W522-W525.
- Shivakumar D, Williams J, Wu Y, Damm W, Shelley J, Sherman W: Prediction of absolute solvation free energies using molecular dynamics free energy perturbation and the OPLS force field. Journal of Chemical Theory and Computation. 2010, 6 (5): 1509-1519.
- Jorgensen WL, Tirado-Rives J: The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society. 1988, 110 (6): 1657-1666.
- Beerli RR, Segal DJ, Dreier B, Barbas CF: Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proceedings of the National Academy of Sciences. 1998, 95 (25): 14628-14633.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.