Volume 13 Supplement 1
Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012)
Learning generative models of molecular dynamics
 Narges Sharif Razavian^{1},
 Hetunandan Kamisetty^{2} and
 Christopher J Langmead^{3, 4}
DOI: 10.1186/1471-2164-13-S1-S5
© Razavian et al.; licensee BioMed Central Ltd. 2012
Published: 17 January 2012
Abstract
We introduce three algorithms for learning generative models of molecular structures from molecular dynamics simulations. The first algorithm learns a Bayesian-optimal undirected probabilistic model over user-specified covariates (e.g., fluctuations, distances, angles, etc.). L_{1} regularization is used to ensure sparse models and thus reduce the risk of overfitting the data. The topology of the resulting model reveals important couplings between different parts of the protein, thus aiding in the analysis of molecular motions. The generative nature of the model makes it well-suited to making predictions about the global effects of local structural changes (e.g., the binding of an allosteric regulator). Additionally, the model can be used to sample new conformations. The second algorithm learns a time-varying graphical model where the topology and parameters change smoothly along the trajectory, revealing the conformational substates. The last algorithm learns a Markov Chain over undirected graphical models which can be used to study and simulate kinetics. We demonstrate our algorithms on multiple molecular dynamics trajectories.
Introduction
The three dimensional structures of proteins and other molecules vary in time according to the laws of thermodynamics. Each molecule visits an ensemble of states which can be partitioned into distinct conformational substates [1, 2] consisting of similar structures. The study of these conformational substates remains an active area of research [3–5] and has provided valuable insights into biological function, such as enzyme catalysis [5–7] and energy transduction [8].
Molecular dynamics (MD) simulations are often used to characterize conformational dynamics [9]. These simulations are performed by numerically integrating Newton's laws of motion for a set of atoms. Conformational frames are written to disk into a trajectory for subsequent analysis. Until recently, MD simulations were limited to timescales of several tens of nanoseconds (ns = 10^{-9} sec.). Recent advances in hardware and software (e.g., [10–14]) make it possible to investigate conformational dynamics on microsecond (μs = 10^{-6} sec.) and millisecond (ms = 10^{-3} sec.) timescales. Such long simulations are especially well-suited to identifying and studying the conformational substates relevant to biological function. Unfortunately, the corresponding trajectories are often difficult to analyze and interpret due to their size and complexity. Thus, there is a need for algorithms for analyzing such long timescale trajectories. The primary goal of this paper is to introduce new algorithms to do so.
Our approach to analyzing MD data is to learn generative models known as Markov Random Fields (MRF). This is the first time MRFs have been used to model MD data. An MRF is an undirected probabilistic graphical model that encodes the joint probability distribution over a set of user-specified variables. In this paper those variables correspond to the positional fluctuations of the atoms, but the technique can be easily extended to other quantities, such as pairwise distances or angles. The generative nature of the model means that new conformations can be sampled and, perhaps more importantly, that users can make structural alterations to one part of the model (e.g., modeling the binding of a ligand) and then perform inference to predict how the rest of the system will respond.
We present three closely related algorithms. The first algorithm learns a single model from the data. Both the topology and the parameters of the model are learned. The topology of the learnt graph reveals which variables are directly coupled and which correlations are indirect. Alternative methods, such as constructing a covariance matrix, cannot distinguish between direct and indirect correlations. Our algorithm is guaranteed to produce an optimal model. Regularization is used to reduce the tendency to overfit the data. The second algorithm learns a time-varying model where the topology and parameters of the MRF change smoothly over time. Time-varying models reveal the different conformational substates visited by the molecule and the features of the energy barriers that separate them. The final algorithm learns a Markov Chain over MRFs which can be used to generate new trajectories and study kinetics.
Background
Molecular dynamics simulation
Molecular Dynamics simulations involve integrating Newton's laws of motion for a set of atoms. Briefly, given a set of n atomic coordinates $\mathbf{X}=\left\{{\overrightarrow{X}}_{1},...,{\overrightarrow{X}}_{n}:{\overrightarrow{X}}_{i}\in {\mathbb{R}}^{3}\right\}$ and the corresponding velocity vectors $\mathbf{V}=\left\{{\overrightarrow{V}}_{1},...,{\overrightarrow{V}}_{n}:{\overrightarrow{V}}_{i}\in {\mathbb{R}}^{3}\right\}$, MD updates the positions and velocities of each atom according to an energy potential. The updates are performed via numerical integration, resulting in a conformational trajectory. The size of the time step for the numerical integration is normally on the order of 1-2 femtoseconds (fs = 10^{-15} sec), meaning that a 1 microsecond simulation requires on the order of one billion integration steps. In most circumstances, every 1,000th to 10,000th conformation is written to disk as an ordered series of frames.
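As a concrete illustration of the integration step, the following sketch implements velocity Verlet, a standard scheme for this kind of update, on a toy one-dimensional harmonic "bond". The force function, constants, and step counts are illustrative stand-ins, not a real force field:

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps, stride):
    """Integrate Newton's laws with the velocity Verlet scheme,
    saving every stride-th conformation as a frame."""
    frames = []
    f = force(x)
    for step in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt**2   # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / mass * dt       # velocity update
        f = f_new
        if step % stride == 0:                      # write a frame "to disk"
            frames.append(x.copy())
    return np.array(frames)

# Toy system: a single 1-D harmonic degree of freedom with force constant k.
k, mass, dt = 1.0, 1.0, 0.01
force = lambda x: -k * x
frames = velocity_verlet(np.array([1.0]), np.array([0.0]), force, mass, dt,
                         n_steps=10000, stride=100)
```

Because the scheme is symplectic, the oscillation amplitude (and hence the energy) stays essentially constant over the run, which is why only a small fraction of the integration steps needs to be saved as frames.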
Traditional methods for analyzing MD data either monitor the dynamics of global statistics (e.g., the radius of gyration, total energy, etc.), or else identify substates via clustering of the frames [15–17] or through Principal Components Analysis (PCA) and closely related methods (e.g., [18–22]). Clustering-based methods do not produce generative models and generally rely on pairwise comparisons between frames, and thus run in quadratic time with respect to the number of frames in the trajectory. Our algorithms produce generative models and only perform linear work in the number of frames. This complexity difference is especially important for long timescale simulations. PCA-based methods implicitly assume that the data are drawn from a multivariate Gaussian distribution. Our method makes the same assumption but differs from PCA in two important ways. First, PCA projects the data onto an orthogonal basis. Our method involves no change of basis, making the resulting model easier to interpret. Second, we employ L_{1} regularization when learning the parameters of our model. Regularization is a common strategy for reducing the tendency to overfit data by, informally, penalizing overly complicated models. We use L_{1} regularization because it has desirable statistical properties. Specifically, it leads to consistent models (that is, given enough data our algorithm learns the true model) while enjoying high efficiency (that is, the number of samples needed to achieve the true model is small).
More recently, Lange and Grubmüller introduced full correlation analysis [23], which can capture both linear and nonlinear correlated motions from MD simulations. The algorithms in this paper are limited to linear models, but we note that they can be easily extended to more complex forms by using non-Gaussian random variables (e.g., [24, 25]). Our final algorithm produces models that resemble Markov State Models (MSMs) [26] but are different in that they are fully generative.
Markov Random Fields
A Markov Random Field $\mathcal{M}=\left(\mathcal{G},\Theta \right)$ consists of an undirected graph $\mathcal{G}$ over a set of random variables X = {X_{1}, ..., X_{n}} and a set of functions Θ over the nodes and edges of $\mathcal{G}$. Together, they define the joint distribution P(X). The topology of the graph determines the set of conditional independencies between the variables. In particular, the ith random variable is conditionally independent of the remaining variables, given its neighbors in the graph. Informally, if variables X_{i} and X_{j} are not connected by an edge in the graph, then any correlation between them is indirect. By 'indirect' we mean that the correlation between X_{i} and X_{j} (if any) can be explained in terms of a pathway of correlations (e.g., X_{i} → X_{k} → ··· → X_{j}). Conversely, if X_{i} and X_{j} are connected by an edge, then the correlation is direct. Our algorithm automatically detects these conditional independencies and learns the sparsest possible model, subject to fitting the data.
Gaussian Graphical Models
A Gaussian Graphical Model (GGM) or Gaussian Markov Random Field is simply a MRF where each variable is normally distributed. Thus, a GGM encodes a multivariate Gaussian distribution. A GGM has parameters $\mathcal{M}=\left(\overrightarrow{h},{\sum}^{-1}\right)$ where Σ^{-1} is an n × n matrix (known as the precision matrix) and $\overrightarrow{h}$ is an n × 1 vector. The nonzero elements of Σ^{-1} reveal the edges in the MRF. The inverse of the precision matrix, denoted by Σ, is the covariance matrix for a multivariate Gaussian distribution with mean $\overrightarrow{\mu}=\sum \overrightarrow{h}$.
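A minimal sketch of this parameterization, using a hypothetical three-variable chain model (the numbers are illustrative, not from the paper). It recovers (μ, Σ) from (h, Σ^{-1}) and draws samples, which is exactly how new conformations would be generated from a learned GGM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-variable GGM with chain topology X1 - X2 - X3:
# the zero in precision[0, 2] means there is no direct X1-X3 edge.
precision = np.array([[ 2.0, -0.8,  0.0],
                      [-0.8,  2.0, -0.8],
                      [ 0.0, -0.8,  2.0]])
h = np.array([1.0, 0.0, -1.0])

Sigma = np.linalg.inv(precision)   # covariance matrix
mu = Sigma @ h                     # mean recovered from (h, precision)

# Although X1 and X3 are conditionally independent given X2 (no edge),
# their covariance Sigma[0, 2] is still nonzero: an indirect correlation.
samples = rng.multivariate_normal(mu, Sigma, size=50000)
```

The nonzero Sigma[0, 2] next to the zero precision[0, 2] is the direct-versus-indirect distinction the text describes: the covariance matrix mixes both kinds of coupling, while the precision matrix separates them.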
The covariance matrix can be partitioned into blocks over two disjoint subsets of the variables, W and V: $\sum =\left(\begin{array}{cc}\hfill {\sum}_{WW}\hfill & \hfill {\sum}_{WV}\hfill \\ \hfill {\sum}_{WV}^{T}\hfill & \hfill {\sum}_{VV}\hfill \end{array}\right)$. Thus, inference can be performed analytically via matrix operations. In this way, users can predict the conformational changes induced by local perturbations or, more generally, study the couplings between arbitrarily chosen subsets of variables.
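The matrix operations behind this kind of inference can be sketched with the standard Gaussian conditioning formulas (the three-variable model and clamped value below are hypothetical, chosen only to make the arithmetic checkable):

```python
import numpy as np

def condition_gaussian(mu, Sigma, w_idx, w_val):
    """Conditional distribution of the remaining variables V, given that
    the variables indexed by w_idx are clamped to w_val."""
    n = len(mu)
    v_idx = [i for i in range(n) if i not in w_idx]
    S_ww = Sigma[np.ix_(w_idx, w_idx)]
    S_wv = Sigma[np.ix_(w_idx, v_idx)]
    S_vv = Sigma[np.ix_(v_idx, v_idx)]
    K = S_wv.T @ np.linalg.inv(S_ww)              # regression coefficients
    mu_cond = mu[v_idx] + K @ (w_val - mu[w_idx]) # shifted mean
    Sigma_cond = S_vv - K @ S_wv                  # reduced uncertainty
    return mu_cond, Sigma_cond

# Hypothetical 3-variable model; clamp X1 (e.g., a locally perturbed site)
# and predict how the rest of the system responds.
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
mu_cond, Sigma_cond = condition_gaussian(mu, Sigma, [0], np.array([2.0]))
```

Clamping X1 to 2.0 shifts the conditional means of X2 and X3 in proportion to their covariance with X1, and shrinks their conditional variances, which is the analytic form of the perturbation analysis described above.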
Algorithms
We now present three algorithms for learning various kinds of generative models from MD data.
Input The input to all three algorithms is a time-series of vectors $\mathbf{D}=\left({\overrightarrow{\mathbf{d}}}_{1},...,{\overrightarrow{\mathbf{d}}}_{t}\right)$ where ${\overrightarrow{\mathbf{d}}}_{i}$ is an n × 1 vector of covariates (e.g., positional and/or angular deviations) and t is the number of snapshots in the MD trajectory.
Algorithm 1
Output The first algorithm produces a Gaussian Graphical Model $\mathcal{M}=\left(\overrightarrow{h},{\sum}^{-1}\right)$. The first step is to compute the sample mean $\overrightarrow{\mu}=\left(1/t\right){\sum}_{i=1}^{t}{\overrightarrow{\mathbf{d}}}_{i}$. Then it computes the regularized precision matrix Σ^{-1} (see below). Finally, $\overrightarrow{h}$ is computed as follows: $\overrightarrow{h}={\sum}^{-1}\overrightarrow{\mu}$.
The algorithm produces the sparsest precision matrix that still fits the data (see below). It also guarantees that Σ^{-1} is positive-definite, which means it can be inverted to produce the regularized covariance matrix (as opposed to the sample covariance, which is trivial to compute). This is important because Eqs. 1-3 require the covariance matrix, Σ. We further note that a sparse precision matrix does not imply that the corresponding covariance matrix is sparse, nor does a sparse covariance imply that the corresponding precision matrix is sparse. That is, our algorithm is not equivalent to simply thresholding the sample covariance matrix and then inverting it.
Learning regularized precision matrices
A straightforward way of learning a GGM is to find the parameters $\left(\overrightarrow{\mu},\sum \right)$ that maximize the likelihood of the data (i.e., by finding parameters that maximize ${\prod}_{i=1}^{t}P\left({\overrightarrow{d}}_{i}\right)$). It is known that a maximum likelihood model can be produced by setting the pair $\left(\overrightarrow{\mu},\sum \right)$ to the sample mean and covariance matrices, respectively. Unfortunately, maximum likelihood estimates can be prone to overfitting. This is not surprising because the covariance matrix alone contains m = O(n^{2}) parameters, each of which must be estimated from the data. This is relevant because the number of independent samples needed to obtain a statistically robust estimate of Σ grows polynomially in m. We note that while modern MD simulations do produce large numbers of samples (i.e., frames), these samples are not independent (because they form a time-series), and so the effective sample size is much smaller than the number of frames in the trajectory.
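The sample-size problem is easy to demonstrate. In the sketch below (toy random data standing in for decorrelated MD frames), the number of effective samples t is smaller than the number of variables n, so the maximum-likelihood sample covariance is rank-deficient and cannot even be inverted to obtain a precision matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# n variables but only t effectively independent frames, with t < n:
n, t = 50, 20
frames = rng.normal(size=(t, n))     # stand-in for decorrelated MD frames

S = np.cov(frames, rowvar=False)     # maximum-likelihood-style estimate
rank = np.linalg.matrix_rank(S)
# With t observations the sample covariance has rank at most t - 1,
# so for t < n it is singular: det(S) = 0 and S has no inverse.
```

Regularization sidesteps this failure mode: the regularized estimate described below is positive-definite (hence invertible) by construction, regardless of the effective sample size.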
The regularized precision matrix is obtained by maximizing the penalized log likelihood log det Σ^{-1} − trace(SΣ^{-1}) − λ‖Σ^{-1}‖_{1}, where S is the sample covariance matrix (Eq. 6). Here, ‖Σ^{-1}‖_{1} is the L_{1} norm of the precision matrix, defined as the sum of the absolute values of the matrix elements. It can be interpreted as a measure of the complexity of the model. In particular, each nonzero element of Σ^{-1} corresponds to a parameter in the model and must be estimated from the data. Thus, Eq. 6 establishes a tradeoff between the log likelihood of the data (the first term) and the complexity of the model (the second term). The scalar value λ controls this tradeoff such that higher values produce sparser precision matrices. This is our algorithm's only parameter. Its value can be computed analytically [27] from the number of frames in the trajectory and the number of variables. Alternatively, users may elect to adjust λ to obtain precision matrices of a desired sparsity.
We note that L_{1} regularization is equivalent to maximizing the likelihood under a Laplace prior, and so the solution to Eq. 7 is a maximum a posteriori (MAP) estimate of the true precision matrix, as opposed to a maximum likelihood estimate. That is, our algorithm is a Bayesian method. Moreover, the use of L_{1} regularization ensures additional desirable properties, including consistency (given enough data, the learning procedure learns the true model) and high statistical efficiency (the number of samples needed to achieve this guarantee is small).
That is, the optimal Σ^{-1} is the one that maximizes the worst-case log likelihood over all additive perturbations of the covariance matrix.
such that Σ^{-1} = (S + U*)^{-1}.
Eq. 9 is smooth and convex, and for small values of n it can be solved by standard convex multivariate optimization techniques, such as the interior point method. For larger values of n we use Block Coordinate Descent [27] instead.
Block Coordinate Descent
Given a matrix A, let A_{\k\j} denote the matrix produced by removing column k and row j of the matrix. Let A_{j} also denote column j, with diagonal element A_{jj} removed. The Block Coordinate Descent algorithm [27] (Algorithm 1) proceeds by optimizing one row and one column of the variable matrix W at a time. The algorithm iteratively optimizes all columns until a convergence criterion is met. The iterates W produced in each iteration are strictly positive definite, and so the regularized covariance matrix Σ = W is invertible.
Algorithm 1 Block Coordinate Descent
Require: Tolerance parameter ε, sample covariance S, and regularization parameter λ.
Initialize W^{(0)}:= S + λI where I is the identity matrix.
repeat
for j = 1, ..., n do
$y*=arg\underset{y}{min}\left\{{y}^{T}{\left({\mathbf{W}}_{\backslash j\backslash j}^{\left(j-1\right)}\right)}^{-1}y:\parallel y-{\mathbf{S}}_{j}{\parallel}_{\infty}\le \lambda \right\}$ {// Here, W^{(j-1)} denotes the current iterate.}
Set W^{(j)} to W^{(j-1)} with column W_{j} replaced by y*.
end for
Set W^{(0)} = W^{(n)}
until trace((W^{(0)})^{-1}S) − n + λ‖(W^{(0)})^{-1}‖_{1} ≤ ε.
return W^{(0)}
The time complexity of this algorithm is O(n^{4.5}/ε) [27] when converging to a solution within ε > 0 of the optimal. This complexity is better, in its dependence on n, than $O\left({n}^{6}log\left(\frac{1}{\epsilon}\right)\right)$, which would have been achieved using the interior point method on the dual form [28].
In summary, the algorithm produces a time-averaged model of the data by computing the sample mean and then constructing the optimal regularized Σ by solving Eq. 9 using Block Coordinate Descent. The regularized covariance matrix Σ is guaranteed to be invertible, which means we can always compute the precision matrix, Σ^{-1}. The precision matrix can be interpreted as a graph over the variables, revealing which correlations between the variables are direct and which are indirect.
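For a runnable illustration of the penalized objective itself, the sketch below optimizes log det Θ − trace(SΘ) − λ‖Θ‖_{1} with a simple proximal-gradient scheme. This is not the Block Coordinate Descent method of Algorithm 1; it is a minimal alternative solver for the same objective, with illustrative step sizes and toy data, included only to show how the L_{1} penalty zeroes out off-diagonal precision entries:

```python
import numpy as np

def soft_threshold(A, tau):
    """Proximal operator of the L1 penalty: shrink entries toward zero."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sparse_precision(S, lam, step=0.1, n_iter=500):
    """Proximal-gradient sketch for
       max_Theta  log det(Theta) - trace(S @ Theta) - lam * ||Theta||_1.
    The gradient of the smooth part is inv(Theta) - S."""
    n = S.shape[0]
    Theta = np.eye(n)
    for _ in range(n_iter):
        grad = np.linalg.inv(Theta) - S
        Theta = 0.5 * ((Theta + step * grad) + (Theta + step * grad).T)
        # Project back to positive definite if a step overshoots.
        w, V = np.linalg.eigh(Theta)
        Theta = (V * np.maximum(w, 1e-6)) @ V.T
        # Soft-threshold the off-diagonal entries (variances unpenalized).
        off = soft_threshold(Theta, step * lam)
        np.fill_diagonal(off, np.diag(Theta))
        Theta = off
    return Theta

rng = np.random.default_rng(2)
data = rng.normal(size=(500, 5))          # uncorrelated toy "trajectory"
Theta = sparse_precision(np.cov(data, rowvar=False), lam=0.3)
```

On uncorrelated data with a moderate λ, the estimated precision matrix comes back diagonal: the spurious small sample correlations are shrunk to exactly zero rather than being kept as weak edges, which is the overfitting-control behavior the text attributes to the L_{1} penalty.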
Algorithm 2
The second algorithm is a straightforward extension of the first. Instead of producing a time-averaged model, it produces a time-varying model: $\mathcal{M}\left(\tau \right)=\left(\overrightarrow{h}\left(\tau \right),{\sum}^{-1}\left(\tau \right)\right)$. Here, τ ≤ t indexes over sequentially ordered windows of frames in the trajectory. The width of each window, w, is a parameter and may be adjusted to learn time-varying models at a particular timescale. Naturally, a separate time-averaged model could be learned for each window. Instead, the second algorithm applies a simple smoothing kernel so that the parameters of the τth window include information from neighboring windows, too. In this way, the algorithm ensures that the parameters of the time-varying model evolve as smoothly as possible, subject to fitting the data.
The smoothed sample covariance for window τ is the kernel-weighted combination $S\left(\tau \right)={\sum}_{k}{w}_{k}{S}_{k}$, where k indexes over windows τ − κ to τ + κ, κ is a user-specified kernel width, and the weights w_{k} are defined by a nonnegative kernel function. The choice of kernel function is specified by the user. In our experiments the kernel mixed the current window and the previous window, with the current window having twice the weight of the previous. The time-varying model is then constructed by solving Eq. 9 for each S(τ). That is, the primary difference between the time-averaged and time-varying versions of the algorithm is the kernel function.
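A minimal sketch of this windowed smoothing, using toy data and the 2:1 current-to-previous weighting mentioned above (window width, weights, and trajectory are illustrative):

```python
import numpy as np

def windowed_covariances(frames, w):
    """Per-window sample covariances over consecutive windows of width w."""
    n_windows = frames.shape[0] // w
    return [np.cov(frames[i * w:(i + 1) * w], rowvar=False)
            for i in range(n_windows)]

def smoothed_covariance(covs, tau, weights):
    """Kernel-weighted covariance for window tau: weights[0] multiplies
    window tau, weights[1] window tau-1, and so on (weights normalized)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(wk * covs[tau - k] for k, wk in enumerate(weights))

rng = np.random.default_rng(3)
frames = rng.normal(size=(200, 4))        # toy trajectory, 4 covariates
covs = windowed_covariances(frames, w=20)
# Current window weighted twice the previous, as in the experiments:
S_tau = smoothed_covariance(covs, tau=5, weights=[2, 1])
```

Each S(τ) is a convex combination of symmetric positive-semidefinite window covariances, so it remains a valid covariance matrix and can be fed directly to the regularized estimator of Algorithm 1.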
Algorithm 3
The final algorithm builds on the second algorithm. Recall that the second algorithm learns T sequentially ordered models over windows of the trajectory. Moreover, recall that each model encodes a multivariate Gaussian (Eq. 1) and that the KL-divergence between multivariate Gaussians can be computed analytically via Eq. 3. The KL-divergence (also known as information gain or relative entropy) is a nonnegative measure of the difference between two probability distributions. It is zero if and only if the two distributions are identical. It is not, however, a distance metric, because it is not symmetric. That is, D(P‖Q) ≠ D(Q‖P), in general. However, it is common to define a symmetric KL-divergence by simply summing: KL_{sym} = D(P‖Q) + D(Q‖P). We can thus cluster the models using any standard clustering algorithm, such as k-means or a hierarchical approach. In our experiments we used complete linkage clustering, an agglomerative method that minimizes the maximum distance between elements when merging clusters.
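The closed-form Gaussian KL-divergence and its symmetrized version can be sketched as follows (the two window models P and Q below are hypothetical):

```python
import numpy as np

def kl_gauss(mu_p, S_p, mu_q, S_q):
    """Closed-form KL divergence D(P || Q) between multivariate Gaussians
    P = N(mu_p, S_p) and Q = N(mu_q, S_q)."""
    n = len(mu_p)
    Sq_inv = np.linalg.inv(S_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(Sq_inv @ S_p) + diff @ Sq_inv @ diff - n
                  + np.log(np.linalg.det(S_q) / np.linalg.det(S_p)))

def kl_sym(p, q):
    """Symmetrized divergence D(P || Q) + D(Q || P); a valid input for
    standard clustering algorithms."""
    return kl_gauss(*p, *q) + kl_gauss(*q, *p)

# Two hypothetical window models, each a (mean, covariance) pair:
P = (np.zeros(2), np.eye(2))
Q = (np.array([1.0, 0.0]), 2.0 * np.eye(2))
d = kl_sym(P, Q)
```

The symmetrized quantity is zero exactly when the two window models coincide and grows as their means or covariances diverge, which is what makes it usable as a pairwise dissimilarity for complete linkage clustering.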
Let S be the set of clusters returned by a clustering algorithm. Our final algorithm treats those clusters as states in a Markov Chain. The prior probability of being in each state can be estimated using free energy calculations [29, 30] for each cluster, or according to the relative sizes of each cluster. It then estimates the transition probabilities between states i and j by counting the number of times a model assigned to cluster i is followed by a model assigned to cluster j. This simple approach creates a model that can be used to generate new trajectories by first sampling states from the Markov Chain and then sampling conformations from the models associated with that state.
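The transition-counting step can be sketched directly (the cluster-label sequence below is a hypothetical stand-in for the cluster assignments of sequentially ordered window models):

```python
import numpy as np

def transition_matrix(labels, n_states):
    """Estimate Markov-chain transition probabilities by counting how often
    a model assigned to cluster i is immediately followed by a model
    assigned to cluster j, then row-normalizing the counts."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(labels[:-1], labels[1:]):
        C[i, j] += 1
    rows = C.sum(axis=1, keepdims=True)
    # Rows with no observed outgoing transitions fall back to uniform.
    return np.where(rows > 0, C / np.maximum(rows, 1), 1.0 / n_states)

# Hypothetical cluster assignments for 10 sequentially ordered models:
labels = [0, 0, 1, 1, 1, 2, 0, 0, 2, 2]
T = transition_matrix(labels, n_states=3)
```

Sampling a state sequence from T and then drawing conformations from the GGM associated with each state yields the generative trajectories described above.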
Experiments
We applied our algorithms to several molecular dynamics simulation trajectories. In this section, we illustrate some of the results obtained through this analysis. The algorithms were implemented in Matlab and run on a dual-core T9600 Intel processor running at 2.8 GHz. The wall-clock runtimes for all the experiments were on the order of seconds to about 10 minutes, depending on the size of the data set and parameter settings.
Algorithm 1: application to the early events of HIV entry
Ibalizumab's mechanism of action is poorly understood. As can be seen in Figure 1, Ibalizumab does not prevent gp120 from binding to CD4, nor does it directly bind to gp120 itself, suggesting that its inhibitory action occurs via an allosteric mechanism. To investigate this phenomenon, we applied our first algorithm to the two trajectories and then compared the resulting models. The variables in the models corresponded to the positional fluctuations of the Cα atoms, relative to the initial frame of the simulation.
Correlation networks
Log-likelihood $\left(\mathcal{L}\mathcal{L}\right)$ of the gp120-CD4 simulations under both models

Data      $\mathcal{L}\mathcal{L}$(Data | Unbound Model)    $\mathcal{L}\mathcal{L}$(Data | Drug-Bound Model)
Unbound   0.03                                              0.19
Bound     0.04                                              0.29
Comparison to suboptimal models
Perturbation analysis
Algorithm 2: application to a 1 microsecond simulation of the engrailed homeodomain
We performed three 50-microsecond simulations of the protein at 300, 330, and 350 K. These simulations were performed on ANTON [14], a special-purpose supercomputer designed to perform long-timescale simulations. Each simulation had more than 500,000 frames. In this paper, we learned a time-varying model of the first microsecond of the 300 K trajectory, modeling the fluctuations of the alpha carbons. The window size was 2 ns, and a sawtooth smoothing kernel was applied such that the ith model is built from the data in windows i, i − 1, and i − 2, with kernel weights 0.57, 0.29, and 0.14, respectively. A total of 500 models were learned from the first microsecond of the trajectory.
Algorithm 3: application to a 1 microsecond simulation of the engrailed homeodomain
Using the 500 models learned in the previous section, we computed the symmetric KL-divergence between all pairs of models. Recall that the KL-divergence (Eq. 3) is a measure of the difference between distributions. Figure 9B plots the pairwise KL-divergences between the 500 models.
Discussion
Many existing techniques for analyzing MD data are closely related to, or direct applications of, Principal Components Analysis (PCA). Quasi-Harmonic Analysis (QHA) [18, 19], for example, is PCA applied to a mass-weighted covariance matrix of atomic fluctuations. PCA-based methods diagonalize the covariance matrix and thus produce a set of eigenvectors and corresponding eigenvalues. Each eigenvector can be interpreted as one of the principal modes of vibration within the system or, equivalently, as a normally distributed random variable with zero mean and variance proportional to the corresponding eigenvalue. That is, PCA-based methods model the data in terms of a multivariate Gaussian distribution. Our methods also build multivariate Gaussian models of the data but do so over the real-space variables, not the eigenspace variables.
PCA-based methods generally project the data onto a low-dimensional subspace spanned by the eigenvectors corresponding to the largest eigenvalues. This is done to simplify the data and because lower-dimensional models tend to be more robust (i.e., less likely to overfit the data). Our methods, in contrast, use regularization when estimating the parameters of the model to achieve the same goals.
The eigenvectors produced by PCA-based methods contain useful information about how different regions of the system move in a coordinated fashion. In particular, the components of each vector quantify the degree of coupling between the covariates in that mode. However, the eigenvectors make no distinction between direct and indirect couplings. Moreover, eigenvectors are an inherently global description of dynamics. Our methods, in contrast, do not perform a change of basis and instead model the data in terms of a network of correlations. The resulting model, therefore, reveals which correlations are direct and which are indirect. Pathways in these networks may provide mechanistic insights into important phenomena, such as allosteric regulation. Our models can also be used to investigate motions that are localized to specific regions of the system.
Finally, we note that because our first algorithm produces a regularized estimate of the true covariance matrix, Σ, it could potentially be used as a preprocessing step for PCA-based methods, which normally take as input the sample covariance matrix.
Conclusions and future work
We have introduced three novel methods for analyzing Molecular Dynamics simulation data. Our algorithms learn regularized graphical models of the data which can then be used to: (i) investigate the networks of correlations in the data; (ii) sample novel configurations; or (iii) perform in silico perturbation studies. We note that our methods are complementary to existing analysis techniques, and are not intended to replace them.
There are a number of important areas for future research. Gaussian Graphical Models have a number of limitations, most notably that they encode unimodal distributions and are best suited to modeling harmonic motions. Boltzmann distributions, in contrast, are usually multimodal. Our third algorithm partially addresses this problem by creating a Markov chain over GGMs, but the motions within each state are still harmonic. Discrete distributions could be used to model anharmonic motions (e.g., by adapting the algorithm in [24]). Gaussian distributions are also best suited to modeling variables defined on the real line. Angular variables, naturally, are best modeled with circular distributions, like the von Mises. We have recently developed an algorithm for learning multivariate von Mises graphical models [25], which could be used to model distributions over bond and dihedral angles.
List of abbreviations used
GGM: Gaussian Graphical Model
KL: Kullback-Leibler
MAP: maximum a posteriori
MD: molecular dynamics
MRF: Markov Random Field
MSM: Markov State Model
PCA: Principal Components Analysis
QHA: Quasi-Harmonic Analysis
Declarations
Acknowledgements
This work is supported in part by US NSF grant IIS0905193. Use of the Anton machine was provided through an allocation from National Resource for Biomedical Supercomputing at the Pittsburgh Supercomputing Center via US NIH RC2GM093307.
This article has been published as part of BMC Genomics Volume 13 Supplement 1, 2012: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/14712164/13?issue=S1.
Authors’ Affiliations
References
 Frauenfelder H, Petsko GA, Tsernoglou D: Temperature-dependent X-ray diffraction as a probe of protein structural dynamics. Nature. 1979, 280 (5723): 558-563. 10.1038/280558a0.
 Frauenfelder H, Parak F, Young RD: Conformational substates in proteins. Annu Rev Biophys Biophys Chem. 1988, 17: 451-479. 10.1146/annurev.bb.17.060188.002315.
 Henzler-Wildman K, Kern D: Dynamic personalities of proteins. Nature. 2007, 450: 964-972. 10.1038/nature06522.
 Boehr DD, Nussinov R, Wright PE: The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol. 2009, 5 (11): 789-796. 10.1038/nchembio.232.
 Fraser J, Clarkson M, Degnan S, Erion R, Kern D, Alber T: Hidden alternative structures of proline isomerase essential for catalysis. Nature. 2009, 462 (7273): 669-673. 10.1038/nature08615.
 Eisenmesser EZ, Bosco DA, Akke M, Kern D: Enzyme dynamics during catalysis. Science. 2002, 295 (5559): 1520-1523. 10.1126/science.1066176.
 Eisenmesser EZ, Millet O, Labeikovsky W, Korzhnev D, Wolf-Watz M, Bosco D, Skalicky J, Kay L, Kern D: Intrinsic dynamics of an enzyme underlies catalysis. Nature. 2005, 438: 117-121. 10.1038/nature04105.
 Leitner DM: Energy flow in proteins. Annu Rev Phys Chem. 2008, 59: 233-259. 10.1146/annurev.physchem.59.032607.093606.
 Karplus M, McCammon JA: Molecular dynamics simulations of biomolecules. Nat Struct Biol. 2002, 9: 646-652. 10.1038/nsb0902-646.
 Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kale LV, Schulten K: Scalable molecular dynamics with NAMD. J Comput Chem. 2005, 26 (16): 1781-1802. 10.1002/jcc.20289.
 Bowers KJ, Chow E, Xu H, Dror RO, Eastwood MP, Gregersen BA, Klepeis JL, Kolossvary I, Moraes MA, Sacerdoti FD, Salmon JK, Shan Y, Shaw DE: Scalable algorithms for molecular dynamics simulations on commodity clusters. SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. 2006, New York, NY, USA: ACM, 84-96. [http://dx.doi.org/10.1145/1188455.1188544]
 Pande VS, Baker I, Chapman J, Elmer SP, Khaliq S, Larson SM, Rhee YM, Shirts MR, Snow C, Sorin EJ, Zagrovic B: Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing. Biopolymers. 2003, 68: 91-109. 10.1002/bip.10219.
 Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG, Schulten K: Accelerating molecular modeling applications with graphics processors. J Comput Chem. 2007, 28: 2618-2640. 10.1002/jcc.20829.
 Shaw DE, Deneroff MM, Dror RO, Kuskin JS, Larson RH, Salmon JK, Young C, Batson B, Bowers KJ, Chao JC, Eastwood MP, Gagliardo J, Grossman JP, Ho CR, Ierardi DJ, Kolossváry I, Klepeis JL, Layman T, McLeavey C, Moraes MA, Mueller R, Priest EC, Shan Y, Spengler J, Theobald M, Towles B, Wang SC: Anton, a special-purpose machine for molecular dynamics simulation. SIGARCH Comput Archit News. 2007, 35: 1-12.
 Shao J, Tanner S, Thompson N, Cheatham T: Clustering molecular dynamics trajectories: 1. Characterizing the performance of different clustering algorithms. J Chem Theory Comput. 2007, 3 (6): 2312-2334. 10.1021/ct700119m.
 Frickenhaus S, Kannan S, Zacharias M: Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping. J Comput Chem. 2009, 30 (3): 479-492. 10.1002/jcc.21076.
 Daura X, van Gunsteren WF, Mark AE: Folding-unfolding thermodynamics of a beta-heptapeptide from equilibrium simulations. Proteins. 1999, 34 (3): 269-280. 10.1002/(SICI)1097-0134(19990215)34:3<269::AID-PROT1>3.0.CO;2-3.
 Karplus M, Kushick JN: Method for estimating the configurational entropy of macromolecules. Macromolecules. 1981, 14 (2): 325-332. 10.1021/ma50003a019.
 Levy RM, Srinivasan AR, Olson WK, McCammon JA: Quasi-harmonic method for studying very low frequency modes in proteins. Biopolymers. 1984, 23: 1099-1112. 10.1002/bip.360230610.
 Berendsen HJ, Hayward S: Collective protein dynamics in relation to function. Curr Opin Struct Biol. 2000, 10 (2): 165-169. 10.1016/S0959-440X(00)00061-0.
 Ramanathan A, Agarwal PK, Kurnikova M, Langmead CJ: An online approach for mining collective behaviors from molecular dynamics simulations. J Comput Biol. 2010, 17 (3): 309-324. 10.1089/cmb.2009.0167.
 Ramanathan A, Yoo J, Langmead C: On-the-fly identification of conformational substates from molecular dynamics simulations. J Chem Theory Comput. 2011, 7 (3): 778-789. 10.1021/ct100531j.
 Lange OF, Grubmüller H: Full correlation analysis of conformational protein dynamics. Proteins. 2008, 70 (4): 1294-1312. 10.1002/prot.21618.
 Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ: Learning generative models for protein fold families. Proteins. 2011, 79 (4): 1061-1078. 10.1002/prot.22934.
 Razavian N, Kamisetty H, Langmead C: The von Mises graphical model: regularized structure and parameter learning. Tech Rep CMU-CS-11-108, Carnegie Mellon University, Department of Computer Science. 2011.
 Bowman GR, Beauchamp KA, Boxer G, Pande VS: Progress and challenges in the automated construction of Markov state models for full protein systems. J Chem Phys. 2009, 131 (12): 124101. 10.1063/1.3216567.
 Banerjee O, El Ghaoui L, d'Aspremont A: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res. 2008, 9: 485-516.
 Vandenberghe L, Boyd S, Wu SP: Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications. 1998, 19: 499-533. 10.1137/S0895479896303430.
 Kamisetty H, Xing EP, Langmead CJ: Free energy estimates of all-atom protein structures using generalized belief propagation. J Comput Biol. 2008, 15 (7): 755-766. 10.1089/cmb.2007.0131.
 Kamisetty H, Ramanathan A, Bailey-Kellogg C, Langmead C: Accounting for conformational entropy in predicting binding free energies of protein-protein interactions. Proteins. 2011, 79 (2): 444-462. 10.1002/prot.22894.
 Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kalé L, Schulten K: Scalable molecular dynamics with NAMD. J Comput Chem. 2005, 26: 1781-1802. 10.1002/jcc.20289.
 Jacobson JM, Kuritzkes DR, Godofsky E, DeJesus E, Larson JA, Weinheimer SP, Lewis ST: Safety, pharmacokinetics, and antiretroviral activity of multiple doses of ibalizumab (formerly TNX-355), an anti-CD4 monoclonal antibody, in human immunodeficiency virus type 1-infected adults. Antimicrob Agents Chemother. 2009, 53 (2): 450-457. 10.1128/AAC.00942-08.
 Toma J, Weinheimer SP, Stawiski E, Whitcomb JM, Lewis ST, Petropoulos CJ, Huang W: Loss of asparagine-linked glycosylation sites in variable region 5 of human immunodeficiency virus type 1 envelope is associated with resistance to CD4 antibody ibalizumab. J Virol. 2011, 85 (8): 3872-3880. 10.1128/JVI.02237-10.
 Gehring W, Affolter M, Burglin T: Homeodomain proteins. Annu Rev Biochem. 1994, 63: 487-526. 10.1146/annurev.bi.63.070194.002415.
 D'Elia AV, Tell G, Paron I, Pellizzari L, Lonigro R, Damante G: Missense mutations of human homeoboxes: a review. Hum Mutat. 2001, 18: 361-374. 10.1002/humu.1207.
 Gehring W, Qian Y, Billeter M, Furukubo-Tokunaga K, Schier A, Resendez-Perez D, Affolter M, Otting G, Wuthrich K: Homeodomain-DNA recognition. Cell. 1994, 78: 211-223. 10.1016/0092-8674(94)90292-5.
 Mayor U, Grossmann JG, Foster NW, Freund SM, Fersht AR: The denatured state of engrailed homeodomain under denaturing and native conditions. J Mol Biol. 2003, 333: 977-991. 10.1016/j.jmb.2003.08.062.
 Mayor U, Johnson CM, Daggett V, Fersht AR: Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc Natl Acad Sci U S A. 2000, 97: 13518-13522. 10.1073/pnas.250473497.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.