Estimation and Validation of Markov Models

This chapter describes approaches to estimating the transition matrix of a Markov model from simulation data that has been mapped to a discrete state space, and approaches to validating whether this estimate is consistent with the simulation data at hand.



Jan-Hendrik Prinz, John D. Chodera, and Frank Noé

In this chapter, we discuss the problem of estimating the state-to-state transition matrix of a Markov model given a set of trajectory data and a discretization of configuration space, the selection of an appropriate lag time (or observation interval) τ, and the validation of the resulting model to ensure it is consistent with the data used to construct it. We presume the trajectory data has been generated by one or more molecular dynamics simulations initiated from configurations sampled from either global equilibrium or a local equilibrium within one or more of the discretized conformational states. These states can be generated according to the methods discussed in previous chapters. This chapter follows Ref. [20], which should be cited when referring to this material.

J.-H. Prinz · F. Noé (corresponding author): Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany
J.D. Chodera: Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA

G.R. Bowman et al. (eds.), An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation, Advances in Experimental Medicine and Biology 797, DOI 10.1007/978-94-007-7606-7_4, © Springer Science+Business Media Dordrecht 2014

4.1 Preliminaries: The Transition Count Matrix

For simplicity, we first consider the case of a single equilibrium simulation trajectory X consisting of N configurations sampled at a fixed time interval Δt,

$$X = \{x_1 = x(t = 0),\; x_2 = x(t = \Delta t),\; \ldots,\; x_N = x(t = (N-1)\Delta t)\}. \qquad (4.1)$$

The generalization to multiple trajectories will be discussed subsequently. We suppose we have already defined a crisp state-space discretization $\{S_1, \ldots, S_K\}$ in which each structure can be assigned uniquely to a discrete state,

$$x_k \in S_i \;\Rightarrow\; s_k = i, \qquad k \in \{1, \ldots, N\},\; i \in \{1, \ldots, K\},$$

allowing the trajectory to be encoded as the sequence $(s_1, \ldots, s_N)$ of discrete states visited at times $(n-1)\Delta t$, $n = 1, \ldots, N$, along the trajectory. We assume that the initial configuration $x_1$ was drawn from $\mu_{s_1}(x)$, the equilibrium density within the initial state $s_1$. There are numerous strategies for sampling from the initial distribution $\mu_{s_1}(x)$ in order to initiate a simulation from $s_1$, such as sampling from a reweighted ensemble (e.g. one generated by replica-exchange [24] or well-converged metadynamics [13] simulations), or applying a potential energy bias $U_{\mathrm{bias}}(x) = -k_B T \ln \mu_i(x)$ to rapidly equilibrate a simulation within the state before removing the biasing potential to generate unbiased dynamical trajectories [21]. Note that in the limit of very small discrete states this problem vanishes, as $\mu_i(x)$ can then be well approximated by a step function (see the Supplementary Material for [17]). We can now define a state-to-state transition count matrix $C^{\mathrm{obs}}(\tau) = [c_{ij}^{\mathrm{obs}}(\tau)]$ at lag time $\tau$,


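$$c_{ij}^{\mathrm{obs}}(\tau) = c_{ij}^{\mathrm{obs}}(l\Delta t) = \sum_{k=0}^{(N-1)/l - 1} \chi_i\big(x_{l \cdot k + 1}\big)\, \chi_j\big(x_{l \cdot k + l + 1}\big), \qquad (4.5)$$

where the lag time τ = lΔt is an integer multiple l of the sampling interval, and $\chi_i(x)$ is the indicator function of state $S_i$ (equal to 1 if $x \in S_i$ and 0 otherwise).

Fig. 4.1 Transition countin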
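The counting rule in Eq. (4.5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a hypothetical one-dimensional crisp discretization defined by sorted cut points (`boundaries`), 0-based state labels, and an integer lag `lag` playing the role of l, so that τ = lΔt. The helper names `assign_states` and `count_matrix` are hypothetical.

```python
import numpy as np


def assign_states(x, boundaries):
    """Crisp assignment of each configuration x_k to a discrete state s_k.

    Hypothetical 1D discretization: `boundaries` are sorted cut points, and
    np.digitize returns 0 for x < boundaries[0], i for
    boundaries[i-1] <= x < boundaries[i], and len(boundaries) above the last.
    """
    return np.digitize(np.asarray(x), boundaries)


def count_matrix(s, n_states, lag):
    """Transition count matrix at lag time tau = lag * dt, following Eq. (4.5).

    Only every `lag`-th frame starts a window, so windows do not overlap.
    With 0-based indexing, x_{l*k+1} and x_{l*k+l+1} become s[lag*k] and
    s[lag*k + lag], for k = 0, ..., (N - 1) // lag - 1.
    """
    s = np.asarray(s)
    n_windows = (len(s) - 1) // lag
    C = np.zeros((n_states, n_states), dtype=int)
    for k in range(n_windows):
        i = s[lag * k]          # state at the start of window k
        j = s[lag * k + lag]    # state one lag time later
        C[i, j] += 1
    return C


if __name__ == "__main__":
    # Hypothetical toy trajectory of N = 9 one-dimensional configurations.
    x = [-1.0, -0.8, 0.1, 0.2, 0.3, -0.9, 0.8, 0.9, 0.0]
    s = assign_states(x, boundaries=[-0.5, 0.5])  # K = 3 states: 0, 1, 2
    C = count_matrix(s, n_states=3, lag=2)
    # C == [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
    print(C)
```

Because only every l-th frame starts a window, the total number of counts is ⌊(N−1)/l⌋, so longer lag times yield fewer counts from the same trajectory.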