More models, better biochemistry
September 10, 2010
Peter Sorger, Will Chen and Mario Niepel have a new review out in Genes & Development, which looks to me as if it was only classified as a Review because the journal doesn’t have a category called Tutorial (Chen et al. 2010. Classic and contemporary approaches to modeling biochemical reactions. Genes Dev. 24: 1861-75. PMID: 20810646). It’s a very useful discussion of why and how to model, and it looks tailor-made for use in graduate-level courses.
Chen et al. start out by reminding us of the approximations we use every day and where they come from. The first is mass action kinetics, an approximation that allows us to use the idea of “concentration” but restricts us to situations where it’s reasonable to think of the species in a reaction as having a continuous distribution, in a well-mixed setting, where there is not much fluctuation in either the number of molecules available to react with each other, or the number of interactions between them. This covers a good deal of eukaryotic biology, but not all of it.
The second is the Michaelis-Menten approximation, which applies only under a rather restrictive set of conditions. The first assumption we have to be wary of is the idea that the enzyme-substrate complex (C) is in equilibrium: just enough C is formed by the binding of enzyme (E) and substrate (S) to compensate for the amount of C that breaks down to form either E and product (P) or E and unreacted S. This is never precisely true, and not always even approximately true; the amount of C could change considerably during the course of the reaction. What makes it a good assumption for many enzymes is that the catalytic step, the formation and subsequent release of P, is often slow relative to the formation of C. So you can approximate the situation as E + S forming an equilibrium amount of C, which is (sort of) constant; when a P is produced, which (crucially) doesn’t happen often, the released E will pick up an S at the right rate to re-form the equilibrium level of C. C is treated as a constant that depends only on the behavior of E and S; and the rate at which P appears depends only on the constant C. This very simple picture makes possible the derivation of the Michaelis-Menten equation. We use this approximation to relate the initial rate of reaction, V(0) (the initial rate of appearance of the product), to the initial concentration of substrate, and thus to determine V(max) and K(M), basic bits of information about how the enzyme behaves.
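For readers who like to see the algebra, here is a sketch of the standard textbook derivation (my summary, not copied from Chen et al.). The rate constants k_f, k_r and k_cat and the total enzyme E_T are labels I have introduced for illustration; C, E, S, P, V(0), V(max) and K(M) are as in the text.

```latex
% Reaction scheme (the rate-constant names k_f, k_r, k_cat are assumed labels):
%   E + S <-> C -> E + P
\begin{align}
\frac{dC}{dt} &= k_f\,E\,S - (k_r + k_{cat})\,C \approx 0
  && \text{(formation of C balances its breakdown)} \\
E_T &= E + C
  && \text{(enzyme is conserved)} \\
C &= \frac{E_T\,S}{K_M + S}, \qquad K_M = \frac{k_r + k_{cat}}{k_f}
  && \text{(solve the two lines above for C)} \\
V_0 &= \frac{dP}{dt} = k_{cat}\,C = \frac{V_{max}\,S}{K_M + S}, \qquad V_{max} = k_{cat}\,E_T
\end{align}
```

Everything hangs on the first line: if C is not approximately constant, the rest does not follow.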
Chen et al. re-cast the classical enzyme-substrate reaction used by Michaelis and Menten as a dynamical system with all the relevant ordinary differential equations (ODEs). I can’t and won’t go through the entire argument; it’s a tutorial, so you should read and digest it yourself. Suffice it to say that the authors show, really relatively painlessly, how the classical Michaelis-Menten equation falls out of the dynamics of the ODEs, provided that E + S → C really is much faster than C → product. But the same analysis shows that you can’t use Michaelis-Menten if this separation of timescales doesn’t hold. And it’s easy to see that the parameter values that permit us to use Michaelis-Menten are only a small subset of the biologically reasonable parameters. For example, if the level of E is similar to the level of S, the Michaelis-Menten approximation is quite poor. So, the Michaelis-Menten approximation is a special case of the ODE model.
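To get a feel for the "E similar to S" point, here is a minimal Python sketch (mine, not the authors’ code) that integrates the full mass-action ODEs for E + S ⇌ C → E + P and compares the amount of product with what the Michaelis-Menten rate law predicts. The rate constants (kf, kr, kcat), the concentrations, the time points and the hand-rolled Runge-Kutta integrator are all assumptions chosen purely for illustration.

```python
# Sketch: full mass-action ODEs vs. the Michaelis-Menten shortcut for
#   E + S <-> C -> E + P
# All rate constants and concentrations are made-up illustrative values.

def rk4_step(f, y, dt):
    """Advance the state y by one classical Runge-Kutta (RK4) step."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f, y0, t_end, dt=0.005):
    """Integrate dy/dt = f(y) from t = 0 to t_end with fixed steps."""
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        y = rk4_step(f, y, dt)
    return y

def product_full(e_total, s0, t_end, kf=10.0, kr=10.0, kcat=1.0):
    """Product formed by time t_end according to the full ODE model."""
    def deriv(y):
        s, c = y
        binding = kf * (e_total - c) * s - kr * c  # net flux E + S -> C
        return [-binding, binding - kcat * c]
    s, c = integrate(deriv, [s0, 0.0], t_end)
    return s0 - s - c  # conservation: S0 = S + C + P

def product_mm(e_total, s0, t_end, kf=10.0, kr=10.0, kcat=1.0):
    """Product formed by time t_end according to Michaelis-Menten."""
    km = (kr + kcat) / kf
    vmax = kcat * e_total
    def deriv(y):
        return [-vmax * y[0] / (km + y[0])]
    (s,) = integrate(deriv, [s0], t_end)
    return s0 - s

def relative_error(e_total, s0, t_end):
    """Relative disagreement of Michaelis-Menten with the full model."""
    full = product_full(e_total, s0, t_end)
    return abs(product_mm(e_total, s0, t_end) - full) / full

# Enzyme scarce relative to substrate, then enzyme equal to substrate.
print("E << S:", relative_error(e_total=0.01, s0=1.0, t_end=100.0))
print("E ~= S:", relative_error(e_total=1.0, s0=1.0, t_end=1.0))
```

With these particular numbers the disagreement should be small when enzyme is scarce and much larger when E and S are comparable; other parameter choices will behave differently, which is rather the point.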
In a way this shouldn’t need saying. Every textbook, and even Wikipedia, clearly points out that the assumptions are only assumptions, and that if they’re not met the Michaelis-Menten approximation doesn’t hold. Chen et al. gently marvel at the fact that many people — although understanding perfectly well that Michaelis-Menten is an approximation — nevertheless put more credence in this approximation than in the ODE models that describe the same enzyme-substrate-product system in a much more general way. They comment that “in constructing models of cellular biochemistry we” — by which I think they mean “you” — “often find ourselves struggling to use K(M) measurements when estimates of elementary rate constants would be much more helpful”.
Which is a nice segue to the next section, in which they discuss how to infer the parameter values that you will need for ODE models. The seductive thing about Michaelis-Menten, and the reason it gets used even when people know it’s not strictly appropriate, is that it links what you can easily measure — initial reaction rate in the presence of a given amount of substrate — to what you want to know, in this case a couple of simple numbers that allow you to predict the overall kinetic behavior of an enzyme. If you can directly measure the amount of product, that is. But the more complicated the process you’re trying to understand, the more likely it is that you will have to find a formal way to match what you’re observing to what you think is going on. You’ll find yourself assuming a specific topology for the network you want to understand, calculating parameters for each of the essential reactions, and trying to determine how well the combination of model plus parameters fits the data you’ve obtained. Chen et al. explain how to use an objective function to identify the parameters that best fit your data. The approach used is to minimize the square of the difference between your data and what your model/parameter combination predicts; the squaring is just to avoid the possibility that a bad fit where the difference is negative might cancel out a bad fit where the difference is positive.
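The least-squares idea is easy to sketch in Python. This is not Chen et al.’s code: the function names, the use of the Michaelis-Menten rate law as a stand-in for "the model", the noise-free synthetic data and the crude grid search are all my assumptions, chosen to keep the example short. A real fit would use measured data and a proper optimizer.

```python
# Sketch of a least-squares objective function and a brute-force fit.
# The "data" are synthetic and noise-free, generated from known parameters.

def mm_rate(s, vmax, km):
    """Stand-in model: Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)

def sum_of_squares(params, substrate, observed):
    """Objective function: sum of squared (data - model) differences."""
    vmax, km = params
    return sum((obs - mm_rate(s, vmax, km)) ** 2
               for s, obs in zip(substrate, observed))

# Why square? Misfits of +1 and -1 cancel if summed; their squares do not.
residuals = [1.0, -1.0]
signed_total = sum(residuals)                   # looks like a perfect fit
squared_total = sum(r ** 2 for r in residuals)  # reveals the misfit

# Synthetic data from assumed "true" parameters (illustrative values only).
substrate = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
true_params = (2.0, 1.5)
observed = [mm_rate(s, *true_params) for s in substrate]

# Try every (vmax, km) pair on a coarse grid and keep the best one.
grid = [(v / 10.0, k / 10.0) for v in range(1, 41) for k in range(1, 41)]
best = min(grid, key=lambda p: sum_of_squares(p, substrate, observed))
print("best-fit (vmax, km):", best)
```

Because the data here contain no noise and the true parameters sit on the grid, the search simply recovers them. With real, noisy data the minimum is only an estimate.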
This process of parameter-fitting is spiritually (and for all I know mathematically) similar to the way that structural biologists determine how well their model of the positions of amino acids fits the actual pattern of X-ray diffraction that they obtained from their crystal. Experimental error, unfortunately, is more of a confounding factor for parameter-fitting than for X-ray crystallography (at least when the crystals are good). Chen et al. derive the mountainous landscape of goodness-of-fit versus parameter value for a simple model reaction, and show that there are directions in which you can have quite a lot of certainty about the value of a parameter (when you are going up the side of one of the mountains), and other directions where you find many parameters that fit the data equally well (along a valley between two mountains). What you’re really doing in performing a parameter-fit is identifying a range of parameters that could fit your data, and estimating the probability of each one being the “true” parameter. Another way of being in the cloud.
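The mountains and valleys can be felt even in a toy case. The sketch below is my own illustration, not the model reaction Chen et al. analyze: it assumes a Michaelis-Menten rate law and synthetic, noise-free measurements taken only at substrate levels far below K(M). In that regime the rate depends almost entirely on the ratio V(max)/K(M), so doubling both parameters (moving along the valley) barely changes the fit, while doubling V(max) alone (climbing the mountainside) changes it a great deal. All names and numbers are hypothetical.

```python
# Sketch: a "valley" in the goodness-of-fit landscape.
# Synthetic, noise-free data; parameter values are illustrative only.

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)

def sum_of_squares(params, substrate, observed):
    """Sum of squared differences between data and model prediction."""
    vmax, km = params
    return sum((obs - mm_rate(s, vmax, km)) ** 2
               for s, obs in zip(substrate, observed))

substrate = [0.1, 0.2, 0.5, 1.0]   # every point is far below km
true_params = (2.0, 100.0)         # assumed (vmax, km)
observed = [mm_rate(s, *true_params) for s in substrate]

along_valley = (4.0, 200.0)    # same vmax/km ratio as the true parameters
across_valley = (4.0, 100.0)   # vmax doubled, km unchanged

sse_along = sum_of_squares(along_valley, substrate, observed)
sse_across = sum_of_squares(across_valley, substrate, observed)
print("misfit along the valley: ", sse_along)
print("misfit across the valley:", sse_across)
```

Add even a little experimental noise and the two parameter sets along the valley become indistinguishable, which is why a fit is better reported as a range of plausible parameters than as a single answer.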
Even for a simple model reaction, the likelihood plot for parameter variation is quite complex, and it isn’t intuitively obvious why. The authors point out that this is a sobering discovery, given the “prevalence of informal thinking in molecular biology and the common assumption that moving from data to an understanding of the underlying biochemistry is straightforward”. But it’s understandable that few biologists want to be more formal in their thinking: the tools currently available are not particularly user-friendly and don’t make it easy to pull out what’s important from the wealth of irrelevant detail. The situation has been compared to working in machine language, when what we really want to do is use Python. [Actually, what I want to do is use software someone else has already written, but I’m a Mac user, what can I say.]
This review is, at bottom, a call to action. The authors argue that the informal sketches and approximations of biochemical reactions we normally use are often seriously misleading, and that formal modeling is a far better way to understand how biochemistry actually works. We need better conceptual frameworks and tools to use modeling effectively; but when we have them, there will be a revolution in how we understand the behavior of protein networks, and therefore how we understand much of biology. So, get in line to pick out your piece of the problem: there’s lots to do, and even more to learn.
Chen WW, Niepel M, & Sorger PK (2010). Classic and contemporary approaches to modeling biochemical reactions. Genes & Development, 24 (17), 1861-75. PMID: 20810646