The laws of averages

February 7, 2012

The Hitchhiker’s Guide to the Galaxy, that truly remarkable book, points out that since the area of the universe is infinite and the number of populated worlds is finite, the population of the universe is, on average, none.  So although you might see people from time to time, they are most likely merely products of your imagination.  Arguing from averages is always tricky; many people in the department are fixated on the question of what happens when the average is not a good surrogate for what’s happening to the individual, as for example when there are two populations behaving in distinct ways and the average captures neither behavior.  But a recent paper argues that there is quite a lot you can deduce about the physical limits to cell behavior by knowing the average behavior of the proteins that make up the cell (Dill, Ghosh and Schmit 2011.  Physical limits of cells and proteomes.  PNAS doi:10.1073/pnas.1114477108).  Actually the average alone is not enough: you need to know the distribution around the average as well.

The argument goes like this.  Because the mass of a cell is (on average, and excluding water) about 50% protein, the physical properties of the mixture of proteins that make up the proteome are likely to be important in dictating the physical properties of the cell itself.  You might think this is a rather unhelpful idea: if you need to measure the properties of individual proteins one by one and average them all together to determine the overall behavior of the proteome, then it may be easier to measure the physical properties of the cell directly.  But it turns out that many physical properties of proteins depend strongly on their length.  For example, the free energy of folding of a protein is directly correlated to the number of amino acids it’s made up of (let us, creatively, call this number N).  While the details of the structure of the protein — secondary structure, the number of hydrophobic amino acids, the number of salt bridges, etc. — may be important for individual proteins, on average these details appear to have only a minor effect.  This means that you can, in principle at least, figure out quite a lot about how a cell’s proteome responds to heat by simply knowing the relationship between N and folding free energy, and the average and distribution of N.  Which, in principle, you can get from genomic information.  Similarly, if you assume that proteins are in general globular, then the overall size of a protein depends fairly straightforwardly on N.  That means that the rate of diffusion of a protein also depends on N.  And if you know the distribution of N for a cell’s proteome, and the size of the cell, you also know something about the density of the intracellular environment.
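
To make the size and diffusion scaling concrete, here is a minimal sketch in Python.  It assumes a compact globule whose radius grows as the cube root of N, and it uses the Stokes-Einstein relation for a sphere in dilute solution.  The prefactor `R0_NM` and the viscosity are illustrative values picked for this sketch, not numbers taken from Dill et al., and the function names are made up; the crowded interior of a real cell will slow things down further.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
R0_NM = 0.3          # hypothetical prefactor (nm) in R = R0 * N**(1/3); illustrative only
ETA_WATER = 0.7e-3   # approximate viscosity of water near 37 C, in Pa*s (assumption)


def globule_radius_nm(n_residues: int) -> float:
    """Radius of a compact globule, assuming volume is proportional to chain length."""
    return R0_NM * n_residues ** (1.0 / 3.0)


def stokes_einstein_diffusion(n_residues: int, temp_k: float = 310.15) -> float:
    """Diffusion coefficient (m^2/s) of a sphere of that radius in dilute solution."""
    radius_m = globule_radius_nm(n_residues) * 1e-9
    return K_B * temp_k / (6.0 * math.pi * ETA_WATER * radius_m)


if __name__ == "__main__":
    for n in (100, 300, 800):
        print(n, globule_radius_nm(n), stokes_einstein_diffusion(n))
```

Whatever prefactor you choose, the scaling is the point: under these assumptions an eightfold longer chain has twice the radius and diffuses half as fast.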

So Dill et al. are suggesting, among other things, that you should be able to use sequence databases to predict the response of different cells to heat shock.  They go further than simply suggesting that it should be possible: they set out to do it.  First, they needed to figure out the relationship between N and the free energy of folding, ΔG.  Since the free energy of folding of a given protein must depend on temperature, T, they used T as a variable as well.  They used literature measurements of ΔG for 116 proteins to create two different approximations for the ΔG/N/T relationship, one for proteins from mesophiles (those of us who like to live at moderate temperatures) and the other for proteins from thermophiles (those who like to think they’re hot, and live at 45°C or above).

Having done this, all we need to know is N to be able to determine ΔG for any given temperature.  The distribution of N can be approximated too, using the mean and variance of protein chain lengths in the organism’s proteome, as predicted from genome sequence information.  By putting the two pieces together (mesophile ΔG/N equation with N distributions from mesophiles, and thermophile ΔG/N equation with N distributions from thermophiles, naturally), Dill et al. can then produce an estimate for the distribution of stability of proteins in a given proteome.
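
The recipe can be sketched in a few lines of Python.  Everything numerical here is a placeholder: the linear coefficients in `delta_g_kcal` and the choice of a gamma distribution for chain lengths are assumptions made only so the code runs, not the fits reported by Dill et al., and this toy version ignores the temperature dependence entirely.

```python
import random

# Hypothetical linear stability model: dG = SLOPE * N + INTERCEPT (kcal/mol).
# These coefficients are invented for illustration.
SLOPE = 0.02
INTERCEPT = 1.0


def delta_g_kcal(n_residues):
    """Folding stability predicted from chain length alone (toy model)."""
    return SLOPE * n_residues + INTERCEPT


def sample_chain_lengths(mean_n, var_n, count, seed=0):
    """Draw chain lengths from a gamma distribution with the given mean and variance."""
    rng = random.Random(seed)
    shape = mean_n ** 2 / var_n   # gamma shape parameter
    scale = var_n / mean_n        # gamma scale parameter
    return [max(1, round(rng.gammavariate(shape, scale))) for _ in range(count)]


def fraction_below(lengths, threshold_kcal):
    """Fraction of proteins whose predicted stability falls below the threshold."""
    marginal = sum(1 for n in lengths if delta_g_kcal(n) < threshold_kcal)
    return marginal / len(lengths)


if __name__ == "__main__":
    lengths = sample_chain_lengths(mean_n=300, var_n=200 ** 2, count=4000)
    print(fraction_below(lengths, threshold_kcal=3.0))
```

The design mirrors the argument: one function maps N to ΔG, another supplies a distribution of N from just a mean and a variance, and combining them gives a distribution of stabilities from which you can read off the marginally stable fraction.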

This is already interesting because the thermophile protein stability equation is different from the mesophile equation, so ΔG depends not only on N but also on the class of organism.  And Dill et al. note that it isn’t entirely clear where the difference comes from.  Nevertheless, within each class of organisms there seems to be a reasonable linear relationship between ΔG and N.  So let’s just assume that all mesophile proteins behave the same way as each other, and take a look at a plot of the number of proteins versus stability in the proteome of the biologist’s favorite organism, E. coli, at 37°C.  It has a pronounced skew and looks like this:

Figure 2 from Dill et al.

What this shows is that although the average protein is predicted to be fairly stable at 37°C (with a free energy of folding of about 6.8 kcal/mol), there are a few hundred proteins that are predicted to be only marginally stable (free energy of folding < 3 kcal/mol). So for E. coli, even a small change in temperature — say 4°C — would be predicted to destabilize about 16% of the proteins in the proteome.  Which would be bad; misfolded proteins are a problem, as we’ve discussed before. But just how bad would it be?

When the real world is more rational than the model

February 1, 2011

Those of you who stop by regularly have probably noticed that I rarely write about papers that I don’t think are particularly good.  This may be partly a lingering result of early training (“If you can’t say anything nice, don’t say anything at all”), but is mostly because better papers are more interesting to write about.  But a recent paper caught my eye because of the discrepancy between its title — Erratic Flu Vaccination Emerges From Short-Sighted Behavior in Contact Networks — and its actual finding, which might have been summarized as Erratic Flu Vaccination Would Emerge From Short-Sighted Behavior Under Certain Assumptions About The Way People Understand Risk But In Fact This Doesn’t Reflect Real-World Data So Something Else Must Be Going On.

Let’s face it, this is not an area I know much about: my interest is far more personal than professional.  I’m sure this paper makes a perfectly reasonable contribution to its field. And I’m sure that my alternative title would have been frowned upon by the journal, so it’s not entirely the authors’ fault that I found their title misleading. Nevertheless, having been fooled into reading the paper, I found myself disagreeing with the underlying assumptions enough to want to tell you about it and see what you think.  Modeling is all about forcing you to be clear about your assumptions and finding out where that set of assumptions leads you; but that doesn’t mean you have to publish every such exploration.

This is a theoretical paper that aims to address the important question of what determines vaccination rates in a population.  If the vaccine isn’t mandated by your government, then the vaccination rate depends on individual decisions, which may be influenced by individual perceptions of costs/risks and benefits.  The “correct” decision for an individual (defined as maximizing benefit, minimizing cost) may not be the same as the “correct” decision for a population; if individuals think that enough of their friends are being vaccinated that the risk of an infection in their circle is very small, the theory goes, they may choose not to get vaccinated and avoid the costs and risks, while gaining the benefits.   This is the kind of calculation that many parents in the UK apparently went through in deciding to avoid giving their children the MMR vaccine (measles, mumps and rubella) after the perceived risks of the vaccine were sharply increased by the scare engineered by Andrew Wakefield, who has now been comprehensively proven to be a self-interested fraud (if you haven’t read all of the BMJ articles by Brian Deer, do; they’re amazingly detailed and well documented).  As a result of the drop in MMR vaccination rates in the UK, measles is back with a vengeance.  Similar things have happened with whooping cough vaccine, polio vaccine, and many others.

Not sisters, under the skin

January 12, 2011

Ever since we’ve been able to look at the internal components of cells, we’ve become more and more fascinated by the fact that individual cells are — well — individual.  Two genetically identical cells sitting next to each other in a dish may harbor very different sets of messenger RNAs and proteins (or other components), and may therefore respond differently to a stimulus.  We’ve seen this in the context of apoptosis in mammalian cells, and also bacterial responses to antibiotics.  Where do these differences come from?

The dominant story, so far, is that the variation in mRNA and protein levels we see is due to gene expression noise.  In a normal cell, you have at most 2 copies of any given gene, and each gene may be either in an “on” state for transcription (producing mRNA) or an “off” state.  This can produce bursts of mRNA production (think of the sudden stream of cars that comes through a traffic light when it turns green, then stops when the light turns red), which in turn can produce even larger bursts of protein production.  This will make protein levels fluctuate randomly over time, and different cells are not very likely to fluctuate in synchrony.  So there are bound to be differences between individual cells: noise is a fact of life.

But is gene expression noise the only source of fluctuations?  A recent paper (Huh and Paulsson, 2010, Non-genetic heterogeneity from stochastic partitioning at cell division, Nature Genetics doi:10.1038/ng.729) argues that cell-to-cell differences created at cell division when components are unequally split between the two daughters (called partitioning error) may be just as important as — or more important than — gene expression noise.
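
A rough feel for partitioning error comes from the simplest possible model, sketched here in Python: each of n molecules independently ends up in one daughter or the other with probability 1/2, so one daughter's share is binomially distributed.  This independent-segregation model is an assumption of the sketch rather than a summary of Huh and Paulsson's analysis, and the function names are made up for illustration.

```python
import math
import random


def partition(n_molecules, rng):
    """Molecules inherited by one daughter under independent 50/50 segregation."""
    return sum(1 for _ in range(n_molecules) if rng.random() < 0.5)


def simulated_cv(n_molecules, divisions=2000, seed=1):
    """Coefficient of variation of one daughter's share over many simulated divisions."""
    rng = random.Random(seed)
    shares = [partition(n_molecules, rng) for _ in range(divisions)]
    mean = sum(shares) / divisions
    var = sum((s - mean) ** 2 for s in shares) / divisions
    return math.sqrt(var) / mean


def binomial_cv(n_molecules):
    """Analytic CV for Binomial(n, 1/2): sqrt(n/4) / (n/2) = 1/sqrt(n)."""
    return 1.0 / math.sqrt(n_molecules)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, simulated_cv(n), binomial_cv(n))
```

In this toy model the relative difference between sisters shrinks as 1/sqrt(n), so it matters most for components present in small numbers, which is also where gene expression noise is large.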
