Cellular Morse Code

July 10, 2012

For decades now, the biological community has been focused on the question of how cells transmit information from place to place.  It’s a central problem if you want to understand pretty much anything about cell behavior.  A signal to grow, for example, might start when a growth factor arrives on the outside of a cell, say in your tissue culture dish when you add fresh medium with growth-factor-containing serum in it.  The information that it’s time to grow might be transmitted across the membrane by a membrane-spanning receptor, triggering a series of events such as a cascade of phosphorylations that cause enzymes within the cell to change activity.  The final result might be a change of activity of a transcription factor; the presence of a signal outside the cell has thus been converted into a change in the gene expression profile inside the nucleus of the cell.  We chiefly think of these processes as linear — a pathway — with a well-defined flow of information from A to B to C. We draw diagrams that show A near the cell membrane, passing information to B (closer to the nucleus) and then to C (closer still).  But of course this is just an analogy we use to make it easier for us to think about what’s going on, and like all convenient analogies it has the potential to be seriously misleading.  Our so-called “pathways” loop and branch and pass information forward and backward and sideways, losing precision all the way; A, B and C are most often distinguished by the timing of their activation, rather than by their location in the cell; and while it’s easy to tell a general story about how an external stimulus leads to a response inside the cell, it’s still hard to know why the response is the size it is, or happens at the time it does.

One of the most puzzling aspects of signal transduction is what happens when multiple signals impinge on the same mediator — when “paths” cross, or diverge, or merge.  In the case of the important anti-oncogene p53, we draw several paths coming in to p53 and several paths going out again.  The downstream consequences of p53 activation vary dramatically, from transient cell cycle arrest to senescence and apoptosis.  How does this single protein receive and transmit several different types of information?

One idea is that the p53 network is in fact many different, distinct pathways, each using a different p53 isoform (say, p53′, p53″ and so on).  All these pathways look as if they overlap because they all involve an increase in the total level of p53 protein, but p53 can be modified in many ways (phosphorylation, acetylation, ubiquitination, methylation… ) at many different sites, producing modified versions of p53 that have varying functions.  It’s well established that this happens, and that the modifications do indeed modulate p53’s behavior.  But there’s another dimension, literally, to explore here: time.  Although activating the p53 pathway always causes p53 protein levels to increase — by definition — that doesn’t mean that the timing and duration of the response is always the same.  The role of protein dynamics in the transmission and processing of information in biology is seriously under-explored.

Here’s a dramatic example: exposure to gamma radiation, which causes double-strand breaks in DNA, leads to repeated individual pulses of p53 that have a stereotyped size and shape and appear at defined intervals.  Increasing the dose of radiation doesn’t increase the average size of the pulses; instead, it increases the number of pulses.  Irradiation with ultraviolet light also damages DNA, but the damage is of a different kind: mostly lesions on a single strand, such as pyrimidine dimers, rather than double-strand breaks.  The response of p53 to UV is quite different from its response to gamma.  Instead of repeated pulses of unchanging average size, you get a single wave whose size varies depending on the amount of irradiation: the bigger the radiation dose, the bigger the wave.  But what do these differences mean?  The Lahav lab has been pursuing this question pretty much ever since the lab began, and now they think they have an answer (Purvis et al. (2012) p53 dynamics control cell fate. Science doi:10.1126/science.1218351).
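
To make the contrast concrete, here is a toy sketch of the two encoding styles: one in which dose sets how many identical pulses appear (as with gamma radiation) and one in which dose sets the height of a single pulse (as with UV). It is not the Lahav lab's model; the Gaussian pulse shape, the pulse spacing and width, and the dose-to-pulse scaling factors are all hypothetical values chosen only for illustration.

```python
import math

# All constants below are hypothetical, chosen for illustration only.
PULSE_HEIGHT = 1.0      # fixed height of each gamma-style pulse (arbitrary units)
PULSE_SPACING_H = 5.5   # interval between successive pulses, in hours
PULSE_SIGMA_H = 1.0     # width (Gaussian sigma) of every pulse, in hours


def pulse(t, center, height):
    """One smooth p53 pulse centred at `center` hours."""
    return height * math.exp(-((t - center) ** 2) / (2 * PULSE_SIGMA_H ** 2))


def gamma_trace(t, dose, pulses_per_dose_unit=1.0):
    """Gamma-like encoding: dose sets HOW MANY identical pulses occur."""
    n = round(dose * pulses_per_dose_unit)
    return sum(pulse(t, (k + 1) * PULSE_SPACING_H, PULSE_HEIGHT) for k in range(n))


def uv_trace(t, dose, height_per_dose_unit=0.5):
    """UV-like encoding: dose sets the HEIGHT of a single pulse."""
    return pulse(t, PULSE_SPACING_H, dose * height_per_dose_unit)


def summarize(trace, dose, t_end=48.0, dt=0.1, threshold=0.1):
    """Sample a trace; return (number of local peaks above threshold, max level)."""
    ts = [i * dt for i in range(round(t_end / dt) + 1)]
    ys = [trace(t, dose) for t in ts]
    peaks = sum(
        1
        for i in range(1, len(ys) - 1)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1] and ys[i] > threshold
    )
    return peaks, max(ys)


for dose in (2, 4):
    print("gamma", dose, summarize(gamma_trace, dose))
    print("uv   ", dose, summarize(uv_trace, dose))
```

In this sketch, doubling the dose doubles the pulse count of the gamma-style trace while its peak height stays fixed, and for the UV-style trace the count stays at one while the peak height doubles: a "digital" readout versus an "analog" one.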

The laws of averages

February 7, 2012

The Hitchhiker’s Guide to the Galaxy, that truly remarkable book, points out that since the area of the universe is infinite and the number of populated worlds is finite, the population of the universe is, on average, none.  So although you might see people from time to time, they are most likely merely products of your imagination.  Arguing from averages is always tricky; many people in the department are fixated on the question of what happens when the average is not a good surrogate for what’s happening to the individual, as for example when there are two populations behaving in distinct ways and the average captures neither behavior.  But a recent paper argues that there is quite a lot you can deduce about the physical limits to cell behavior by knowing the average behavior of the proteins that make up the cell (Dill, Ghosh and Schmit 2011.  Physical limits of cells and proteomes.  PNAS doi:10.1073/pnas.1114477108).  Actually the average alone is not enough: you need to know the distribution around the average as well.

The argument goes like this.  Because the mass of a cell is (on average, and excluding water) about 50% protein, the physical properties of the mixture of proteins that make up the proteome are likely to be important in dictating the physical properties of the cell itself.  You might think this is a rather unhelpful idea: if you need to measure the properties of individual proteins one by one and average them all together to determine the overall behavior of the proteome, then it may be easier to measure the physical properties of the cell directly.  But it turns out that many physical properties of proteins depend strongly on their length.  For example, the free energy of folding of a protein is directly correlated with the number of amino acids it’s made up of (let us, creatively, call this number N).  While the details of the structure of the protein (secondary structure, the number of hydrophobic amino acids, the number of salt bridges, etc.) may be important for individual proteins, on average these details appear to have only a minor effect.  This means that you can, in principle at least, figure out quite a lot about how a cell’s proteome responds to heat simply by knowing the relationship between N and folding free energy, together with the average and distribution of N, which you can get from genomic information.  Similarly, if you assume that proteins are in general globular, then the overall size of a protein depends fairly straightforwardly on N.  That means that the rate of diffusion of a protein also depends on N.  And if you know the distribution of N for a cell’s proteome, and the size of the cell, you also know something about the density of the intracellular environment.
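
The chain of reasoning from N to size to diffusion can be sketched in a few lines. This is a rough illustration under our own assumptions, not Dill et al.'s parameters: each protein is treated as a compact sphere with an average residue mass of about 110 Da and a density of about 1.35 g/cm³, and diffusion is estimated with the Stokes-Einstein relation in water at 37°C. The crowded cytoplasm slows real diffusion well below this dilute-solution estimate, and hydration shells are ignored.

```python
import math

AVOGADRO = 6.022e23          # molecules per mole
BOLTZMANN = 1.380649e-23     # J/K
RESIDUE_MASS_DA = 110.0      # assumed average mass per amino acid (g/mol)
PROTEIN_DENSITY = 1.35       # assumed protein density (g/cm^3)
WATER_VISCOSITY = 0.69e-3    # assumed solvent: water near 37 C (Pa*s)


def radius_nm(n_residues):
    """Radius of a compact sphere holding n residues; grows like N**(1/3)."""
    mass_g = n_residues * RESIDUE_MASS_DA / AVOGADRO
    volume_nm3 = mass_g / PROTEIN_DENSITY * 1e21   # 1 cm^3 = 1e21 nm^3
    return (3 * volume_nm3 / (4 * math.pi)) ** (1 / 3)


def diffusion_um2_per_s(n_residues, temp_k=310.0):
    """Stokes-Einstein estimate D = kT / (6 * pi * eta * R), dilute water."""
    r_m = radius_nm(n_residues) * 1e-9
    d_m2_per_s = BOLTZMANN * temp_k / (6 * math.pi * WATER_VISCOSITY * r_m)
    return d_m2_per_s * 1e12   # m^2/s -> um^2/s


for n in (100, 300, 800):
    print(n, f"{radius_nm(n):.2f} nm", f"{diffusion_um2_per_s(n):.0f} um^2/s")
```

Because radius scales as N^(1/3) and D scales as 1/R, an eightfold longer chain gives a sphere only twice as wide and diffusing only half as fast; for a 300-residue protein these assumptions give a radius of roughly 2 nm.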

So Dill et al. are suggesting, among other things, that you should be able to use sequence databases to predict the response of different cells to heat shock.  They go further than simply suggesting that it should be possible: they set out to do it.  First, they needed to figure out the relationship between N and the free energy of folding, ΔG.  Since the free energy of folding of a given protein must be dependent on temperature, T, they use T as a variable as well.  They use literature measurements of ΔG for 116 proteins to create two different approximations for the ΔG/N/T relationship, one for proteins from mesophiles (those of us who like to live at moderate temperatures) and the other for proteins from thermophiles (those who like to think they’re hot, and live at 45°C or above).

Having done this, all we need to know is N to be able to determine ΔG at any given temperature.  And the distribution of N across a proteome can itself be approximated from the mean and variance of protein chain lengths in the organism, which can be predicted from genome sequence information.  By putting the two together (mesophile ΔG/N equation with N distributions from mesophiles, and thermophile ΔG/N equation with N distributions from thermophiles, naturally), Dill et al. can then produce an estimate for the distribution of stability of proteins in a given proteome.
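
The recipe of pushing a distribution of N through a ΔG/N relationship can be sketched as below. This is a minimal illustration, not Dill et al.'s calculation: temperature is held fixed, the gamma distribution for chain lengths (matched to a chosen mean and variance) is our assumption, and the slope and intercept of the linear ΔG(N) line are hypothetical placeholders rather than their fitted coefficients.

```python
import random
import statistics


def sample_chain_lengths(mean_n, var_n, size, seed=0):
    """Draw chain lengths from a gamma distribution with the given mean and
    variance (the gamma shape is an assumption; only mean/variance are inputs)."""
    rng = random.Random(seed)
    shape = mean_n ** 2 / var_n   # gamma: mean = shape * scale
    scale = var_n / mean_n        #        variance = shape * scale**2
    return [max(1, round(rng.gammavariate(shape, scale))) for _ in range(size)]


def folding_stability(n, slope, intercept):
    """Hypothetical linear Delta G(N) at one fixed temperature, in kcal/mol."""
    return slope * n + intercept


def stability_distribution(mean_n, var_n, slope, intercept, size=10_000, seed=0):
    """Predicted stability of every sampled protein in a toy proteome."""
    lengths = sample_chain_lengths(mean_n, var_n, size, seed)
    return [folding_stability(n, slope, intercept) for n in lengths]


def fraction_marginal(stabilities, threshold=3.0):
    """Fraction of proteins whose predicted Delta G falls below the threshold."""
    return sum(dg < threshold for dg in stabilities) / len(stabilities)


# Hypothetical inputs, for illustration only.
dgs = stability_distribution(mean_n=320, var_n=200 ** 2, slope=0.02, intercept=0.5)
print("mean Delta G:", round(statistics.mean(dgs), 2))
print("median Delta G:", round(statistics.median(dgs), 2))
print("fraction below 3 kcal/mol:", round(fraction_marginal(dgs), 3))
```

Because a gamma distribution of chain lengths is right-skewed, the predicted stabilities inherit that skew (mean above median), so a tail of short, marginally stable proteins appears even when the average protein is comfortably stable.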

This is already interesting because the thermophile protein stability equation is different from the mesophile equation, so ΔG depends not only on N but also on the class of organism.  And Dill et al. note that it isn’t entirely clear where the difference comes from.  Nevertheless, within each class of organisms there seems to be a reasonable linear relationship between ΔG and N.  So let’s just assume that all mesophile proteins behave the same way as each other, and take a look at a plot of the number of proteins versus predicted stability for the proteome of the biologist’s favorite organism, E. coli, at 37°C.  It has a pronounced skew and looks like this:

Figure 2 from Dill et al.

What this shows is that although the average protein is predicted to be fairly stable at 37°C (with a free energy of folding of about 6.8 kcal/mol), there are a few hundred proteins that are predicted to be only marginally stable (free energy of folding < 3 kcal/mol). So for E. coli, even a small change in temperature — say 4°C — would be predicted to destabilize about 16% of the proteins in the proteome.  Which would be bad; misfolded proteins are a problem, as we’ve discussed before. But just how bad would it be?

Think you know metabolism? Think again.

April 19, 2011

We had a very nice seminar from Uwe Sauer last week, in celebration of which I thought I would write about one of his papers.  Uwe would like to understand how metabolism is controlled, and as a result has done a great deal of work to develop ways to measure metabolic flux.  A recent paper (Haverkorn van Rijsewijk et al. 2011. Large-scale 13C-flux analysis reveals distinct transcriptional control of respiratory and fermentative metabolism in Escherichia coli. Mol. Syst. Biol. 7 477 doi:10.1038/msb.2011.9) describes the application of some of these technologies to ask to what extent the metabolism of galactose and glucose by E. coli is controlled by transcription.

… and the case for physics in biology

August 6, 2010

John Higgins pointed me to this superb discussion of the challenges for physicists posed by biological systems (Phillips, R., & Quake, S. (2006). The Biological Frontier of Physics. Physics Today 59, 38-43).  In this paper Rob Phillips and Stephen Quake offer, as a public service, three examples of big fascinating problems in biology that physicists could get their teeth into.

The first is the operation of molecular machines: “[t]hey are incredibly sophisticated, and they, not their manmade counterparts, represent the pinnacle of nanotechnology.”  The authors choose ATP synthase as an example of a machine to marvel at.  Run in the forwards direction, transforming the energy in a proton gradient into chemical energy, ATP synthase delivers approximately your body weight in ATP molecules per day.  Run backwards, it becomes a rotary motor driven by ATP hydrolysis, delivering 120 degrees of rotation for every ATP consumed; the absolute thermodynamic efficiency of this reaction has been estimated at up to 90%.  That’s going to be hard to beat.
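
The "body weight in ATP per day" figure can be turned into molecule counts with a little arithmetic. The sketch below assumes a 70 kg person and an ATP molar mass of about 507 g/mol (both our assumptions), and uses the 120 degrees per ATP quoted above to count rotor turns summed over every ATP synthase in the body.

```python
AVOGADRO = 6.022e23        # molecules per mole
ATP_MOLAR_MASS_G = 507.18  # g/mol for ATP (free acid), assumed
DEGREES_PER_ATP = 120      # from the text: 120 degrees of rotation per ATP
SECONDS_PER_DAY = 86_400


def atp_turnover(body_mass_kg):
    """Molecules of ATP made per day if daily turnover equals body mass."""
    moles = body_mass_kg * 1000 / ATP_MOLAR_MASS_G
    return moles * AVOGADRO


def rotations(n_atp):
    """Full rotor turns for n ATP molecules at 120 degrees each."""
    return n_atp * DEGREES_PER_ATP / 360


per_day = atp_turnover(70)   # hypothetical 70 kg person
print(f"{per_day:.2e} ATP/day, {per_day / SECONDS_PER_DAY:.2e} ATP/s")
print(f"{rotations(per_day):.2e} rotor turns/day, summed over all synthases")
```

Three ATP per full turn follows directly from 120 degrees per ATP; the absolute counts depend entirely on the assumed body mass.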

A tale of two circuits

July 26, 2010

This is a story about a fortunate coincidence.  In two papers published simultaneously last year, the Kirschner lab and the Alon lab each noticed that the signaling pathway they were studying appeared to have peculiar responses.  In both cases, the amount of output — or at least, what had previously been assumed to be the output — triggered by a given amount of signal was highly variable.  Something was clearly wrong with our assumptions.

The Kirschner lab’s pathway was the Wnt pathway, an extremely important pathway in both development and cancer.  The output of this pathway is a change in β-catenin levels.  But both modeling and experiment showed that β-catenin levels varied wildly.  There was little relationship between the amount of Wnt stimulation and the resulting absolute level of β-catenin.  Instead, the measurement that looked as if it behaved more reasonably was the ratio of β-catenin’s level after stimulation to its resting level, or the “fold change” after stimulation.  (Goentoro L & Kirschner MW 2009 Evidence that fold-change, and not absolute level, of beta-catenin dictates Wnt signaling. Mol Cell. 36 872-84. PMID: 20005849).

Now, this is very odd.  We generally don’t think of cells as being able to remember the past (though the Silver lab is working on changing this).  So if the output of an important pathway is a ratio between the current level of a protein and an earlier level, we have two problems: how does the cell create this ratio? And once the ratio has been created, how does the cell read it?
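
A toy simulation shows why fold change can be a steadier readout than absolute level when resting levels vary widely. Everything here is hypothetical: the log-normal spread of resting β-catenin, the twofold induction, and the small noise on the induction factor are illustrative choices, not values from Goentoro and Kirschner, and the sketch says nothing about how a cell might actually compute the ratio.

```python
import random
import statistics


def simulate_cells(n_cells, fold_induction, seed=0):
    """Toy population: resting beta-catenin varies widely between cells
    (log-normal, an assumption) and stimulation multiplies each cell's
    level by roughly the same factor (with ~10% hypothetical noise)."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        resting = rng.lognormvariate(0.0, 0.8)                  # wide spread
        factor = fold_induction * rng.lognormvariate(0.0, 0.1)  # small spread
        cells.append((resting, resting * factor))
    return cells


def coefficient_of_variation(values):
    """Standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values)


cells = simulate_cells(2000, fold_induction=2.0)
after = [stimulated for _, stimulated in cells]
folds = [stimulated / resting for resting, stimulated in cells]
print("CV of absolute level after stimulation:", round(coefficient_of_variation(after), 2))
print("CV of fold change:", round(coefficient_of_variation(folds), 2))
```

In this toy population the post-stimulation level is about as variable as the resting level, while the fold change clusters tightly around 2.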

Biology by the numbers

July 20, 2010

Ron Milo and colleagues recently published a “Snapshot” article in Cell about key numbers in biology (Moran U, Phillips R, Milo R. 2010 SnapShot: key numbers in biology. Cell 141 1262-1262).  This is a handy sampling of the most essential information in the BioNumbers database, which attempts to gather together credible, concrete numbers for biological properties in one easily searchable place. The overall goal is to enable “back of the envelope” calculations to be made in biology.  This particular sampling includes key items such as cell volumes for E. coli, yeast and mammalian cells, measurements of size for the average protein molecule, cell membrane and nucleus, and rates of replication, translation, diffusion and degradation.

One back of the envelope calculation Ron and colleagues have already done using these numbers is: how long should it take E. coli to replicate its genome?  The genome size is roughly 5 million bp and the replication rate is in the range of 200–1000 bp/s.  So with two replisomes, it should take at least 2500 seconds (42 minutes) to replicate the genome.  But E. coli can double faster than that: under ideal conditions it can divide once every 20 min.  How is this possible?  It turns out that when E. coli is very happy and growing fast, it doesn’t wait for one round of DNA replication to be completed before starting the next.  Each round still takes about 40 minutes, but because rounds overlap there are actually more than 2 replisomes active per cell (around 4), and a newly completed genome is ready in time for each 20-minute division.  This is what explains the apparently too-fast growth rate.
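
The estimate is simple enough to write down directly. The function below is just an illustration of the arithmetic; the genome size, the rate range and the two replisomes all come from the text.

```python
GENOME_BP = 5_000_000   # from the text: roughly 5 million bp


def replication_time_s(genome_bp, rate_bp_per_s, replisomes=2):
    """Seconds for `replisomes` forks, each copying `rate_bp_per_s`, to cover the genome."""
    return genome_bp / (replisomes * rate_bp_per_s)


fastest = replication_time_s(GENOME_BP, 1000)   # upper end of 200-1000 bp/s
slowest = replication_time_s(GENOME_BP, 200)    # lower end
print(f"{fastest:.0f} s = {fastest / 60:.0f} min")   # 2500 s = 42 min
print(f"{slowest:.0f} s = {slowest / 60:.0f} min")
```

At the slow end of the rate range a single round would take 12,500 s, roughly three and a half hours, which makes the 20-minute doubling time even more striking.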

Ron would like to put together a larger set of “canonical” numbers in biology for use in both research and teaching.  What are the most important numbers in your field?  Can you find them in BioNumbers?  (If not, can you please deposit them?)  Are they broadly useful, and should they be included in Ron’s handy handbook?  You can answer in the comments, or e-mail Ron directly.

Where Am I?

You are currently browsing the Quantitation category at It Takes 30.