How to read a genome
February 17, 2011
What makes you the unique human being you are? Partly it’s nurture (what your mother ate while she was pregnant with you, whether she smoked, how much you exercise, which drugs you take) and partly it’s nature. The part that’s nature is sometimes clear-cut: if your biological father and mother both have the O negative blood type, so do you. Sometimes it isn’t. If your mother is tall and your father short, I can’t make any kind of confident prediction about how tall you are, even leaving aside the effects of nutrition. Height is a “quantitative trait”, a characteristic that is largely inherited but not controlled by a single gene. Unlike Mendel’s peas, which showed a digital (or qualitative) phenotype, either wrinkled or smooth, quantitative traits are analog, showing a smoothly variable degree of (say) wrinkledness. Such characteristics, like skin color or the risk of adult-onset diabetes, are presumed to be controlled by polymorphisms in multiple genes or loci, each making a small or medium-sized contribution to the eventual outcome. Genome-wide association studies (GWAS) aim to identify these loci and determine how much each one contributes to the trait. There’s currently a furious debate in the field over “missing heritability” (the gap between the expected genetic contribution to a trait and the sum of the genetic contributions identified by GWAS), and over how many of the polymorphisms contributing to disease are likely to be common (and therefore detectable by GWAS) and how many are likely to be rare.
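The bookkeeping behind “missing heritability” can be made concrete with the simplest textbook model. The sketch below assumes that loci act additively and independently and are in Hardy-Weinberg equilibrium, so a biallelic locus with allele frequency p and per-allele effect a contributes 2p(1 - p)a² to the genetic variance. All locus names, frequencies and effect sizes are invented, and treating loci with a minor allele frequency of at least 5% as “detectable by GWAS” is a simplifying assumption, not a property of any real study.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    freq: float    # frequency of the trait-increasing allele
    effect: float  # additive effect per copy of that allele (trait units)


def locus_variance(locus: Locus) -> float:
    """Additive genetic variance from one biallelic locus under Hardy-Weinberg: 2p(1-p)a^2."""
    p = locus.freq
    return 2 * p * (1 - p) * locus.effect ** 2


def heritability(loci, env_variance: float) -> float:
    """Narrow-sense heritability h^2 = V_G / (V_G + V_E), assuming additive, independent loci."""
    v_g = sum(locus_variance(l) for l in loci)
    return v_g / (v_g + env_variance)


def variance_explained(detected, all_loci, env_variance: float) -> float:
    """Fraction of total phenotypic variance accounted for by the detected subset of loci."""
    v_p = sum(locus_variance(l) for l in all_loci) + env_variance
    return sum(locus_variance(l) for l in detected) / v_p


# Hypothetical architecture: three common loci of medium effect plus fifty rare loci.
loci = [
    Locus("common1", 0.4, 0.5),
    Locus("common2", 0.3, 0.4),
    Locus("common3", 0.5, 0.3),
] + [Locus(f"rare{i}", 0.01, 1.0) for i in range(50)]
ENV_VARIANCE = 1.0

# Assumption: only loci with minor allele frequency >= 5% are picked up by the GWAS.
detectable = [l for l in loci if min(l.freq, 1 - l.freq) >= 0.05]

h2 = heritability(loci, ENV_VARIANCE)
explained = variance_explained(detectable, loci, ENV_VARIANCE)
print(f"h^2 = {h2:.3f}, explained by common loci = {explained:.3f}, missing = {h2 - explained:.3f}")
```

With these made-up numbers the trait is highly heritable, but the common loci account for only a small part of that heritability; most of it sits in rare variants that a common-variant scan would miss. This is one of the scenarios under debate, not a claim about any real trait.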
But let’s leave missing heritability aside and look at a different GWAS-related question. Suppose you have found a locus that’s clearly associated with a given trait, for example the risk of developing a disease. You have a list of genetic polymorphisms at that locus that are positively or negatively associated with the disease in the population you studied. The next problem you face is that not all of these associations are going to be real: some polymorphisms will be causal, changing either the sequence or the expression level of the protein or RNA that mediates the increase in disease risk, and others will only appear to be associated with the phenotype because they’re correlated with the causal ones (in linkage disequilibrium with them) in the population you happened to study. If we want to be able to read an individual genome and determine whether its owner has an elevated risk of a disease, we need to know which polymorphisms are really driving the trait. And it would be nice if there were only a few important ones, with most of the list showing up only because of correlations.
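A toy simulation shows how a non-causal polymorphism picks up an association simply by being correlated with a causal one. In this sketch, which is an illustration rather than the analysis used by any particular study, only the causal SNP affects the phenotype, and a nearby “tag” SNP carries the matching allele on 90% of haplotypes. Tested one at a time, as in a standard single-marker scan, both SNPs look associated; fitted together in one regression, the tag SNP's effect drops toward zero. The sample size, allele frequency, effect size and concordance are all arbitrary choices.

```python
import random
from statistics import fmean


def simulate(n=5000, causal_freq=0.3, tag_concordance=0.9, effect=0.5, seed=1):
    """Simulate diploid genotypes at a causal SNP and a nearby non-causal 'tag' SNP.

    On each haplotype the tag allele matches the causal allele with probability
    tag_concordance, which puts the two SNPs in linkage disequilibrium.
    Only the causal SNP affects the phenotype.
    """
    rng = random.Random(seed)
    causal, tag, pheno = [], [], []
    for _ in range(n):
        c_dose = t_dose = 0
        for _ in range(2):  # two haplotypes per individual
            c = 1 if rng.random() < causal_freq else 0
            t = c if rng.random() < tag_concordance else 1 - c
            c_dose += c
            t_dose += t
        causal.append(c_dose)
        tag.append(t_dose)
        pheno.append(effect * c_dose + rng.gauss(0.0, 1.0))
    return causal, tag, pheno


def centered(xs):
    m = fmean(xs)
    return [x - m for x in xs]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def marginal_slope(x, y):
    """Least-squares slope of y on a single SNP, as in a one-marker-at-a-time scan."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def joint_slopes(x1, x2, y):
    """Least-squares slopes with both SNPs fitted together (2x2 normal equations, Cramer's rule)."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


causal, tag, pheno = simulate()
tag_alone = marginal_slope(tag, pheno)
causal_joint, tag_joint = joint_slopes(causal, tag, pheno)
print(f"tag SNP on its own: slope {tag_alone:.2f}")
print(f"fitted together: causal {causal_joint:.2f}, tag {tag_joint:.2f}")
```

This only works because the correlation is incomplete. If two polymorphisms were always inherited together in the sample, no statistical model could tell them apart, and separating them would need a population with different haplotypes or a direct functional test.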
Angela DePace pointed out an interesting recent paper that takes on this challenge for a quantitative trait in Drosophila, the sex-specific color patterns on the abdomens of adult females (Bickel et al. 2011. Composite effects of polymorphisms near multiple regulatory elements create a major-effect QTL. PLoS Genetics 7: e1001275). This trait is known to be strongly associated with the bric a brac locus, so the authors took 96 D. melanogaster lines that vary in color pattern, sequenced the bric a brac locus in each line, and set out to figure out how the variation at the locus led to the variation in color pattern.
Better genomics through chemistry
January 28, 2011
There’s been a little flurry of papers from UCSF recently about using chemical and environmental perturbations to ask when and why you need the function of a particular gene. I originally thought I might try to write about all of them at once, but no — there’s more here than I can do justice to in a single post. So I picked Nichols et al. 2011 (Phenotypic landscape of a bacterial cell, Cell PMID: 21185072), partly because it most clearly describes the approach, but mostly because a graduate student (Rupinder Sayal from MSU) sent it to me and suggested that I write about it. (Thanks, Rupinder!)
Biology owes quite a debt to chemistry. It’s probable that one of your favorite proteins was discovered because it was the target of a drug. Target of rapamycin (Tor) is one example where the discovery process is immortalized right there in the name of the protein, but there are lots of others. Tubulin was discovered as the target of colchicine. I could go on, or you could ask Tim Mitchison, who can be eloquent on this subject if roused. Discoveries such as these opened up whole new areas of biology. Now that we have genomics tools, though, can drugs tell us even more? (You know I wouldn’t be asking this question if the answer were not at least partly yes. Indulge me.)
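The basic logic of using perturbations to ask when a gene's function is needed can be sketched in a few lines: grow a set of single-gene deletion strains under each condition, express each strain's growth relative to a typical strain in the same condition, and flag the conditions where a deletion strain does unusually badly. The scoring here (log2 of colony size over the per-condition median, with a cutoff of -1, roughly half the typical size) is a hypothetical simplification and not the scoring scheme used by Nichols et al.; the strain names, conditions and colony sizes are made up.

```python
import math
from statistics import median


def fitness_scores(growth):
    """Convert raw colony sizes into log2 scores relative to the median strain in each condition.

    growth maps condition -> {strain -> colony size}. A score near 0 means the strain grows
    like a typical strain in that condition; a strongly negative score means it is unusually sick.
    """
    scores = {}
    for condition, sizes in growth.items():
        typical = median(sizes.values())
        for strain, size in sizes.items():
            scores.setdefault(strain, {})[condition] = math.log2(size / typical)
    return scores


def conditions_requiring(scores, strain, threshold=-1.0):
    """Conditions in which losing the gene roughly halves growth or worse (score <= threshold)."""
    return sorted(c for c, s in scores[strain].items() if s <= threshold)


# Hypothetical colony sizes (arbitrary units) for four deletion strains in three conditions.
growth = {
    "rich_medium": {"delA": 1.0, "delB": 1.0, "delC": 0.9, "delD": 1.1},
    "drug_1": {"delA": 0.2, "delB": 0.9, "delC": 1.0, "delD": 1.0},
    "high_salt": {"delA": 1.0, "delB": 0.3, "delC": 0.8, "delD": 1.0},
}

scores = fitness_scores(growth)
for strain in sorted(scores):
    print(strain, conditions_requiring(scores, strain))
```

Normalizing within each condition, rather than against one reference condition, keeps a drug that slows every strain from making every gene look essential.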
November 9, 2010
Untreated, HIV is normally a death sentence. But not quite always. A small number of people infected with HIV can survive for decades without symptoms. They’re called “elite controllers”, and — although the fact that they’re healthy makes them hard to identify with certainty — they’re thought to comprise less than 1% of the infected population.
Elite controllers, as the name suggests, control the replication of HIV much better than a normal infected person. Although they’re definitely infected, they have very low (to undetectable) amounts of virus circulating in their bloodstream. They are therefore much less likely to pass on the infection, and they maintain perfectly normal levels of CD4 cells. For these few lucky individuals, HIV may be merely an inconvenience.
What makes them special? A genome-wide association study, the product of an impressive collaborative effort (the list of authors is longer than the paper), has come up with a simple and satisfying answer: the variants most clearly associated with being an elite controller essentially all lie in the MHC class I region, and identifying the MHC class I subtypes that are over-represented in the elite controller population makes it possible to pinpoint a handful of amino acids in the peptide-binding groove as important for protection.
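Deciding that an MHC class I subtype is “over-represented” among elite controllers comes down, for each variant, to comparing how often it turns up in controllers versus in people whose infection progresses. A minimal sketch of that comparison for a single allele uses a 2×2 table of allele counts, an odds ratio with a Woolf (log-scale) 95% confidence interval, and a 1-degree-of-freedom Pearson chi-square test. The counts are hypothetical, not data from the study, and a real GWAS repeats such a test across hundreds of thousands of variants, which is why it demands a very stringent significance threshold (conventionally around 5 × 10⁻⁸).

```python
import math


def allele_association(ctrl_with, ctrl_without, prog_with, prog_without):
    """Odds ratio, Woolf 95% CI and 1-df chi-square p-value for a 2x2 allele-count table.

    Rows: controllers vs progressors; columns: chromosomes with vs without the allele.
    """
    a, b, c, d = ctrl_with, ctrl_without, prog_with, prog_without
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    n = a + b + c + d
    # Pearson chi-square without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (lo, hi), p_value


# Hypothetical counts of chromosomes carrying / not carrying one HLA allele.
odds_ratio, ci, p = allele_association(ctrl_with=200, ctrl_without=800, prog_with=100, prog_without=1900)
print(f"OR = {odds_ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), p = {p:.1e}")
```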
Sur nous, le deluge
August 12, 2010
A recent piece of correspondence in Genome Biology (Parkhill, J, Birney, E and Kersey, P. 2010. Genomic information infrastructure after the deluge. Genome Biology 11: 402) discusses how the scientific community’s ability to maintain well-curated, up-to-date reference genomes is failing in the face of the flood of new sequence information. The authors point out that, while our ability to obtain sequence data has been rapidly increasing (and the ways we use sequence data have been proliferating), there has been no corresponding change in the community’s ability to store, organize and interpret these data. As a result, many genomes were annotated once, when they were first submitted to the public databases, and have never been updated; the group that did the sequencing moved on to another challenge, and there has been no organized attempt to curate the information from experiments enabled or informed by the genome sequence.
It’s not a pretty picture, and there is every reason to think that (without intervention) it will only get worse. Parkhill et al. point out that a major problem with the current model is that funding agencies find it hard to know how to interact with the patchwork of existing resources: it’s hard to determine whether a resource would emerge anyway, without the help of a particular agency or grant; it’s hard to know whether a specific resource offers good value for money; and it’s unclear how long-term funding can be secured. The problem is exacerbated by the fact that these resources are, and should be, international, so they are exposed to the shifting winds of enthusiasm for science funding from many directions.
If you can’t grow it, sequence it
August 2, 2010
Bacteria live almost everywhere, and use a staggering variety of strategies to get the energy they need to grow. In the process, they make and recycle all kinds of globally important materials, and we often don’t understand how, biochemically, they do this. One reason, apart from the sheer number of different types of bacteria, is that many bacterial species are hard to culture in the laboratory. Estimates of the proportion of bacteria that are “unculturable” (or, more accurately, not yet cultured) range as high as 99%, based on sequencing of 16S rRNAs. If the microorganism you want to study happens to be among the unlucky 99%, what are you supposed to do? These days, you have a new option: sequence its genome.
In a paper published this month in PNAS (Lücker et al. 2010. A Nitrospira metagenome illuminates the physiology and evolution of globally important nitrite-oxidizing bacteria. Proc Natl Acad Sci USA 107: 13479–13484, PMID: 20624973), the authors do just that. Frustrated by their inability to grow a nitrite-oxidizing bacterium (one that, what’s more, grows happily in sewage treatment facilities and therefore has no right to be fussy), they made an enriched preparation of it from sludge and sequenced it.