How to read a genome
February 17, 2011
What makes you the unique human being you are? Partly it’s nurture (what your mother ate while she was pregnant with you, whether she smoked, how much you exercise, which drugs you take) and partly it’s nature. The part that’s nature is sometimes clear-cut (if your biological father and mother both had the O negative blood type, you do too) and sometimes not. If your mother is tall and your father short, I can’t make any kind of confident prediction about how tall you are, even leaving aside the effects of nutrition. Height is a “quantitative trait”, a quality that is largely inherited but not controlled by a single gene. Unlike Mendel’s peas, which showed a digital (or qualitative) phenotype, either wrinkled or smooth, quantitative traits are analog, showing a smoothly variable degree of (say) wrinkled-ness. Such characteristics, like skin color or risk of adult-onset diabetes, are presumptively controlled by polymorphisms in multiple genes or loci, each making a small or medium-sized contribution to the eventual outcome. Genome-wide association studies (GWAS) aim to identify these loci and determine how much of a contribution they make to the trait. There’s currently a furious debate in the field over the issue of “missing heritability” (the gap between the expected genetic contribution to a trait and the sum of the genetic contributions identified by GWAS) and the question of how many of the polymorphisms contributing to disease are likely to be common (possible to identify via GWAS) and how many are likely to be rare.
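To make the digital-versus-analog contrast concrete, here is a minimal sketch of a purely additive trait. Everything in it is an assumption chosen for illustration: the function name `simulate_trait`, the allele frequency of 0.5, the equal effect of every locus, and the absence of any environmental contribution. It is not a model of height or of any real dataset.

```python
import random
from collections import Counter

def simulate_trait(n_individuals, n_loci, seed=0):
    """Return one trait value per individual under a simple additive model.

    Illustrative assumptions: every locus has two alleles at frequency 0.5,
    each copy of the "plus" allele adds one unit to the trait, and there is
    no environmental noise.
    """
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        # Two allele copies per locus; count how many are "plus" alleles.
        plus_alleles = sum(rng.random() < 0.5 for _ in range(2 * n_loci))
        traits.append(plus_alleles)
    return traits

# Tally how many individuals fall at each trait value.
one_locus = Counter(simulate_trait(10_000, n_loci=1))
many_loci = Counter(simulate_trait(10_000, n_loci=20))

print("distinct trait values with 1 locus: ", len(one_locus))
print("distinct trait values with 20 loci:", len(many_loci))
```

With a single locus the trait can take at most three values (zero, one, or two plus alleles), so individuals fall into a few discrete classes. With twenty loci the same rule produces many closely spaced values, which starts to look like a continuous distribution even though each locus is still “digital”.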
But let’s leave missing heritability aside and look at a different GWAS-related question. Suppose you have found a locus that’s clearly associated with a given trait, for example the risk of developing a disease. You have a list of genetic polymorphisms at that locus that are positively or negatively associated with the disease in the population you studied. The next problem you face is that not all of these associations are going to be causal: some polymorphisms will directly affect the trait, changing either the sequence or the expression level of the protein or RNA that mediates the increase in disease risk, and others will be associated with the phenotype only because they’re correlated with the causal ones in the population you happened to study. If we want to be able to read an individual genome and determine whether the person who owns the genome has an elevated risk of a disease, we need to know which polymorphisms are really driving the behavior of the trait. And it would be nice if there were only a few important ones, with the majority of the list showing up because of correlations.
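A toy simulation can show how a non-causal polymorphism picks up an association purely by correlation. This is only a sketch under invented assumptions: the carrier frequency (0.3), the disease risks (0.10 and 0.30), the correlation parameter `ld`, and the helper names `simulate_population` and `risk_difference` are all hypothetical, and none of the numbers come from a real study.

```python
import random

def simulate_population(n, ld=0.9, seed=1):
    """Return a list of (causal, linked, diseased) tuples, one per person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        causal = rng.random() < 0.3  # carries the causal variant
        # The linked marker copies the causal status with probability `ld`;
        # otherwise it is drawn independently at the same frequency.
        linked = causal if rng.random() < ld else (rng.random() < 0.3)
        risk = 0.30 if causal else 0.10  # only `causal` changes disease risk
        diseased = rng.random() < risk
        people.append((causal, linked, diseased))
    return people

def disease_rate(group):
    """Fraction of people in `group` who have the disease."""
    return sum(person[2] for person in group) / len(group)

def risk_difference(people, index, within_causal=None):
    """Disease rate in carriers minus non-carriers of the variant at `index`.

    If `within_causal` is True or False, only people with that causal-variant
    status are compared, which holds the causal variant fixed.
    """
    if within_causal is not None:
        people = [p for p in people if p[0] == within_causal]
    carriers = [p for p in people if p[index]]
    others = [p for p in people if not p[index]]
    return disease_rate(carriers) - disease_rate(others)

people = simulate_population(50_000)
print("causal variant:", round(risk_difference(people, 0), 3))
print("linked marker: ", round(risk_difference(people, 1), 3))
print("linked marker among causal carriers:",
      round(risk_difference(people, 1, within_causal=True), 3))
```

In this setup both variants show a clear difference in disease rate, even though only one of them affects risk. Comparing carriers and non-carriers of the linked marker among people who all have the same causal status makes the marker’s apparent effect shrink toward zero. Real loci are harder, because the causal variant is not known in advance and many polymorphisms may be correlated with each other at once.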
Angela DePace pointed out an interesting recent paper that takes on this challenge for a quantitative trait in Drosophila, the sex-specific color patterns on the abdomens of adult females (Bickel et al. 2011. Composite effects of polymorphisms near multiple regulatory elements create a major-effect QTL. PLoS Genetics 7, e1001275). This particular trait is known to have a strong association with the bric a brac locus, so the authors took 96 D. melanogaster lines that vary in color pattern, sequenced the bric a brac locus for each line, and set out to figure out how the variation at the locus led to the variation in color pattern.