Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation

Title: Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation
Publication Type: Journal Article
Year of Publication: 2012
Authors: Chang D, Keinan A
Journal: PLoS Comput. Biol.
Keywords: Computer Simulation, Databases, Evolution, Gene Frequency, Genetic, Genetic Variation, Genome-Wide Association Study, Genomics, Humans, Linkage Disequilibrium, Models, Molecular, Polymorphism, Single Nucleotide, HapMap Project

Genome-wide association studies (GWAS) have in recent years discovered thousands of associated markers for hundreds of phenotypes. However, associated loci often explain only a relatively small fraction of heritability, and the link between association and causality has yet to be uncovered for most loci. Rare causal variants have been suggested as one scenario that may partially explain these shortcomings. Specifically, Dickson et al. recently reported simulations of rare causal variants that lead to association signals of common, tag single nucleotide polymorphisms, dubbed “synthetic associations”. However, an open question is what practical implications synthetic associations have for GWAS. Here, we explore the signatures exhibited by such “synthetic associations” and their implications based on patterns of genetic variation observed in human populations, thus accounting for human evolutionary history, a force disregarded in previous simulation studies. This is made possible by human population genetic data from HapMap 3 consisting of both resequencing and array-based genotyping data for the same set of individuals from multiple populations. We report that synthetic associations tend to be further away from the underlying risk alleles compared to “natural associations” (i.e., associations due to underlying common causal variants), but to a much lesser extent than previously predicted, with both the age and the effect size of the risk allele playing a part in this phenomenon. We find that while a synthetic association has a lower probability of capturing causal variants within its linkage disequilibrium block, sequencing around the associated variant need not extend substantially to have a high probability of capturing at least one causal variant. We also show that the minor allele frequency of synthetic associations is lower than that of natural associations for most, but not all, loci that we explored. Finally, we find the variance in associated allele frequency to be a potential indicator of synthetic associations.