Utilizing Rare and X-Linked Variants for Inference of Population Size History and Association Studies of Complex Diseases

Title: Utilizing Rare and X-Linked Variants for Inference of Population Size History and Association Studies of Complex Diseases
Publication Type: Thesis
Year of Publication: 2017
Authors: Gao F
Advisor: Keinan A
Date Published: 01/2017
University: Cornell University
Keywords: Bioinformatics, Chromosome X, Coalescent Theory, Demographic Modeling, Medical Genetics, Rare Variants

The rapid development of sequencing technologies has enabled fast, large-scale sequencing of human genomes. This has made an increasing number of high-quality whole-genome and exome sequencing datasets available and provides excellent opportunities for human genomic research. One common observation from these datasets is an extreme excess of rare variants. One important way to use the information encoded in these rare variants is to explore their contribution to human complex diseases and traits. In Chapter 2, I describe our pharmacogenetic study, whose goal was to identify the effects of rare genetic variants on patients' response to lipid-lowering therapies, using a sequencing dataset of about 2,400 individuals. I discovered three significant associations, showing that rare variants lower the efficacy of drugs for different lipid levels.
A second potential use of the observed rare variants in human genomes is to study the historical scenarios that gave rise to them, specifically recent human population growth. Many previous studies that inferred such growth from the site frequency spectrum have shown that human populations underwent a recent epoch of fast growth in effective population size. One common limitation of these studies is that they assumed the growth to be exponential, and the resulting models still leave an excess of extremely rare variants unexplained. A more recent study introduced a generalized model that allows growth to be faster or slower than exponential. However, only simulation software was available for generalized models. In Chapter 3, I provide analytical expressions to accurately and efficiently evaluate the site frequency spectrum and other summary statistics under generalized models, as well as publicly available software that implements these expressions. Applying my inference framework to a large-scale exome sequencing dataset, I found evidence that the recent growth of Europeans has been 12% faster than exponential.
Beyond autosomal variants, genetic variants on chromosome X also play a vital role in human complex diseases and quantitative traits. Compared with the autosomes, chromosome X has many unique properties, the most obvious and important being that males carry only one copy of it. However, the vast majority of genome-wide association studies have either ignored chromosome X or analyzed it with the same approaches used for autosomal variants, potentially leaving many X-linked associations undiscovered. In Chapter 4, I describe XWAS, a software toolset tailored for association analysis of chromosome X. It implements X-specific quality-control procedures as well as X-adapted single-marker and gene-based tests. I further demonstrate the usefulness of XWAS by applying it to multiple autoimmune disease datasets and discovering several new X-linked genetic associations. Some of these associations differ significantly between males and females, demonstrating the importance of improving association methods to account for sex bias on chromosome X.