Statistical Methods for Genome Wide Association Studies
Doctoral thesis, 2012
This thesis focuses on statistical methods for analyzing genome-wide association data. It includes four papers: three consider the analysis of complex traits, and the fourth presents a method for analyzing Mendelian traits.
Although Genome Wide Association Studies (GWAS) have identified many associated regions in the genome for many complex diseases, much of the genetic heritability remains unexplained. The power to detect new genetic risk variants can be improved by considering several genes in the same model.
A genetic variant in the HLA region on chromosome 6 is necessary but not sufficient to develop Celiac Disease. In the first two papers we utilize this information to discover additional genetic variants. In Paper I this is done with a method that uses the Cochran-Armitage trend test to find a trend in allele frequencies. Simulations are used to evaluate the power of this test compared with the commonly used Pearson 1 df chi-square test, and the test is then applied to a previously published Celiac Disease case-control data set.
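To make the comparison between the two tests concrete, the following is a minimal sketch of the textbook Cochran-Armitage trend test on a 2 x 3 genotype table, next to the Pearson 1 df chi-square test on allele counts. It is not the construction used in Paper I, which is not specified in this abstract. The function names, the additive weights (0, 1, 2), and the example counts are assumptions made for illustration only.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Trend chi-square (1 df) for genotype counts.

    cases and controls hold counts of individuals carrying 0, 1 and 2
    copies of the candidate allele. The weights are an assumption:
    (0, 1, 2) corresponds to an additive model.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * n for t, n in zip(weights, totals))
    sum_t2n = sum(t * t * n for t, n in zip(weights, totals))
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    if denominator == 0:
        raise ValueError("trend statistic undefined for this table")
    return N * (N * sum_tr - R * sum_tn) ** 2 / denominator


def pearson_allelic(cases, controls):
    """Pearson chi-square (1 df) on the 2 x 2 table of allele counts."""
    a = 2 * cases[2] + cases[1]        # candidate alleles in cases
    b = 2 * cases[0] + cases[1]        # other alleles in cases
    c = 2 * controls[2] + controls[1]  # candidate alleles in controls
    d = 2 * controls[0] + controls[1]  # other alleles in controls
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical genotype counts, not taken from the thesis.
cases = [20, 30, 50]
controls = [50, 30, 20]
trend_stat = cochran_armitage_trend(cases, controls)
allelic_stat = pearson_allelic(cases, controls)
print(trend_stat, chi2_1df_pvalue(trend_stat))
print(allelic_stat, chi2_1df_pvalue(allelic_stat))
```

The allelic test counts each individual twice and therefore relies on Hardy-Weinberg equilibrium, whereas the trend test keeps the individual as the unit of observation. This is one reason the two can differ in a power comparison.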
In Paper II the HLA information is utilized by a stratified transmission disequilibrium test (TDT), conditioning on the HLA variants. In addition, an imputation-based version of the TDT is presented, as well as a likelihood ratio test that searches for two-locus interactions by comparing the heterogeneity and epistasis models. Here the candidates for interaction analysis are chosen by a two-step approach, combining the results from the TDT with prior information from previous studies.
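As a rough illustration of the idea behind a stratified TDT, the sketch below computes the classical McNemar-type TDT statistic from transmission counts of heterozygous parents, and then applies it separately within strata. The per-stratum evaluation, the stratum labels, and the counts are assumptions for illustration. The abstract does not describe how Paper II conditions on the HLA variants, and the imputation-based version is not covered here.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT for one marker.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not transmit the candidate allele to an affected child.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 df
    return stat, p_value


def stratified_tdt(counts_by_stratum):
    """Run the TDT separately in each stratum (an assumed, simple form of
    stratification; strata could be HLA genotype groups of the child)."""
    return {label: tdt(b, c) for label, (b, c) in counts_by_stratum.items()}


# Hypothetical transmission counts per hypothetical HLA stratum.
counts = {"HLA_high_risk": (60, 40), "HLA_low_risk": (25, 25)}
results = stratified_tdt(counts)
for label, (stat, p_value) in results.items():
    print(label, stat, p_value)
```

Because the TDT uses only transmissions within families, it is robust to population stratification, which makes it a natural building block for a family-based analysis.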
In contrast to the approach used in Paper II for identifying interactions between genes, Paper III considers a full Genome Wide Interaction Analysis (GWIA). By examining how often interactions without marginal effects are found in a GWIA, we discuss what conclusions can be drawn from such findings.
In the final paper we develop a program for locating a region containing a causal gene for rare monogenic traits. The program can be used in large pedigrees with multiple affected cases, and helps discern the causal region by coloring regions according to how common they are in the population.
haplotype sharing
allele sharing
Genotype imputation
gene-gene interactions
Single Nucleotide Polymorphism
Celiac Disease
Genome Wide Association Studies