Statistical Methods for Genome Wide Association Studies

Malin Östensson

Doctoral thesis, 2012

This thesis focuses on statistical methods for analyzing Genome Wide Association data. It includes four papers: three consider the analysis of complex traits, and the fourth presents a method for analyzing Mendelian traits. Although GWAS have identified many associated regions in the genome for many complex diseases, much of the genetic heritability remains unexplained. The power to detect new genetic risk variants can be improved by considering several genes in the same model. A genetic variant in the HLA region on chromosome 6 is necessary but not sufficient to develop Celiac Disease. In the first two papers we utilize this information to discover additional genetic variants. In Paper I this is done with a method that uses the Cochran-Armitage trend test to find a trend in allele frequencies. Simulations are used to evaluate the power of this test compared with the commonly used Pearson 1 df chi-square test, and the test is then applied to previously published Celiac Disease case-control material. In Paper II the HLA information is utilized by a stratified transmission disequilibrium test (TDT), conditioning on the HLA variants. In addition, an imputation-based version of the TDT is presented, as well as a likelihood ratio test that searches for two-locus interactions by comparing the heterogeneity and epistasis models. Here the candidates for interaction analysis are chosen by a two-step approach that combines the TDT results with prior information from previous studies. In contrast to the approach used in Paper II for identifying interactions between genes, Paper III considers a full Genome Wide Interaction Analysis (GWIA). By examining how commonly interactions without marginal effects are found in a GWIA, we discuss what conclusions can be drawn from such findings. In the final paper we develop a program that locates a region containing a causal gene for rare monogenic traits. This program can be used in large pedigrees with multiple affected cases, and it discerns the causal region by coloring candidate regions according to how common they are in the population.
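As an illustration of the trend test named in the abstract, the following is a minimal sketch of the generic, textbook Cochran-Armitage trend test on a 2 x 3 table of genotype counts. It is not the HLA-based procedure developed in Paper I. The function name, the additive weights (0, 1, 2), and the genotype counts in the example are assumptions made for illustration only.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Return (chi-square statistic, p-value) for a 2 x k trend test.

    cases, controls: counts per genotype class (e.g. 0, 1, 2 risk alleles).
    weights: scores per class; (0, 1, 2) assumes an additive trend.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")

    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    # Column totals: number of individuals in each genotype class.
    col = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: weighted difference between case and control counts.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))

    # Variance of the trend statistic under the null of no association.
    squares = sum(w * w * n * (n_total - n) for w, n in zip(weights, col))
    cross = sum(weights[i] * weights[j] * col[i] * col[j]
                for i in range(len(col)) for j in range(i + 1, len(col)))
    variance = (n_cases * n_controls / n_total) * (squares - 2 * cross)
    if variance == 0:
        raise ValueError("variance is zero; all individuals share one score")

    stat = t * t / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
stat, p_value = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
print(f"trend chi-square = {stat:.2f}, p = {p_value:.2e}")
```

The statistic is referred to a chi-square distribution with 1 degree of freedom, the same reference distribution as the Pearson 1 df test mentioned above. The trend test works on genotype counts per individual rather than on allele counts.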

haplotype sharing

allele sharing

Genotype imputation

gene-gene interactions

Single Nucleotide Polymorphism

Celiac Disease

Genome Wide Association Studies

Room Pascal, Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg

Opponent: Håkon Gjessing, Norwegian Institute of Public Health

Author

Malin Östensson

Chalmers, Mathematical Sciences, Mathematical Statistics

University of Gothenburg


How much do we inherit from our parents? Many of our traits seem to recur among our relatives, and we often say that "it is genetic". Of particular interest is how diseases are transmitted from parents to offspring, and which genetic variants cause or increase the risk of these disorders. In the simplest genetic disease models a single variant causes the disease: if you lack this variant, you will not be affected. For such diseases the causal gene can be identified by comparing chromosome segments where the genomes of affected individuals are alike. Many of the most common human diseases, however, appear to be influenced by several different factors, both genetic and environmental. These factors can interact in such a way that several risk variants are needed to become ill. In addition, different risk factors may give rise to the same disorder. Identifying the risk genes for these complex diseases requires more advanced statistical methods than those used for simpler disease models. By using statistical methods that allow for interactions between several risk genes, we can identify risk factors that are hard to discover with simpler methods. We also investigate how the results of such methods can be interpreted when a large number of genetic variants are examined. For Celiac Disease (gluten intolerance) one necessary risk variant has already been identified. Using statistical methods, we utilize the information about this variant to improve the detection of additional risk genes.

Subject Categories (SSIF 2011)

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

Probability Theory and Statistics

Genetics

Areas of Advance

Life Science Engineering (2010-2018)

ISBN

978-91-73-85742-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie