Statistical analysis of gene expression data
Doctoral thesis, 2007

Microarray technology has become one of the most important tools for genome-wide mRNA measurements. The technique has been successfully applied to many areas of modern biology, including cancer research, identification of drug targets, and categorization of genes involved in the cell cycle. Nevertheless, the analysis of microarray data is difficult because of its very high dimensionality and high levels of noise. Sound statistical methods are therefore needed.

The main results are presented in six papers. The first three develop a statistical model for quality assessment and improved gene ranking called Weighted Analysis of Microarray Experiments (WAME). Here, the customary assumption of independent samples is shown to be invalid, and individual variances for each array and correlations between pairs of arrays are introduced. Comparisons with other common methods suggest that the proposed model produces more accurate results. The first paper describes the model for simple experimental setups with two-channel arrays. This model is then generalized to more complex designs in paper two and to one-channel microarrays in paper three.

Transcription factors govern gene expression in the cell by binding to short sequences called cis-regulatory elements. These sequences are located in the promoters, which are regions of DNA upstream of the genes. In paper four, we show that the lengths of these promoters are related to gene function. In particular, the promoters of stress-responsive genes are in general longer than those of other genes. This is used in a novel method for identifying relevant cis-regulatory elements from a list of differentially expressed genes.

Papers five and six present microarray-based studies from molecular biology and environmental toxicology, respectively. In paper five, microarrays are used to identify Saccharomyces cerevisiae genes with changed mRNA levels under arsenic stress. In paper six, biomarkers for estrogen exposure in fish are found using both an in-house microarray experiment and a meta-analysis of several public gene expression datasets.
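
The abstract's idea that arrays may have individual variances and be pairwise correlated can be illustrated with a small sketch. It is not the WAME implementation: it assumes the covariance matrix between arrays is already known (the abstract does not say how it is estimated), it computes only a generalized least squares mean for one gene, and the function names `solve`, `array_weights` and `weighted_mean` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def array_weights(cov):
    """Weights proportional to inverse(cov) times a vector of ones, scaled to sum to one."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(log_ratios, cov):
    """Generalized least squares estimate of one gene's mean log-ratio across arrays."""
    weights = array_weights(cov)
    return sum(w * x for w, x in zip(weights, log_ratios))


if __name__ == "__main__":
    # Assumed covariance: arrays 1 and 2 are strongly correlated, array 3 is independent.
    cov = [[1.0, 0.8, 0.0],
           [0.8, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
    print(array_weights(cov))            # the two correlated arrays each get less weight
    print(weighted_mean([1.2, 1.0, 0.4], cov))
```

With an identity covariance matrix the weights are equal and the estimate reduces to the ordinary mean, which corresponds to the customary independence assumption that the abstract calls into question.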

gene expression

linear models

categorical data analysis

heavy metal stress

logistic regression

DNA microarrays

empirical Bayes

ecotoxicology

quality control

gene regulation

Euler, Mathematical Sciences
Opponent: Eivind Hovig

Author

Erik Kristiansson

University of Gothenburg

Chalmers, Mathematical Sciences, Mathematical Statistics

Subject Categories (SSIF 2011)

Cell Biology

Biochemistry and Molecular Biology

Probability Theory and Statistics

ISBN

978-91-7385-039-1

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie (Doctoral theses at Chalmers University of Technology, New series): 2720

More information

Created

10/7/2017