Normalization and Differential Gene Expression Analysis of Microarray Data

MAGNUS ÅSTRAND

Doctoral thesis, 2008

DNA microarray technologies can simultaneously measure the abundance of thousands of mRNA sequences. Analysis of microarray data involves many steps, such as image analysis, background correction, and normalization, as well as more classical statistical analysis such as testing for significant differences between groups of arrays. The work presented in this thesis focuses on Affymetrix GeneChip arrays and deals with normalization and the problem of finding differentially expressed genes.

Normalization of microarray data is essential to allow between-array comparisons. A procedure called Contrast Normalization is proposed and compared with existing methods, together with two further methods presented in the thesis, Cyclic-Loess and Quantile Normalization. All three presented methods improve on the performance of the existing methods, with a slight edge for Quantile Normalization.

The quality of microarray data often varies between arrays. A model called WAME has been proposed that uses a global covariance matrix to account for differing variances and array-to-array correlations, and thereby defines a weighted analysis for finding differentially expressed genes. This thesis presents two new methods for estimating the covariance matrix. Both methods have shorter run times than the existing method. Moreover, the second proposed method greatly reduces the bias of the existing method when used on simulated data with regulated genes, although to a lesser degree for real data with many regulated genes.

Microarray data frequently show a dependency between variability and intensity level, which is ignored by the majority of moderated t-tests. The WAME model is extended to incorporate this dependency, and two locally moderated t-tests are proposed: Probe level Locally moderated Weighted median-t (PLW) and Locally Moderated Weighted-t (LMW). When compared with 12 existing methods on 5 spike-in data sets, the PLW method produces the most accurate ranking of regulated genes in 4 of the 5 data sets, whereas LMW consistently performs better than all (globally) moderated t-tests.
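
For readers unfamiliar with quantile normalization, the general idea can be sketched as follows: every array is forced to share one reference distribution, obtained by averaging the sorted intensities across arrays. This is a minimal illustration of the textbook technique, not the thesis's implementation. The function name `quantile_normalize`, the toy data, and the handling of ties by sort order are all assumptions made for the example.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of intensities).

    Illustrative sketch only: ties are broken by sort order, which is an
    assumption and may differ from what production software does.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # For each array, the probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]

    # Reference distribution: mean intensity across arrays at each rank.
    reference = [
        sum(a[order[rank]] for a, order in zip(arrays, orders)) / len(arrays)
        for rank in range(n)
    ]

    # Give each probe the reference value that corresponds to its rank.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): three arrays measured on four probes.
arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(arrays)
```

After normalization, every array contains the same set of values and only their assignment to probes differs, which is what makes the arrays directly comparable.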

Euler-salen (Euler lecture hall), Mathematical Sciences, Chalmers Tvärgata 3, Chalmers

Opponent: Prof. Niels Richard Hansen, University of Copenhagen, Denmark

Author

MAGNUS ÅSTRAND

Chalmers, Mathematical Sciences

University of Gothenburg

Subject categories (SSIF 2011)

Probability theory and statistics

ISBN

978-91-7385-043-8

Doctoral theses at Chalmers University of Technology. New series: 2724

More information

Created

2017-10-07