Normalization and Differential Gene Expression Analysis of Microarray Data
DNA microarray technologies can simultaneously measure the abundance of thousands of mRNA sequences. Analysis of microarray data involves many steps, such as image analysis, background correction, and normalization, as well as more classical statistical analysis such as testing for significant differences between groups of arrays.
The work presented in this thesis focuses on Affymetrix GeneChip arrays and deals with normalization and the problem of finding differentially expressed genes. Normalization of microarray data is essential to allow between-array comparisons. A procedure called Contrast Normalization is proposed and compared with existing methods, together with two further methods presented here, Cyclic-Loess and Quantile Normalization. All three presented methods improve on the performance of the existing methods, with a slight edge for Quantile Normalization.
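As an illustration of the idea behind Quantile Normalization, the following is a minimal sketch of the commonly described form of the procedure: each array is forced to share the same empirical distribution, taken as the mean of the sorted intensities across arrays. The function name `quantile_normalize` and the probes-by-arrays layout are assumptions made for this example, and ties are broken by position rather than averaged, so this is not necessarily the exact variant evaluated in the thesis.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a probes x arrays matrix (hypothetical helper).

    Every column (array) is mapped onto a common reference distribution:
    the mean, across arrays, of the k-th smallest intensity.
    """
    x = np.asarray(x, dtype=float)
    # Row indices that sort each column; stable sort breaks ties by position.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean of the sorted values across arrays.
    reference = sorted_x.mean(axis=1)
    # Write the k-th reference value back to the probe that held rank k.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

# Toy data: 4 probes measured on 3 arrays.
example = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 7.0, 6.0],
                    [4.0, 2.0, 8.0]])
normalized = quantile_normalize(example)
```

After normalization every column contains the same set of values, while the rank order of probes within each array is preserved.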
The quality of microarray data often varies between arrays. A model called WAME has been proposed, which uses a global covariance matrix to account for differing variances and array-to-array correlations and thereby defines a weighted analysis for finding differentially expressed genes. This thesis presents two new methods for estimating the covariance matrix. Both methods run faster than the existing method. Moreover, the second proposed method greatly reduces the bias of the existing method when used on simulated data with regulated genes, although to a lesser degree for real data with many regulated genes.
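To make the notion of a covariance-based weighted analysis concrete, the sketch below shows how a known covariance matrix between arrays induces weights for a mean estimate. It is the generic generalized least squares result, with weights proportional to the row sums of the inverse covariance matrix, and is not the WAME estimator itself. The names `gls_weights` and `weighted_mean` are hypothetical, and the covariance matrix is assumed to be given rather than estimated.

```python
import numpy as np

def gls_weights(sigma):
    """Weights of the minimum-variance unbiased mean given covariance sigma.

    Generic generalized least squares result: w = S^-1 1 / (1' S^-1 1).
    Hypothetical helper for illustration only.
    """
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones(sigma.shape[0])
    # Solve sigma @ w = 1 instead of forming the inverse explicitly.
    w = np.linalg.solve(sigma, ones)
    return w / w.sum()

def weighted_mean(values, sigma):
    """Weighted mean of one gene's values across arrays."""
    return float(gls_weights(sigma) @ np.asarray(values, dtype=float))

# Two uncorrelated arrays; the second is four times as variable.
sigma = np.array([[1.0, 0.0],
                  [0.0, 4.0]])
weights = gls_weights(sigma)
estimate = weighted_mean([1.0, 2.0], sigma)
```

Noisier arrays receive smaller weights, and positive correlation between arrays reduces the combined weight given to the correlated pair.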
Microarray data frequently show a dependency between variability and intensity level, which is ignored by the majority of moderated t-tests. The WAME model is extended to incorporate this dependency, and two locally moderated t-tests are proposed: Probe level Locally moderated Weighted median-t (PLW) and Locally Moderated Weighted-t (LMW).
When compared with 12 existing methods on 5 spike-in data sets, the PLW method produces the most accurate ranking of regulated genes in 4 out of the 5 data sets, whereas LMW consistently performs better than all (globally) moderated t-tests.
Euler lecture hall, Mathematical Sciences, Chalmers Tvärgata 3, Chalmers
Opponent: Prof. Niels Richard Hansen, University of Copenhagen, Denmark