Weighted Analysis of Microarray Experiments
Doctoral thesis, 2007
DNA microarrays are highly efficient tools for measuring the expression of large sets of genes simultaneously. The aim is often to identify genes that are differentially expressed between the studied conditions, and thereby gain insight into which cellular mechanisms differ in activity between them. The measurement process involves several steps that may go partly or entirely wrong, so quality control is crucial.
In Papers I-III, a novel method is developed that integrates quality control quantitatively into the analysis of microarray experiments. The noise structure for each gene is modelled by (i) a global covariance structure matrix, which captures decreased quality through array-wise variances and shared sources of variation through correlations, and (ii) gene-wise variance scales with a prior distribution whose parameters are estimated from the data of all genes in an empirical Bayes manner. The variances and correlations are estimated entirely from the data. In the estimates and tests for differential expression, arrays with lower precision and arrays sharing sources of variation are downweighted. The all-or-nothing decision of whether to exclude an array is thus avoided. The method is called Weighted Analysis of Microarray Experiments (WAME).
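The weighting idea can be illustrated with a minimal Python sketch under simplifying assumptions. For one gene, the vector y of measurements (e.g. log-ratios) across n arrays is assumed to follow y ~ N(mu 1, c Sigma), where Sigma is the global covariance structure matrix shared by all genes and c is the gene-wise variance scale. The mean is estimated by generalised least squares, so array i receives weight (Sigma^{-1} 1)_i / (1' Sigma^{-1} 1), and the gene-wise variance is shrunk towards a prior value in the style of a limma-like moderated t statistic. The function names and the prior parameters d0 and s0_sq (assumed already estimated from all genes) are hypothetical; this is not the API of the WAME R package nor necessarily its exact estimator.

```python
# Hypothetical sketch of WAME-style weighting for a single gene.
# Assumed simplified model: y ~ N(mu * 1, c * Sigma), Sigma shared by all genes.
# An illustration only, not the WAME R package or its exact estimator.
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gls_weights(sigma: Matrix) -> List[float]:
    """Array weights Sigma^{-1} 1 / (1' Sigma^{-1} 1) of the GLS mean."""
    u = solve(sigma, [1.0] * len(sigma))
    total = sum(u)
    return [ui / total for ui in u]


def moderated_t(y: List[float], sigma: Matrix, d0: float, s0_sq: float) -> Tuple[float, float]:
    """Return (weighted mean, moderated t) for one gene.

    d0, s0_sq: hypothetical prior degrees of freedom and prior variance scale,
    assumed estimated beforehand from all genes (empirical Bayes step not shown).
    """
    n = len(y)
    w = gls_weights(sigma)
    mean = sum(wi * yi for wi, yi in zip(w, y))
    # Residual quadratic form r' Sigma^{-1} r has n - 1 degrees of freedom.
    r = [yi - mean for yi in y]
    q = solve(sigma, r)
    s_sq = sum(ri * qi for ri, qi in zip(r, q)) / (n - 1)
    # Shrink the gene-wise variance scale towards the prior value.
    s_mod_sq = (d0 * s0_sq + (n - 1) * s_sq) / (d0 + n - 1)
    # Var(mean) = c / (1' Sigma^{-1} 1); t would be compared to t with d0 + n - 1 d.f.
    precision = sum(solve(sigma, [1.0] * n))
    t = mean / (s_mod_sq / precision) ** 0.5
    return mean, t


if __name__ == "__main__":
    # Array 4 has four times the variance of arrays 1-3.
    sigma_diag = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 4.0]]
    print(gls_weights(sigma_diag))  # approx. [0.308, 0.308, 0.308, 0.077]
    # Arrays 1 and 2 share a source of variation (correlation 0.5).
    sigma_corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(gls_weights(sigma_corr))  # approx. [0.286, 0.286, 0.429]
    print(moderated_t([1.0, 1.2, 0.8, 3.0], sigma_diag, d0=4.0, s0_sq=0.1))
```

With a diagonal Sigma, an array with four times the variance of the others receives a quarter of their weight; with a positive correlation between two arrays, each of them is downweighted relative to an independent array of the same variance. No array is excluded outright.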
Current methods for microarray analysis generally disregard variations in array quality. Simulations based on real data show that this often results in severely invalid p-values, and trusting such p-values risks leading to false biological conclusions. WAME gives increased power and valid p-values when few genes are differentially expressed, and conservative p-values otherwise. Similar results are obtained in simulations from the model itself.
In Paper IV, WAME is used to identify genes that are differentially expressed between small and large human fat cells. Here WAME successfully downweights one array that was suspected, on biological grounds, of being of lower quality.
The WAME method is freely available as an add-on package for the R language.
Keywords: quality assurance (QA), quality control (QC), weighted moderated statistic, generalised linear model
Euler lecture hall, Chalmers Tvärgata 3, Department of Mathematical Sciences, Chalmers University of Technology
Opponent: Dr Anne-Mette Hein, Molecular Diagnostic Laboratory, Aarhus University Hospital, Denmark