Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods
Journal article, 2013

Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding which methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano, which collects a range of GSA methods into the same system for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at the gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA, using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and that the major separation between them correlates well with our defined directionality classes. Consequently, we suggest using a consensus scoring approach based on multiple GSA runs. In combination with the directionality classes, this constitutes a more thorough basis for an enriched biological interpretation.
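The idea of directionality classes and consensus scoring can be illustrated with a small sketch. Piano itself is an R package, and this Python code is not its implementation; it is a hypothetical illustration under simplifying assumptions: signed gene-level statistics (such as t-values), a plain mean or median as the gene-set statistic, a gene-sampling permutation test for P-values, a clipped positive/negative split as a stand-in for the mixed-directional class, and consensus defined as the median rank across methods. The names `class_statistics`, `gene_sampling_pvalues` and `consensus_rank` are invented for this example.

```python
import numpy as np


def class_statistics(s, summary=np.mean):
    """Simplified gene-set statistics for the three directionality classes.

    s: signed gene-level statistics (e.g. t-values) of the genes in one gene set.
    summary: function reducing the per-gene values to one number (an assumption).
    """
    return {
        # Non-directional: absolute values, so change in either direction counts.
        "non_directional": summary(np.abs(s)),
        # Distinct-directional: up- and down-regulated genes cancel each other out.
        "distinct_up": summary(s),
        "distinct_down": summary(-s),
        # Mixed-directional (simplified): positive and negative parts scored separately.
        "mixed_up": summary(np.clip(s, 0, None)),
        "mixed_down": summary(np.clip(-s, 0, None)),
    }


def gene_sampling_pvalues(t, members, summary=np.mean, n_perm=2000, seed=0):
    """One-sided P-value per class, using random gene sets of equal size as the null."""
    rng = np.random.default_rng(seed)
    observed = class_statistics(t[members], summary)
    null = [
        class_statistics(t[rng.choice(len(t), size=len(members), replace=False)], summary)
        for _ in range(n_perm)
    ]
    return {
        key: (1 + sum(d[key] >= value for d in null)) / (n_perm + 1)
        for key, value in observed.items()
    }


def consensus_rank(pvals_by_method):
    """Median rank of each gene set across methods (rank 1 = smallest P-value).

    pvals_by_method: dict mapping method name -> array of P-values, one per gene set.
    Ties are broken by position (stable ordinal ranks), which is a simplification.
    """
    ranks = [np.argsort(np.argsort(p, kind="stable")) + 1 for p in pvals_by_method.values()]
    return np.median(np.vstack(ranks), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.normal(size=500)      # simulated gene-level t-values
    t[:20] += 3.0                 # set A: coherently up-regulated
    t[20:30] += 3.0               # set B: half strongly up ...
    t[30:40] = -t[20:30]          # ... and half mirrored down
    sets = {"A": np.arange(0, 20), "B": np.arange(20, 40), "C": np.arange(100, 120)}
    for name in ("A", "B"):
        print(name, gene_sampling_pvalues(t, sets[name]))
    # Two "methods" (mean vs. median statistic), combined by consensus rank.
    methods = {
        f.__name__: np.array([gene_sampling_pvalues(t, m, f)["non_directional"]
                              for m in sets.values()])
        for f in (np.mean, np.median)
    }
    print(dict(zip(sets, consensus_rank(methods))))
```

Under these assumptions, set B is expected to score in the non-directional and both mixed-directional classes but not in the distinct-directional ones, because its up- and down-regulated halves cancel out; this is the kind of distinction the directionality classes are meant to expose.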

Authors

Leif Väremo

Chalmers, Chemical and Biological Engineering, Life Sciences

Jens B Nielsen

Chalmers, Chemical and Biological Engineering, Life Sciences

Intawat Nookaew

Chalmers, Chemical and Biological Engineering, Life Sciences

Nucleic Acids Research

0305-1048 (ISSN) 1362-4962 (eISSN)

Vol. 41, No. 8, pp. 4378-4391

Areas of Advance

Information and Communication Technology

Life Science Engineering (2010-2018)

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Bioinformatics and Systems Biology

DOI

10.1093/nar/gkt111

More information

Created

10/8/2017