Variability in Metagenomic Count Data and Its Influence on the Identification of Differentially Abundant Genes.

Viktor Jonsson; Tobias Österlund; Olle Nerman; Erik Kristiansson

doi:10.1089/cmb.2016.0180

Journal article, 2017

Metagenomics is the study of microorganisms in environmental and clinical samples using high-throughput sequencing of random fragments of their DNA. Because metagenomics does not require any prior culturing of isolates, entire microbial communities can be studied directly in their natural state. In metagenomics, the abundance of genes is quantified by sorting and counting the DNA fragments. The resulting count data are high-dimensional and affected by high levels of technical and biological noise, which make the statistical analysis challenging. In this article, we introduce a hierarchical overdispersed Poisson model to explore the variability in metagenomic data. By analyzing three comprehensive data sets, we show that the gene-specific variability varies substantially between genes and depends on biological function. We also assess the power to identify differentially abundant genes and show that incorrect assumptions about the gene-specific variability can lead to unacceptably high rates of false positives. Finally, we evaluate shrinkage approaches to improve the variance estimation and show that the choice of prior significantly affects the statistical power. The results presented in this study further elucidate the complex variance structure of metagenomic data and provide suggestions for accurate and reliable identification of differentially abundant genes.
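The abstract does not give the exact form of the hierarchical overdispersed Poisson model, so the following is only an illustrative sketch of what overdispersion means for count data, not the authors' model. It assumes a gamma-Poisson mixture in which a gene's count has mean mu and variance mu + phi * mu^2; the names mu, phi, sample_poisson and sample_overdispersed_counts are hypothetical and chosen for this example.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the moderate rates used here; not suited to very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_overdispersed_counts(rng, mu, phi, n):
    """Simulate n counts with mean mu and variance mu + phi * mu**2.

    ASSUMPTION: overdispersion is modelled as a gamma-Poisson mixture.
    phi = 0 reduces to a plain Poisson model.
    """
    if phi == 0:
        return [sample_poisson(rng, mu) for _ in range(n)]
    shape = 1.0 / phi   # gamma shape; smaller shape means more variability
    scale = mu * phi    # chosen so that the gamma mean equals mu
    return [sample_poisson(rng, rng.gammavariate(shape, scale)) for _ in range(n)]


rng = random.Random(0)
mu = 20.0
results = {}
for phi in (0.0, 0.1, 0.5):
    counts = sample_overdispersed_counts(rng, mu, phi, 5000)
    results[phi] = (mean(counts), variance(counts))
    print(f"phi={phi}: mean={results[phi][0]:.2f}, "
          f"variance={results[phi][1]:.2f}, "
          f"theoretical variance={mu + phi * mu ** 2:.2f}")
```

Under this assumed model, genes with the same mean abundance can have very different variances. A test that treats every gene as plain Poisson (phi = 0) would underestimate the variance of the overdispersed genes, which is one way the inflated false positive rates described above could arise.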

Author

Viktor Jonsson

University of Gothenburg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics


Tobias Österlund

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

University of Gothenburg


Olle Nerman

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

University of Gothenburg


Erik Kristiansson

University of Gothenburg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics


Journal of Computational Biology

1066-5277 (ISSN)

Vol. 24, Issue 4, pp. 311-326

Driving Forces

Sustainable development

Roots

Basic sciences

Subject Categories (SSIF 2011)

Microbiology

Bioinformatics and Systems Biology

Probability Theory and Statistics

Genetics

Areas of Advance

Life Science Engineering (2010-2018)

DOI

10.1089/cmb.2016.0180


PubMed

27892712

More information

Latest update

3/21/2023
