Computational and Statistical Considerations in the Analysis of Metagenomic Data
Book chapter, 2018

In shotgun metagenomics, microbial communities are studied through random DNA fragments sequenced directly from environmental and clinical samples. The resulting data is massive, potentially consisting of billions of sequence reads describing millions of microbial genes. Interpreting the data is therefore nontrivial and depends on dedicated computational and statistical methods. In this chapter we discuss the many challenges associated with the analysis of shotgun metagenomic data. First, we address computational issues related to the quantification of genes in metagenomes. We describe algorithms for efficient sequence comparisons, recommended practices for setting up data workflows, and modern high-performance computing resources that can be used to perform the analysis. Next, we outline the statistical aspects, including the removal of systematic errors and the identification of differences between microbial communities from different experimental conditions. We conclude by underlining the increasing importance of efficient and reliable computational and statistical solutions in the analysis of large metagenomic datasets.

Gene quantification

Normalization

Differentially abundant genes

Shotgun metagenomics

High-performance computing

High-dimensional data

Sequence mapping

Authors

Fredrik Boulund

Karolinska University Hospital

Mariana Buongermino Pereira

University of Gothenburg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

Viktor Jonsson

University of Gothenburg

Chalmers, Mathematical Sciences

Erik Kristiansson

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

University of Gothenburg

Metagenomics: Perspectives, Methods, and Applications

81-102

Subject Categories

Other Computer and Information Science

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

DOI

10.1016/B978-0-08-102268-9.00004-5

More information

Latest update

1/8/2021 2