HirBin: High-resolution identification of differentially abundant functions in metagenomes
Journal article, 2017

Background: Gene-centric analysis of metagenomics data provides information about the biochemical functions present in a microbiome under a certain condition. The ability to identify significant differences in functions between metagenomes depends on accurate classification and quantification of the sequence reads (binning). However, biological effects acting on specific functions may be overlooked if the classes are too general.

Methods: Here we introduce High-Resolution Binning (HirBin), a new method for gene-centric analysis of metagenomes. HirBin combines supervised annotation with unsupervised clustering to bin sequence reads at a higher resolution. The supervised annotation is performed by matching sequence fragments to genes using well-established protein domains, such as TIGRFAM, PFAM or COGs. It is followed by unsupervised clustering, in which each functional domain is further divided into sub-bins based on sequence similarity. Finally, differential abundance of the sub-bins is statistically assessed.

Results: We show that HirBin is able to identify biological effects that are only present at more specific functional levels. Furthermore, we show that changes affecting more specific functional levels are often diluted at the more general level and are therefore overlooked when analyzed using standard binning approaches.

Conclusions: HirBin improves the resolution of the gene-centric analysis of metagenomes and facilitates the biological interpretation of the results. HirBin is implemented as a Python package and is freely available for download at http://bioinformatics.math.chalmers.se/hirbin.
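The idea of sub-binning, and the dilution of an effect at the domain level, can be illustrated with a toy sketch. This is not HirBin's implementation or API. The greedy clustering, the ungapped identity measure, the 0.7 threshold, the domain label DOMAIN_X, the sequences, and all function names are assumptions chosen only for illustration, and the statistical assessment step is left out.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions, ungapped,
    relative to the longer sequence (an assumption, not an alignment)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def sub_bin(reads, threshold=0.7):
    """Greedily split each annotated domain into sub-bins.

    reads: iterable of (sample, domain, sequence) tuples, where the
    domain label stands in for the supervised annotation step.
    Returns counts[(domain, sub_bin_index)][sample].
    """
    representatives = defaultdict(list)  # domain -> representative sequences
    counts = defaultdict(lambda: defaultdict(int))
    for sample, domain, seq in reads:
        reps = representatives[domain]
        for idx, rep in enumerate(reps):
            if identity(seq, rep) >= threshold:
                break  # similar enough: join this existing sub-bin
        else:
            reps.append(seq)  # no match: this read founds a new sub-bin
            idx = len(reps) - 1
        counts[(domain, idx)][sample] += 1
    return counts


def domain_totals(counts):
    """Collapse sub-bin counts back to the general (domain) level."""
    totals = defaultdict(lambda: defaultdict(int))
    for (domain, _), per_sample in counts.items():
        for sample, n in per_sample.items():
            totals[domain][sample] += n
    return totals


# Hypothetical reads: one domain, two sequence variants, two conditions.
reads = (
    [("control", "DOMAIN_X", "MKTAYIAKQR")] * 10
    + [("control", "DOMAIN_X", "GGSLLPWVNE")] * 2
    + [("treatment", "DOMAIN_X", "MKTAYIAKQK")] * 10
    + [("treatment", "DOMAIN_X", "GGSLLPWVNE")] * 8
)

counts = sub_bin(reads)
totals = domain_totals(counts)

# Domain level: 12 vs 18 reads (1.5-fold).
print(dict(totals["DOMAIN_X"]))
# Second sub-bin: 2 vs 8 reads (4-fold); the first sub-bin is unchanged.
print(dict(counts[("DOMAIN_X", 1)]))
```

In this toy data the fourfold change in one variant shows up as only a 1.5-fold change when all reads of the domain are pooled, which is the kind of dilution the abstract describes.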

Binning

Differential abundance

Metagenomics

TIGRFAM

Next-generation sequencing

Statistical analysis

Functional annotation

Author

Tobias Österlund

University of Gothenburg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

Viktor Jonsson

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

University of Gothenburg

Erik Kristiansson

University of Gothenburg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

BMC Genomics

1471-2164 (ISSN)

Vol. 18, Issue 1, Article 316

Subject Categories

Bioinformatics (Computational Biology)

Bioinformatics and Systems Biology

Genetics

DOI

10.1186/s12864-017-3686-6

More information

Created

10/8/2017