Metagenomic Data Utilization and Analysis (MEDUSA) and Construction of a Global Gut Microbial Gene Catalogue
Journal article, 2014
Metagenomic sequencing has contributed important new knowledge about the microbes that live in a symbiotic relationship with humans. Modern sequencing technology can generate large numbers of sequencing reads from a metagenome, but analysis of the data remains challenging. Here we present MEDUSA, a bioinformatics pipeline that facilitates analysis of metagenomic reads at the gene and taxonomic levels. We also constructed a global human gut microbial gene catalogue by combining data from four studies spanning three continents. Using MEDUSA, we mapped 782 gut metagenomes to the global gene catalogue and to a catalogue of sequenced microbial species. We find that all studies share about half a million genes and that, on average, 300 000 genes are shared by half the studied subjects. Gene richness is higher in the European studies than in the Chinese and American studies, and this is also reflected in species richness. Although it is possible to identify common species and a core set of genes, the abundance of species and genes varies widely.
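
The gene-level summaries mentioned in the abstract (genes shared by all studies, genes present in a given fraction of subjects, and per-subject gene richness) can be illustrated with a minimal Python sketch. This is not MEDUSA's implementation: the function names and the toy input are hypothetical, each subject or study is assumed to be reduced to the set of catalogue genes detected in it, and "shared by half the studied subjects" is read here as "detected in at least 50% of subjects", which is an assumption about the abstract's meaning.

```python
from typing import Dict, Set


def core_genes(study_genes: Dict[str, Set[str]]) -> Set[str]:
    """Genes detected in every study (intersection of per-study gene sets)."""
    gene_sets = list(study_genes.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def prevalent_genes(subject_genes: Dict[str, Set[str]],
                    min_fraction: float = 0.5) -> Set[str]:
    """Genes detected in at least `min_fraction` of the subjects.

    Assumption: prevalence is based on presence/absence only,
    not on read counts or abundance.
    """
    counts: Dict[str, int] = {}
    for genes in subject_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    threshold = min_fraction * len(subject_genes)
    return {gene for gene, n in counts.items() if n >= threshold}


def gene_richness(subject_genes: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct catalogue genes detected per subject."""
    return {subject: len(genes) for subject, genes in subject_genes.items()}


# Hypothetical toy input: gene identifiers detected per subject.
toy_subjects = {
    "subject_1": {"gene_a", "gene_b"},
    "subject_2": {"gene_a", "gene_c"},
    "subject_3": {"gene_a", "gene_b"},
    "subject_4": {"gene_d"},
}
half_shared = prevalent_genes(toy_subjects, min_fraction=0.5)
richness = gene_richness(toy_subjects)
```

Working on presence/absence sets keeps the sketch simple. An analysis of abundance variation, which the abstract also reports, would need the mapped read counts per gene rather than sets.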