Machine Learning Applied to Predicting Microorganism Growth Temperatures and Enzyme Catalytic Optima
Journal article, 2019

Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea, and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for reuse. In a subsequent step, we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for prediction of enzyme catalytic temperature optima (T-opt). The resulting model generates enzyme T-opt estimates that are far superior to those obtained using OGT alone. Finally, we predict T-opt for 6.5 million enzymes, covering 4447 enzyme classes, and make the resulting data set available to researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.
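
To illustrate what a proteome-wide 2-mer amino acid composition might look like as a model input, the following is a minimal sketch, not the authors' implementation. It assumes the features are the relative frequencies of the 400 ordered pairs of the 20 standard amino acids, pooled over all protein sequences of an organism. The function name `dipeptide_composition`, the skipping of pairs containing non-standard residues, and the toy sequences are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; their ordered pairs give 400 possible 2-mers.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequences):
    """Return the relative frequency of each of the 400 dipeptides,
    pooled over all given protein sequences (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width 2 along the sequence.
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1

    # Assumption: pairs containing non-standard letters (e.g. X) are ignored.
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy "proteome" of two short, made-up sequences.
toy_proteome = ["MKVLA", "MKKA"]
features = dipeptide_composition(toy_proteome)
```

A vector of this kind, computed once per organism, could then serve as the input to any standard regression model with OGT as the target. Which regressor and which preprocessing steps were actually used is not stated in this record.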

thermostable enzymes

optimal growth temperature

sequence-based prediction

enzyme temperature optima

machine learning

Authors

Gang Li

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Kersten S. Rabe

Karlsruhe Institute of Technology (KIT)

Jens B Nielsen

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

Martin Engqvist

Chalmers, Biology and Biological Engineering, Systems and Synthetic Biology

ACS Synthetic Biology

2161-5063 (eISSN)

Vol. 8, No. 6, pp. 1411-1420

Subject Categories

Biochemistry and Molecular Biology

Bioinformatics (Computational Biology)

Biocatalysis and Enzyme Technology

DOI

10.1021/acssynbio.9b00099

PubMed

31117361

More information

Latest update

9/2/2019 8