# Application of machine learning in systems biology
Doctoral thesis, 2020

In such applications, regression models are frequently used, and their performance depends on many factors, including feature engineering and the quality of the response values. Manually engineering enough relevant features is particularly challenging in biology because knowledge is lacking in certain areas. With the increasing volume of available data, deep transfer learning makes it possible to learn a statistical summary of the samples in a large dataset, which can then be used as input to train other ML models. In the present thesis, I applied this approach to first learn a deep representation of enzyme thermal adaptation and then used it to develop regression models for predicting enzyme optimal temperatures and protein melting temperatures. In both cases, the transfer learning-based regression models outperformed classical models trained on rationally engineered features. On the other hand, noisy response values are very common in biological datasets because of variation in experimental measurements, and they fundamentally restrict the performance attainable with regression models. I therefore addressed this challenge by deriving a theoretical upper bound for the coefficient of determination (*R*²) of regression models. This upper bound depends on the noise associated with the response variable and on the variance of the response values in a given dataset. It can thus be used to test whether the maximal performance has been reached on a particular dataset, or whether further model improvement is possible.
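The idea of a noise-limited ceiling on the coefficient of determination can be illustrated with a small simulation. The sketch below is not taken from the thesis; it assumes additive, zero-mean measurement noise that is independent of the true values, in which case the expected ceiling is one minus the ratio of noise variance to observed response variance. The function names (`r2_upper_bound`, `r2_score`), the simulated temperatures, and the noise level are all hypothetical choices made for illustration.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def r2_upper_bound(noise_var, observed_var):
    """Expected ceiling on R^2, assuming additive, zero-mean, independent noise."""
    return 1.0 - noise_var / observed_var


def r2_score(y_obs, y_pred):
    """Coefficient of determination of predictions against observed values."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
n = 50_000
noise_sd = 5.0  # assumed measurement noise (standard deviation)

# Hypothetical noise-free response values, e.g. temperatures in degrees Celsius.
y_true = [rng.gauss(50.0, 10.0) for _ in range(n)]
# Observed values are the noise-free values plus measurement noise.
y_obs = [y + rng.gauss(0.0, noise_sd) for y in y_true]

bound = r2_upper_bound(noise_sd ** 2, variance(y_obs))
# An "oracle" model that predicts the noise-free values exactly
# still cannot explain the noise in the observations.
oracle_r2 = r2_score(y_obs, y_true)

print(f"upper bound: {bound:.3f}, oracle R^2: {oracle_r2:.3f}")
```

Under these assumptions, even a model that recovers the noise-free values exactly scores close to the bound rather than 1, which is why a model already near the bound on a given dataset leaves little room for further improvement.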

deep transfer learning

genome-scale modelling

systems biology

uncertainty

Machine learning

regression

## Author

### Gang Li

Chalmers, Biology and Biological Engineering, Systems Biology

### Machine Learning Applied to Predicting Microorganism Growth Temperatures and Enzyme Catalytic Optima

ACS Synthetic Biology, Vol. 8 (2019), p. 1411-1420

**Journal article**

### The pan-genome of Saccharomyces cerevisiae

FEMS Yeast Research, Vol. 19 (2019)

**Journal article**

### Li G, Hu Y, Wang H, Zelezniak A, Ji B, Zrimec J and Nielsen J. Bayesian genome scale modeling identifies thermal determinants of yeast metabolism

### Li G, Zrimec J, Ji B, Geng J, Larsbrink J, Zelezniak A, Nielsen J and Engqvist MKM. Performance of regression models as a function of experiment noise

### Li G, Zrimec J, Viknander S, Zelezniak A, Nielsen J and Engqvist MKM. Learning deep representations of enzyme thermal adaptation

In this thesis, I first explore and discuss the different application scenarios of machine learning in systems biology. Among other applications, I show that i) machine learning can be used to model systems in which the relationships between the components and the system are too complex to be captured by theory-based models; ii) machine learning can be used to improve existing theory-based models. Secondly, machine learning approaches rely heavily on the quality and volume of data, both of which are limited in most biological datasets. Regarding the quality of data, I evaluate the effect of noise on the development of regression models using both theoretical analysis and simulations. Regarding the volume of data, I showcase how deep transfer learning can be applied to datasets with only a small number of training samples.

The results presented in this thesis show that machine learning is a powerful tool and that its application in systems biology is still developing, with many challenges yet to be solved. In the future, it will become one of the standard tools in the toolbox of every systems biologist.

### Predictive and Accelerated Metabolic Engineering Network (PAcMEN)

European Commission (EU), 2016-09-01 -- 2020-08-30.

### Subject categories

Biological Sciences

### Foundations

Basic sciences

### Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

### ISBN

978-91-7905-290-4

Doctoral theses at Chalmers University of Technology. New series: 4757

### Publisher

Chalmers University of Technology

Opponent: Vassily Hatzimanikatis, EPFL, Switzerland