Intelligent data acquisition for drug design through combinatorial library design
Licentiate thesis, 2023

A recurring problem for machine learning methods in drug discovery is the
need for standardized data. Methods for producing new data exist, and there is
interest in using them, but material and budget constraints make it desirable
that each iteration of data production is as efficient as possible. In this
thesis, we present two papers detailing methods for different problems of
selecting which data to produce. We investigate Active Learning for models
that use the margin in model decisiveness as a measure of model uncertainty
to guide data acquisition. We demonstrate that the models perform better with
Active Learning than with random acquisition of data, independent of machine
learning model and starting knowledge. We also study the multi-objective
optimization problem of combinatorial library design. Here we present a
framework that can process the output of generative models for molecular
design and return an optimized library design. The results show that the
framework successfully optimizes a library based on molecule availability,
which the framework also attempts to identify using retrosynthesis
prediction. We conclude that the next step in intelligent data acquisition is
to combine the two methods and create a library design model that uses the
information from previous libraries to guide subsequent designs.
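
As an illustration of margin-based acquisition, the following is a minimal sketch and not the thesis implementation. It assumes a classifier that outputs class probabilities for each unlabeled candidate, and it takes the margin to be the gap between the two highest probabilities. The names `margin`, `select_batch`, and `pool_probs`, as well as the probability values, are hypothetical.

```python
def margin(probs):
    """Gap between the two highest class probabilities.

    A small gap means the model is undecided between its top two classes.
    """
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_batch(pool_probs, batch_size):
    """Return indices of the `batch_size` candidates with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled candidates
# (assumed two-class setting, e.g. low vs. high yield).
pool_probs = [
    [0.95, 0.05],  # confident prediction, large margin
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],  # least decisive prediction, smallest margin
]

chosen = select_batch(pool_probs, batch_size=2)
print(chosen)
```

In an Active Learning loop, the selected candidates would be labeled experimentally, added to the training set, and the model retrained before the next selection round. Random acquisition would instead draw the batch uniformly from the pool.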

determinantal point processes

generative models

machine learning

drug discovery

active learning

cheminformatics

EE, EDIT-building, Rännvägen 6
Opponent: Prof. Andrea Volkamer, Saarland University, Germany

Author

Simon Johansson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction

Molecular Informatics, Vol. In Press (2022)

Journal article

Johansson, S.V., Chehreghani, M.H., Engkvist, O., Schliep, A., de novo generated combinatorial library design

Areas of Advance

Information and Communication Technology

Health Engineering

Subject Categories

Design

Other Chemistry Topics

Computer Science

Publisher

Chalmers

Online

More information

Latest update

5/25/2023