Intelligent data acquisition for drug design through combinatorial library design
Licentiate thesis, 2023

A recurring problem for machine learning methods in drug discovery is the
need for standardized data. Methods for producing new data exist, as does
interest in doing so, but because of material and budget constraints each
iteration of data production should be as efficient as possible. In this
thesis, we present two papers that address different problems in selecting
which data to produce. We investigate Active Learning for models that use the
margin in model decisiveness as a measure of model uncertainty to guide data
acquisition. We demonstrate that the models perform better with Active
Learning than with random acquisition of data, independent of the machine
learning model and starting knowledge. We also study the multi-objective
optimization problem of combinatorial library design. Here we present a
framework that can process the output of generative models for molecular
design and return an optimized library design. The results show that the
framework successfully optimizes a library based on molecule availability,
which the framework also attempts to determine using retrosynthesis
prediction. We conclude that the next step in intelligent data acquisition is
to combine the two methods and create a library design model that uses the
information from previous libraries to guide subsequent designs.
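
As an illustration of margin-based acquisition, the sketch below ranks unlabeled candidates by the gap between their two most probable predicted classes and picks the least decisive ones for labeling. It is a generic margin-sampling example, not the formulation used in the thesis; the function names, the batch size, and the probability values are all hypothetical.

```python
def margin(probs):
    """Gap between the two highest class probabilities (small = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_batch(pool_probs, batch_size):
    """Indices of the batch_size candidates with the smallest margin.

    pool_probs is a list of per-candidate class-probability lists, e.g. the
    output of some classifier on the unlabeled pool (assumed, not specified
    by the source).
    """
    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled candidates.
pool_probs = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.51, 0.49],
]

# The two candidates the model is least decisive about are acquired next.
selected = select_batch(pool_probs, batch_size=2)
```

In an Active Learning loop, the selected candidates would be labeled, added to the training set, and the model retrained before the next round of selection.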

determinantal point processes

generative models

machine learning

drug discovery

active learning

cheminformatics

EE, EDIT-building, Rännvägen 6
Opponent: Prof. Andrea Volkamer, Saarland University, Germany

Author

Simon Johansson

Chalmers, Computer Science and Engineering, Data Science and AI

Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction

Molecular Informatics, Vol. In Press (2022)

Journal article

Johansson, S.V., Chehreghani, M.H., Engkvist, O., Schliep, A. De novo generated combinatorial library design

Areas of Advance

Information and Communication Technology

Health Engineering

Subject Categories

Design

Other Chemistry Topics

Computer Science

Publisher

Chalmers

Online

More information

Last updated

2023-05-25