Intelligent data acquisition for drug design through combinatorial library design
Licentiate thesis, 2023
need for standardized data. Methods and interest exist for producing new data
but due to material and budget constraints it is desirable that each iteration of
producing data is as efficient as possible. In this thesis, we present two papers
detailing methods for different problems of selecting which data to produce. We
investigate Active Learning for models that use the margin in model decisiveness
as a measure of model uncertainty to guide data acquisition. We demonstrate that
the models perform better with Active Learning than with random acquisition
of data independent of machine learning model and starting knowledge. We
also study the multi-objective optimization problem of combinatorial library
design. Here we present a framework that could process the output of generative
models for molecular design and give an optimized library design. The
results show that the framework successfully optimizes a library based on
molecule availability, which the framework also attempts to determine using
retrosynthesis prediction. We conclude that the next step in intelligent data
acquisition is to combine the two methods and create a library design model
that uses the information from previous libraries to guide subsequent designs.
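
The margin-based acquisition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier that outputs class probabilities for each unlabeled candidate and takes the margin to be the gap between the two most probable classes, where a small margin marks an undecided model. The function names and the probability values are hypothetical and are not taken from the thesis.

```python
import numpy as np

def margin_scores(proba):
    """Gap between the two most probable classes per row; small means uncertain."""
    ordered = np.sort(proba, axis=1)          # ascending within each row
    return ordered[:, -1] - ordered[:, -2]    # top probability minus runner-up

def select_smallest_margin(proba, batch_size):
    """Indices of the batch_size candidates with the smallest margin."""
    margins = margin_scores(proba)
    return np.argsort(margins, kind="stable")[:batch_size]

# Hypothetical predicted class probabilities for four unlabeled candidates.
pool_proba = np.array([
    [0.90, 0.10],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
])

# Candidates 3 and 1 have the smallest margins, so they would be acquired first.
picked = select_smallest_margin(pool_proba, batch_size=2)
```

In an Active Learning loop, the selected candidates would be labeled, added to the training set, and the model retrained before the next selection round. Random acquisition, the baseline named in the abstract, would instead draw the indices uniformly from the pool.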
determinantal point processes
generative models
machine learning
drug discovery
active learning
Cheminformatics
Author
Simon Johansson
Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI
Using Active Learning to Develop Machine Learning Models for Reaction Yield Prediction
Molecular Informatics, Vol. In Press (2022)
Journal article
Johansson, S.V., Chehreghani, M.H., Engkvist, O., Schliep, A. De novo generated combinatorial library design
Areas of Advance
Information and Communication Technology
Health Engineering
Subject Categories
Design
Other Chemistry Topics
Computer Science
Publisher
Chalmers