Knowledge Models and Inference Frameworks for Scientific Discovery
Doctoral thesis, 2026

Scientific discovery is an active process of designing, testing, and improving theories about the natural world. Automating this process is a grand challenge for 21st-century science. This thesis examines scientific inquiry as it relates to machine learning, offering contributions to knowledge representations and reasoning frameworks, demonstrated in systems biology.

Systems biology is an integrationist approach to biological science: organisms are treated as complex systems whose behaviour is dictated by the interactions of their constituent parts. Eukaryotic organisms are extremely complex, and research progress in systems biology can be slow. Recent advances in robotics and artificial intelligence (AI) offer great opportunities for automating scientific discovery in this field. Using the model organism Saccharomyces cerevisiae (baker’s yeast), this thesis explores: the philosophical motivations for automation in biological research; knowledge models and hypotheses in systems biology; and computational models of metabolism.

The first main contribution is a first-order logic framework for modelling cellular physiology, which enables abduction of hypotheses for improvement of knowledge models, using the automated theorem prover (ATP) iProver. The second contribution is an ontology for describing theory changes and hypotheses in a semantic and storage-efficient manner. The third main contribution is an application of graph neural networks (GNNs) to learn knowledge graph embeddings grounded in empirical data and ontology structures. The final contribution is an end-to-end demonstration of autonomous hypothesis generation and experimentation, with hypotheses modelled using ontology terms to support large language model (LLM) agents and human scientists.

These contributions demonstrate the power of knowledge graphs for autonomous scientific discovery. This thesis also argues that scientific discovery is better modelled as supervised learning (specifically, active learning for AI scientists) than as reinforcement learning: mapping concepts from machine learning algorithms onto the domain of scientific discovery produces systems that align with established scientific values, leading to improved theories.

automated theorem provers

systems biology

knowledge modelling

machine learning

scientific discovery

Artificial intelligence

abduction

ontologies

EDIT Lecture Hall EF
Opponent: Professor Jan Komorowski, Uppsala University, Sweden

Author

Alexander Gower

Chalmers, Computer Science and Engineering, Data Science and AI

Scientific discovery follows a cycle: starting from what you know, form a hypothesis, test it with an experiment, and use the result to improve your knowledge. This is also a description of active learning, a branch of machine learning in which algorithms select what to test next based on what they have already learned. Recognising this connection is powerful. It means that the design of AI systems for science can draw directly on machine learning research.
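
The cycle described above can be illustrated with a toy active learning loop. This is only a minimal sketch, not the system developed in the thesis: it assumes a finite set of candidate hypotheses, a finite set of possible experiments, and an error-free experiment. The names `active_learning_loop`, `predict`, `run_experiment`, and `TRUE_THRESHOLD` are hypothetical and chosen for illustration. At each step the learner picks the experiment on which the surviving hypotheses disagree most, runs it, and discards every hypothesis that the result contradicts.

```python
from typing import Callable, List, Sequence, Tuple


def active_learning_loop(
    hypotheses: Sequence[int],
    experiments: Sequence[int],
    predict: Callable[[int, int], bool],
    run_experiment: Callable[[int], bool],
) -> Tuple[List[int], List[Tuple[int, bool]]]:
    """Toy hypothesis-elimination loop (illustrative assumption, not the thesis method)."""
    surviving = list(hypotheses)              # current knowledge: hypotheses not yet refuted
    history: List[Tuple[int, bool]] = []      # (experiment, observed outcome) pairs

    def disagreement(x: int) -> int:
        # Size of the smaller camp when surviving hypotheses predict the outcome of x.
        positives = sum(predict(h, x) for h in surviving)
        return min(positives, len(surviving) - positives)

    while len(surviving) > 1:
        # Select the experiment expected to be most informative.
        query = max(experiments, key=disagreement)
        if disagreement(query) == 0:
            break  # no available experiment can separate the remaining hypotheses
        outcome = run_experiment(query)       # test the hypotheses in the "lab"
        # Update knowledge: keep only hypotheses consistent with the observation.
        surviving = [h for h in surviving if predict(h, query) == outcome]
        history.append((query, outcome))
    return surviving, history


# Hypothetical toy problem: find the unknown dose threshold at which a response appears.
TRUE_THRESHOLD = 7
surviving, history = active_learning_loop(
    hypotheses=range(11),                     # hypothesis h: "the threshold is h"
    experiments=range(11),                    # experiment x: "apply dose x"
    predict=lambda h, x: x >= h,              # what hypothesis h predicts for dose x
    run_experiment=lambda x: x >= TRUE_THRESHOLD,  # stand-in for a real experiment
)
print(surviving, history)
```

In this toy setting the loop isolates the true threshold after four experiments instead of testing all eleven doses, which is the practical appeal of letting the learner choose what to test next.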

For a machine to do science, its knowledge must be stored in a form it can reason about—precise enough for a computer, and meaningful enough for human scientists. It must also be given tools to use its knowledge to reason about natural phenomena.

Machines and human scientists have complementary strengths. Machines do not tire, can run many experiments in parallel, are highly consistent, and have powerful reasoning tools at their disposal, not least due to advances in artificial intelligence (AI). Humans bring creativity, intuition, ethical values, and a bodily experience of the world. Together with machines, we can do better science than either could alone, and more of it.

This thesis explores these ideas in the study of baker's yeast—Saccharomyces cerevisiae. New methods are developed for storing biological knowledge in structured, machine-readable forms, and a robot scientist is demonstrated that autonomously designs and runs real laboratory experiments.

Subject categories (SSIF 2025)

Bioinformatics (computational biology)

Computer science

Foundations

Basic sciences

Infrastructure

Chalmers e-Commons (incl. C3SE, 2020-)

DOI

10.63959/chalmers.dt/5896

ISBN

978-91-8103-439-4

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5896

Publisher

Chalmers

Online

More information

Last updated

2026-05-13