Ontology-based box embeddings and knowledge graphs for predicting phenotypic traits in Saccharomyces cerevisiae
Paper in proceedings, 2025

We present a method that uses graph neural networks (GNNs) to predict and interpret the effect of gene deletions in the yeast Saccharomyces cerevisiae from a knowledge graph (KG) with ontology-based box embeddings. We construct the KG from community databases using terms defined in several ontologies. From the class hierarchies in the ontologies, box embeddings are learnt as low-dimensional representations of the nodes in the graph; these are used together with GNNs to predict cell growth for double gene knockouts from the KG. This shows that high-level qualitative information can be used to predict experimental data. Prediction performance improved when box embeddings of ontologies were used to represent the nodes in the graph, compared to learning features specific to this task. This suggests that class hierarchies in ontologies contain useful information about the domains, which can be extracted when training the box embeddings. We also demonstrate that our model can generalise beyond the task it was trained for by evaluating it on other types of genetic modifications. Additionally, we apply model interpretability techniques to identify co-occurring edges critical for predictions. Our findings are further validated by a biological experiment that reveals an association between inositol utilisation and osmotic stress resistance, emphasising the model’s potential to guide scientific discovery.
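As a rough illustration of the idea behind box embeddings of a class hierarchy, the sketch below represents each class as an axis-aligned box and scores "is a subclass of" by how much of the child box lies inside the parent box. This is not the authors' implementation: the `Box` type, the `containment_score` function, the class names, and the hand-picked 2-D coordinates are all hypothetical, and in practice the box corners would be learnt rather than set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its lower and upper corners (hypothetical type)."""
    lower: tuple
    upper: tuple


def volume(box):
    """Product of side lengths; empty boxes have volume 0."""
    v = 1.0
    for lo, hi in zip(box.lower, box.upper):
        v *= max(hi - lo, 0.0)
    return v


def intersection(a, b):
    """Overlap of two boxes (may be empty, i.e. lower > upper in some dimension)."""
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    return Box(lower, upper)


def containment_score(child, parent):
    """Fraction of the child's volume that lies inside the parent box."""
    child_volume = volume(child)
    if child_volume == 0.0:
        return 0.0
    return volume(intersection(child, parent)) / child_volume


# Toy hierarchy with hand-picked coordinates (assumed for illustration only).
biological_process = Box((0.0, 0.0), (4.0, 4.0))
transport = Box((1.0, 1.0), (3.0, 3.0))          # nested inside the parent
cellular_component = Box((5.0, 5.0), (6.0, 6.0))  # disjoint from the parent

print(containment_score(transport, biological_process))           # subclass direction
print(containment_score(biological_process, transport))           # reverse direction
print(containment_score(cellular_component, biological_process))  # unrelated classes
```

The score is asymmetric, which is what lets nested boxes encode the direction of a subclass relation; a training procedure would adjust the corners so that known subclass pairs score close to 1.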

GNN

Bioinformatics

Knowledge graph

Box embeddings

Discovery science

Authors

Filip Kronström

Göteborgs universitet

Chalmers, Data- och informationsteknik, Data Science och AI

Daniel Brunnsåker

Göteborgs universitet

Chalmers, Data- och informationsteknik, Data Science och AI

Ievgeniia Tiukova

Chalmers, Life sciences, Infrastrukturer

Kungliga Tekniska Högskolan (KTH)

Ross King

Göteborgs universitet

Chalmers, Data- och informationsteknik, Data Science och AI

University of Cambridge

Proceedings of Machine Learning Research

2640-3498 (eISSN)

Vol. 284

19th Conference on Neurosymbolic Learning and Reasoning, NeSy 2025
Santa Cruz, USA

Subject categories (SSIF 2025)

Bioinformatics (Computational Biology)

Computer Science

More information

Last updated

2025-11-06