Graph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Prediction
Preprint, 2026
Our yeast KG is constructed from community databases and ontology terms. We combine low-dimensional box embeddings with GNNs to predict cell growth for double gene knockouts. Over 10-fold cross-validation, these predictions achieve a mean R² of 0.360, significantly higher than the baselines, demonstrating that high-level qualitative knowledge is informative about experimental outcomes. Incorporating semantic loss terms when training the models improves predictive performance (R² = 0.377) by aligning the embeddings with the ontology structure. This shows that class hierarchies from ontologies can be exploited for quantitative prediction. We also test the trained models on triple gene knockouts, showing that they generalise to data beyond those seen in training.
Additionally, by identifying co-occurring relations in the yeast KG that are important for the cell-growth predictions, we construct hypotheses about interacting traits in yeast. A biological experiment validates one such hypothesis, revealing an association between inositol utilisation and osmotic stress resistance and highlighting the model's potential to guide biological discovery.
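As an illustration of how a semantic loss term can align box embeddings with an ontology's class hierarchy, the sketch below penalises a subclass box for extending outside its superclass box and adds that penalty to a prediction loss. This is a generic hinge-style containment penalty, not necessarily the formulation used in the paper; the names `Box`, `containment_violation`, `semantic_loss`, `combined_loss`, the class labels, and the weight of 0.1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box described by its lower and upper corners."""
    lower: Sequence[float]
    upper: Sequence[float]


def containment_violation(child: Box, parent: Box) -> float:
    """Total distance by which `child` sticks out of `parent`.

    The result is 0.0 exactly when the child box lies inside the parent box.
    """
    total = 0.0
    for c_lo, c_hi, p_lo, p_hi in zip(child.lower, child.upper,
                                      parent.lower, parent.upper):
        total += max(0.0, p_lo - c_lo)  # child extends below the parent
        total += max(0.0, c_hi - p_hi)  # child extends above the parent
    return total


def semantic_loss(boxes: Dict[str, Box],
                  subclass_pairs: List[Tuple[str, str]]) -> float:
    """Mean containment violation over (subclass, superclass) pairs."""
    if not subclass_pairs:
        return 0.0
    violations = [containment_violation(boxes[c], boxes[p])
                  for c, p in subclass_pairs]
    return sum(violations) / len(violations)


def combined_loss(prediction_loss: float,
                  boxes: Dict[str, Box],
                  subclass_pairs: List[Tuple[str, str]],
                  weight: float = 0.1) -> float:
    """Prediction loss plus a weighted semantic term (weight is an assumption)."""
    return prediction_loss + weight * semantic_loss(boxes, subclass_pairs)


# Hypothetical two-dimensional example; class names are illustrative only.
boxes = {
    "stress_response": Box(lower=[0.0, 0.0], upper=[4.0, 4.0]),
    "osmotic_stress_response": Box(lower=[1.0, 1.0], upper=[2.0, 2.0]),
    "misplaced_class": Box(lower=[-1.0, 1.0], upper=[2.0, 5.0]),
}
pairs = [
    ("osmotic_stress_response", "stress_response"),
    ("misplaced_class", "stress_response"),
]

inside = containment_violation(boxes["osmotic_stress_response"],
                               boxes["stress_response"])
outside = containment_violation(boxes["misplaced_class"],
                                boxes["stress_response"])
mean_violation = semantic_loss(boxes, pairs)
total = combined_loss(0.5, boxes, pairs)
```

In this toy setting the correctly nested box contributes nothing to the penalty, while the misplaced box is pushed back toward its superclass in both dimensions where it protrudes. In a trained model the box corners would be learnable parameters and the penalty would be minimised by gradient descent alongside the regression objective.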
Authors
Filip Kronström
Chalmers, Computer Science and Engineering, Data Science and AI
Alexander Gower
Chalmers, Computer Science and Engineering, Data Science and AI
Daniel Brunnsåker
Chalmers, Computer Science and Engineering, Data Science and AI
Ievgeniia Tiukova
Chalmers, Life Sciences, Infrastructures
Ross King
Chalmers, Computer Science and Engineering, Data Science and AI
Subject categories (SSIF 2025)
Bioinformatics (Computational Biology)
Bioinformatics and Computational Biology
Computer Sciences
DOI
10.48550/arXiv.2605.03690