LLM-retrieval based scientific knowledge grounding
Paper in proceedings, 2025

The automated high-throughput laboratory offers unprecedented potential for scientific discovery, yet effectively linking studies to existing knowledge remains a significant challenge. As the general body of scientific knowledge grows, so too does the burden of contextualizing a new experiment. While ontologies and databases serve as structured common repositories, their rigid schemas are often incompatible with the unstructured or semi-structured formats of most laboratories. In this study we investigate the integration of large language models (LLMs) with ontology-based vector databases to anchor semi-structured scientific experiments into knowledge bases via automated retrieval. Our approach extracts scientific entities from unstructured experimental texts, and grounds them to relevant ontology terms. We automate knowledge grounding, which enhances the integration of unstructured experimental data into established formal scientific languages. We have tested our method on a diverse selection of experimental yeast biology papers focused on Saccharomyces cerevisiae, a foundational model system that has driven major discoveries in molecular and cellular biology, and observed strong pipeline performance. We argue that such a knowledge grounding approach is a critical component for the new wave of efficient artificial intelligence (AI) driven automated laboratories that integrate LLMs with high-throughput experimentation and data-driven discovery.
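The grounding step described in the abstract can be illustrated with a minimal sketch. It assumes the entity mentions have already been extracted from the text (the LLM extraction step is not shown) and swaps in a toy character-trigram embedding with cosine similarity for the paper's actual embedding model and vector database, which the abstract does not specify. The ontology IDs and labels (`EX:0001` etc.) and the function names `embed`, `cosine`, and `ground` are hypothetical placeholders, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical mini-ontology: IDs and labels are illustrative placeholders,
# not entries from any real ontology used in the paper.
ONTOLOGY = {
    "EX:0001": "glucose metabolic process",
    "EX:0002": "cell wall organization",
    "EX:0003": "response to heat",
}


def embed(text):
    """Toy embedding: character-trigram counts (stand-in for a learned model)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Vector database": one precomputed vector per ontology term label.
INDEX = {term_id: embed(label) for term_id, label in ONTOLOGY.items()}


def ground(mention, top_k=1):
    """Return the top_k ontology term IDs most similar to an extracted mention."""
    query = embed(mention)
    scored = [(cosine(query, vec), term_id) for term_id, vec in INDEX.items()]
    scored.sort(reverse=True)
    return [(term_id, round(score, 3)) for score, term_id in scored[:top_k]]
```

Calling `ground("heat response")` returns a one-element list whose first entry pairs the best-matching term ID with its similarity score. In a realistic setting the trigram counter would be replaced by a sentence-embedding model and the dictionary index by a proper vector store.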

Knowledge Engineering

Saccharomyces cerevisiae

Information Extraction for RKGs/SKGs

Large Language Models

Artificial Intelligence

Ontologies

Authors

Gabriel K. Reder

University of Cambridge

Carl Collins

Goldsmiths, University of London

Abbi Abdel Rehim

University of Cambridge

Larisa N. Soldatova

Goldsmiths, University of London

Ross King

University of Gothenburg

University of Cambridge

Chalmers, Computer Science and Engineering, Data Science and AI

Alan Turing Institute

CEUR Workshop Proceedings

1613-0073 (ISSN)

Vol. 3977

Joint Proceedings of the ESWC 2025 Workshops and Tutorials, ESWC-JP 2025
Portoroz, Slovenia,

Subject categories (SSIF 2025)

Computer Sciences

More information

Last updated

2025-07-03