From text to meaning: Semantic interpretation of non-standardized metadata in piping and instrumentation diagrams
Journal article, 2026

The extraction of structured metadata from Piping and Instrumentation Diagrams (P&IDs) is a major bottleneck for digitalization in the process industries. Existing methods based on Optical Character Recognition (OCR) stop at raw text extraction and fail to interpret the critical engineering information encoded in variable-format identifiers such as pipeline numbers. This paper bridges this semantic gap by introducing a system for the format-aware interpretation of P&ID pipeline metadata. Our hybrid architecture combines deep learning for text recognition with domain interpretation rules that let the system adapt to new project formats without model retraining. These rules validate and correct the recognized text and map it semantically to structured data. We validated our system on a challenging dataset of real-world P&IDs from four distinct industrial projects, each with its own complex pipeline number format. Our method achieved 91.1% end-to-end accuracy, a substantial improvement over standard OCR tools, which proved insufficient for the task. This work presents a robust solution that unlocks valuable data from non-standardized engineering documents, providing a practical pathway toward reliable digital twins and plant lifecycle management in the chemical engineering sector.
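The rule layer described in the abstract (validation, error correction, and semantic mapping driven by a per-project format) can be illustrated with a small sketch. Everything in it is hypothetical: the four-field format `<size>-<fluid>-<sequence>-<class>`, the field names, the fluid-code vocabulary, and the digit/letter confusion table are assumptions for illustration, not the paper's actual rules or project formats. What it shows is that the pipeline number format lives in data (a `PipelineFormat` value), so each field of an OCR string can be checked, have common character confusions repaired, and be mapped to a named attribute without touching the recognition model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of common OCR digit/letter confusions (illustrative, not exhaustive).
TO_DIGIT = str.maketrans({"O": "0", "D": "0", "I": "1", "L": "1", "S": "5", "B": "8", "Z": "2"})
TO_ALPHA = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B", "2": "Z"})


@dataclass
class FieldRule:
    name: str                      # semantic attribute the field maps to
    pattern: str                   # regex the corrected value must fully match
    kind: str = "mixed"            # "digit", "alpha" or "mixed": which correction to try
    allowed: Optional[set] = None  # optional controlled vocabulary for the field


@dataclass
class PipelineFormat:
    separator: str
    fields: list                   # ordered FieldRule objects for one project

    def interpret(self, raw: str) -> Optional[dict]:
        """Validate, correct and map one OCR string; return None if it cannot be interpreted."""
        text = raw.strip().upper().replace(" ", "")
        parts = text.split(self.separator)
        if len(parts) != len(self.fields):
            return None            # wrong number of fields: reject rather than guess
        result = {}
        for value, rule in zip(parts, self.fields):
            candidates = [value]   # try the raw value first, then a corrected variant
            if rule.kind == "digit":
                candidates.append(value.translate(TO_DIGIT))
            elif rule.kind == "alpha":
                candidates.append(value.translate(TO_ALPHA))
            for cand in candidates:
                if re.fullmatch(rule.pattern, cand) and (rule.allowed is None or cand in rule.allowed):
                    result[rule.name] = cand
                    break
            else:
                return None        # no candidate passed validation
        return result


# Hypothetical project format: nominal size, fluid code, sequence number, piping class.
FMT = PipelineFormat(separator="-", fields=[
    FieldRule("nominal_size_mm", r"\d{2,4}", kind="digit"),
    FieldRule("fluid_code", r"[A-Z]{1,3}", kind="alpha", allowed={"P", "CW", "ST", "N"}),
    FieldRule("sequence", r"\d{4}", kind="digit"),
    FieldRule("piping_class", r"[A-Z]\d[A-Z]"),
])

print(FMT.interpret("l50-P-1OO1-A1A"))
# → {'nominal_size_mm': '150', 'fluid_code': 'P', 'sequence': '1001', 'piping_class': 'A1A'}
```

Under this sketch's assumptions, supporting a new project means writing a new `PipelineFormat` rather than retraining a model. A production system would likely also need optional fields, multiple separators, and richer correction than a fixed character table.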

Information extraction

Document analysis

Engineering drawings

Hybrid AI systems

Engineering automation

Authors

Vasil Shteriyanov

Technische Universiteit Eindhoven

McDermott

Rimman Dzhusupova

McDermott

Technische Universiteit Eindhoven

Jan Bosch

Göteborgs universitet

Technische Universiteit Eindhoven

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Helena Holmström Olsson

Malmö universitet

Computers and Chemical Engineering

0098-1354 (ISSN)

Vol. 204, 109436

Subject categories (SSIF 2025)

Software engineering

Computer science

DOI

10.1016/j.compchemeng.2025.109436

Last updated

2025-10-17