From text to meaning: Semantic interpretation of non-standardized metadata in piping and instrumentation diagrams
Journal article, 2026

The extraction of structured metadata from Piping and Instrumentation Diagrams (P&IDs) is a major bottleneck for digitalization in the process industries. Existing methods, based on Optical Character Recognition (OCR), stop at raw text extraction, failing to interpret critical engineering information encoded within variable-format identifiers like pipeline numbers. This paper bridges this semantic gap by introducing a system for the format-aware interpretation of P&ID pipeline metadata. Our hybrid system architecture integrates deep learning for text recognition with domain interpretation rules that allow the system to adapt to new project formats without model retraining. These rules perform validation, error correction, and semantic mapping of raw text to structured data. We validated our system on a challenging dataset of real-world P&IDs from four distinct industrial projects, each with a unique and complex pipeline number format. Our method achieved 91.1% end-to-end accuracy, demonstrating a significant leap in performance over standard OCR tools, which proved insufficient for the task. This work presents a robust solution that unlocks valuable data from non-standardized engineering documents, providing a practical pathway for creating reliable digital twins and supporting plant lifecycle management in the chemical engineering sector.
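As a rough illustration of what format-aware interpretation rules could look like, the sketch below splits a raw OCR string into fields, corrects common character confusions according to each field's expected type, validates every field against a pattern, and maps the result to named attributes. It is not the authors' implementation: the names `Field`, `interpret`, `PROJECT_A`, and `PROJECT_B`, the two pipeline number formats, and the confusion tables are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Common OCR confusions (assumed for illustration), applied only where a
# field is expected to be purely numeric or purely alphabetic.
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "l": "1", "S": "5", "B": "8"})
TO_ALPHA = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})


@dataclass(frozen=True)
class Field:
    name: str     # semantic label the token is mapped to
    pattern: str  # regex the corrected token must fully match
    kind: str     # "digit", "alpha", or "mixed" (no correction applied)


# Hypothetical project format: SIZE-FLUID-SEQUENCE-PIPING_CLASS
PROJECT_A = [
    Field("nominal_size_in", r"\d{1,2}", "digit"),
    Field("fluid_code", r"[A-Z]{2,3}", "alpha"),
    Field("sequence_no", r"\d{4}", "digit"),
    Field("piping_class", r"[A-Z0-9]{3,4}", "mixed"),
]

# A second hypothetical format: FLUID/SEQUENCE/SIZE
PROJECT_B = [
    Field("fluid_code", r"[A-Z]{1,3}", "alpha"),
    Field("sequence_no", r"\d{3}", "digit"),
    Field("nominal_size_in", r"\d{1,2}", "digit"),
]


def interpret(raw, fields, sep="-"):
    """Return a dict of named fields, or None if the text fails validation."""
    tokens = raw.strip().split(sep)
    if len(tokens) != len(fields):
        return None  # wrong number of segments for this project format
    result = {}
    for token, field in zip(tokens, fields):
        # Error correction: only rewrite characters the field type rules out.
        if field.kind == "digit":
            token = token.translate(TO_DIGIT)
        elif field.kind == "alpha":
            token = token.upper().translate(TO_ALPHA)
        # Validation: the corrected token must match the field pattern.
        if not re.fullmatch(field.pattern, token):
            return None
        # Semantic mapping: attach the token to its engineering meaning.
        result[field.name] = token
    return result


line_a = interpret("1O-PG-I0O1-A1B", PROJECT_A)
line_b = interpret("PW/2O5/8", PROJECT_B, sep="/")
```

In this sketch a new project format is supported by declaring another list of `Field` entries, which mirrors the abstract's point that the rules adapt to new formats without retraining the recognition model.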

Information extraction

Document analysis

Engineering drawings

Hybrid AI systems

Engineering automation

Authors

Vasil Shteriyanov

Eindhoven University of Technology

McDermott

Rimman Dzhusupova

McDermott

Eindhoven University of Technology

Jan Bosch

University of Gothenburg

Eindhoven University of Technology

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Helena Holmström Olsson

Malmö University

Computers and Chemical Engineering

0098-1354 (ISSN)

Vol. 204 109436

Subject Categories (SSIF 2025)

Software Engineering

Computer Sciences

DOI

10.1016/j.compchemeng.2025.109436

More information

Latest update

10/17/2025