Enhancing OCR-based Engineering Diagram Analysis by Integrating Diverse External Legends with VLMs
Journal article, 2025

Manual analysis of diagrams and legend sheets in engineering projects is time-consuming and calls for automation. The lack of standardized legend formats complicates the creation of a general method for automated information extraction, and existing approaches require training and custom rules for each project. This study proposes a novel solution that combines optical character recognition (OCR) with vision language models (VLMs) and multimodal prompt engineering to automate information extraction from diverse legend sheets without training. Unlike studies that focus only on diagrams, it integrates legend information with the information extracted from the diagrams. Our study shows that VLMs, guided by multimodal prompts, can accurately extract information from diverse legend sheets, enabling automatic information extraction from diagrams across engineering projects. We validate the method through a case study involving the extraction of instruments from piping and instrumentation diagrams (P&IDs) and their legends across three projects with varied formats and standards. The proposed method achieved 100% accuracy in legend classification and information extraction, and 99.68% precision and 95.91% recall in generating instrument listings. These results demonstrate that the approach improves both the accuracy and the efficiency of information extraction from diagrams. The method can be adapted to different legend formats and diagrams, providing a versatile solution for various industries.
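To make the reported precision and recall figures concrete, the following minimal sketch shows how such metrics could be computed for a generated instrument listing. It assumes exact matching of instrument tags treated as sets; the abstract does not state the matching procedure, and the function name `precision_recall` and the tag values are hypothetical illustrations, not data from the study.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of instrument tags.

    Assumption: tags are compared by exact string match and duplicates
    are ignored; the study may have used a different matching rule.
    """
    predicted_set = set(predicted)
    reference_set = set(reference)
    true_positives = len(predicted_set & reference_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical tags, for illustration only.
reference = ["FT-101", "PT-102", "TT-103", "LT-104"]  # manually verified listing
predicted = ["FT-101", "PT-102", "TT-103"]            # automatically generated listing

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case every extracted tag is correct but one instrument is missed, so precision is higher than recall, which is the same pattern as in the reported results.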

diagrams

information extraction

vision language models

optical character recognition

multimodal prompt engineering

legends

Authors

Vasil Shteriyanov

Eindhoven University of Technology

McDermott

Rimman Dzhusupova

Eindhoven University of Technology

McDermott

Jan Bosch

University of Gothenburg

Eindhoven University of Technology

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Helena Holmström Olsson

Malmö University

Journal of Software: Evolution and Process

2047-7481 (eISSN)

Vol. 37, Issue 12, e70072

Subject Categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.1002/smr.70072

More information

Latest update

12/23/2025