Enhancing OCR-based Engineering Diagram Analysis by Integrating Diverse External Legends with VLMs
Journal article, 2025

Manual analysis of diagrams and legend sheets in engineering projects is time-consuming and needs automation. The lack of standardized legend formats complicates creating a general method for automated information extraction. Existing approaches require training and custom rules for each project. This study proposes a novel solution combining optical character recognition (OCR) with vision language models (VLMs) and multimodal prompt engineering to automate information extraction from diverse legend sheets without training. It integrates legend information with information extracted from diagrams, unlike studies that focus only on diagrams. Our study shows that VLMs, guided by multimodal prompts, can accurately extract information from diverse legend sheets, enabling automatic information extraction in diagrams across engineering projects. We validate our method through a case study involving the extraction of instruments from piping and instrumentation diagrams (P&IDs) and their legends across three projects with varied formats and standards. The proposed method achieved 100% accuracy in legend classification and information extraction, and 99.68% precision and 95.91% recall in generating instrument listings. The results demonstrate the effectiveness of our approach, significantly enhancing the accuracy and efficiency of information extraction from diagrams. This method can be adapted to different legend formats and diagrams, providing a versatile solution for various industries.

diagrams

information extraction

vision language models

optical character recognition

multimodal prompt engineering

legends

Authors

Vasil Shteriyanov

Technische Universiteit Eindhoven

McDermott

Rimman Dzhusupova

Technische Universiteit Eindhoven

McDermott

Jan Bosch

Göteborgs universitet

Technische Universiteit Eindhoven

Chalmers, Data- och informationsteknik, Interaktionsdesign och Software Engineering

Helena Holmström Olsson

Malmö universitet

Journal of Software: Evolution and Process

2047-7481 (eISSN)

Vol. 37, Issue 12, e70072

Subject categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.1002/smr.70072

More information

Last updated

2025-12-23