An empirical study of manual abstraction between class diagrams and code of open-source systems
Journal article, 2025

Models play a crucial role in software design, analysis, and supporting new maintainers. However, over time, the benefits of models can diminish as system implementations evolve without corresponding updates to the original models. Reverse engineering methods and tools can help maintain alignment between models and implementation code. Yet, automatically reverse-engineered models often lack abstraction and contain extensive details that hinder comprehension. Recent advancements in AI-based content generation suggest that we may soon see reverse engineering tools capable of human-grade abstraction. To guide the design and validation of such tools, we need a principled understanding of manual abstraction, a topic that has received limited attention in the existing literature. In pursuit of this goal, our paper presents a multiple-case study of model-to-code differences, examining nine substantial open-source software projects obtained through repository mining. We manually matched source code from projects comprising 4983 classes, 26k attributes, and 54k operations to 523 model elements (including classes, attributes, operations, and relationships). These mappings precisely capture discrepancies between provided class diagram designs and actual implementation code. By analyzing these differences in detail, we derive a taxonomy of difference types and provide a well-organized list of cases corresponding to identified differences. Our findings have the potential to contribute to improved reverse engineering methods and tools, propose new mapping rules for model-to-code consistency checks, and offer guidelines to avoid over-abstraction and over-specification during the design process.

Modeling

Software design

Abstraction

Author

Weixing Zhang

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

University of Gothenburg

Daniel Strüber

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Regina Hebig

University of Rostock

Software and Systems Modeling

1619-1366 (ISSN) 1619-1374 (eISSN)

Vol. In Press

Subject Categories (SSIF 2025)

Software Engineering

Computer Sciences

DOI

10.1007/s10270-025-01289-y

More information

Latest update

5/23/2025