Evaluating the layout quality of UML class diagrams using machine learning
Journal article, 2022

UML is the de facto standard notation for graphically representing software. UML diagrams are used in the analysis, construction, and maintenance of software systems. Mostly, UML diagrams capture an abstract view of a (piece of a) software system. A key purpose of UML diagrams is to share knowledge about the system among developers. The quality of the layout of UML diagrams plays a crucial role in their comprehension. In this paper, we present an automated method for evaluating the layout quality of UML class diagrams. We use machine learning based on features extracted from the class diagram images using image processing.

Such an automated evaluator has several uses: (1) From an industrial perspective, this tool could be used for automated quality assurance for class diagrams (e.g., as part of a quality monitor integrated into a DevOps toolchain). For example, automated feedback can be generated once a UML diagram is checked into the project repository. (2) In an educational setting, the evaluator can grade the layout aspect of student assignments in courses on software modeling, analysis, and design. (3) In the field of algorithm design for graph layouts, our evaluator can assess the layouts generated by such algorithms. In this way, the evaluator opens the way to using machine learning to learn good layout algorithms.

Approach: We use machine learning techniques to build (linear) regression models based on features extracted from the class diagram images using image processing. As ground truth, we use a dataset of 600+ UML class diagrams for which experts manually labeled the quality of the layout.

Contributions: This paper makes the following contributions: (1) We show the feasibility of the automatic evaluation of the layout quality of UML class diagrams. (2) We analyze which features of UML class diagrams are most strongly related to the quality of their layout. (3) We evaluate the performance of our layout evaluator. (4) We offer a dataset of labeled UML class diagrams. In this dataset, we supply for every diagram the following information: (a) a manually established ground truth of the quality of the layout, (b) an automatically established value for the layout quality of the diagram (produced by our classifier), and (c) the values of key features of the layout of the diagram (obtained by image processing). This dataset can be used to replicate our study, and by others to build on and improve this work.

Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
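The approach described in the abstract (a linear regression model fitted to layout features, with expert labels as ground truth) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the two feature names (`line_crossings`, `orthogonal_ratio`), the score scale, and the toy values are hypothetical and are not taken from the paper or its replication package. The sketch uses ordinary least squares in plain Python and is not the authors' implementation.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_regression(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Ordinary least squares via the normal equations.

    Returns [intercept, w_1, ..., w_k] for k features.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], feature_vector: Sequence[float]) -> float:
    """Predicted layout-quality score for one diagram's feature vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))


# Hypothetical toy data: (line_crossings, orthogonal_ratio) per diagram.
# These values are made up for illustration, not drawn from the dataset.
TOY_FEATURES = [(0, 1.0), (2, 0.5), (4, 0.5), (1, 0.0), (3, 1.0)]
# Made-up "expert" scores on an arbitrary scale, constructed from the rule
# score = 5 - 0.5 * line_crossings + 2 * orthogonal_ratio.
TOY_SCORES = [7.0, 5.0, 4.0, 4.5, 5.5]

weights = fit_linear_regression(TOY_FEATURES, TOY_SCORES)
print("intercept and feature weights:", weights)
print("predicted score for an unseen diagram:", predict(weights, (2, 1.0)))
```

Because the toy scores follow an exact linear rule, the fit should recover that rule up to floating-point error. With real expert labels the fit would be approximate, and the size and sign of each weight would give a rough indication of how strongly a feature relates to perceived layout quality.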

Quality of layout

Machine learning

Quality of UML class diagrams

Author

Gustav Bergström

University of Gothenburg

Fadhl Mohammad Omar Hujainah

Volvo Cars

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Truong Ho-Quang

Volvo Cars

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Rodi Jolak

University of Gothenburg

Volvo Cars

Satrio Adi Rukmono

Institut Teknologi Bandung

Eindhoven University of Technology

Arif Nurwidyantoro

Monash University

Michel Chaudron

Eindhoven University of Technology

University of Gothenburg

Journal of Systems and Software

0164-1212 (ISSN)

Vol. 192, 111413

Subject Categories

Production Engineering, Human Work Science and Ergonomics

Other Computer and Information Science

Reliability and Maintenance

Software Engineering

Information Science

DOI

10.1016/j.jss.2022.111413

Related datasets

Replication Package for "Evaluating the layout quality of UML class diagrams using machine learning" [dataset]

DOI: 10.5281/zenodo.6645684

More information

Latest update

9/21/2023