An automated approach for classifying reverse-engineered and forward-engineered UML class diagrams
Paper in proceedings, 2018

UML class diagrams are commonly used to describe the designs of systems. Such designs can be used to guide the construction of software. In practice, we have identified two main types of UML class diagrams: i) FwCD refers to diagrams that are hand-made as part of the forward-looking development process; ii) RECD refers to diagrams that are reverse engineered from the source code. Recently, empirical studies in Software Engineering have started looking at open source projects. This enables the automated extraction and analysis of large sets of project data. To research the effects of UML modeling in open source projects, we need a way to automatically determine how UML is used in such projects. For this, we propose an automated classifier for deciding whether a diagram is an FwCD or an RECD. We present the construction of such a classifier by means of (supervised) machine learning algorithms. As part of its construction, we analyse which features are useful in classifying FwCD and RECD. By comparing different machine learning algorithms, we find that the Random Forest algorithm is the most suitable for our purpose. We evaluate the performance of the classifier on a test set of 999 class diagrams obtained from open source projects.
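To make the idea of a supervised FwCD/RECD classifier concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it uses a bagged ensemble of one-level decision stumps with a randomly chosen feature per stump, which is only a much simplified stand-in for Random Forest. The two features (average attributes and average operations per class) are hypothetical, and the toy values and labels are made up purely for illustration.

```python
import random
from collections import Counter

# Hypothetical feature vectors: (avg. attributes per class, avg. operations per class).
# Values and labels are invented for illustration; they are not data from the paper.
TOY_DATA = [
    ((1.0, 0.5), "FwCD"), ((0.5, 1.0), "FwCD"),
    ((1.5, 1.0), "FwCD"), ((2.0, 0.0), "FwCD"),
    ((6.0, 8.0), "RECD"), ((5.5, 9.0), "RECD"),
    ((7.0, 7.5), "RECD"), ((8.0, 10.0), "RECD"),
]


def train_stump(samples, feature):
    """Fit a one-level decision tree (stump) on a single feature."""
    values = sorted({x[feature] for x, _ in samples})
    majority = Counter(y for _, y in samples).most_common(1)[0][0]
    # Fallback if the feature has a single distinct value: always predict the majority.
    best = (len(samples) + 1, values[0], majority, majority)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = Counter(y for x, y in samples if x[feature] <= threshold)
        right = Counter(y for x, y in samples if x[feature] > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(samples) - left[left_label] - right[right_label]
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def train_forest(samples, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]
        feature = rng.randrange(n_features)
        forest.append(train_stump(bootstrap, feature))
    return forest


def predict(forest, x):
    """Classify a feature vector by majority vote over all stumps."""
    votes = Counter(
        left if x[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


forest = train_forest(TOY_DATA)
print(predict(forest, (1.0, 1.0)))
print(predict(forest, (7.0, 9.0)))
```

In practice one would more likely use an established library implementation of Random Forest and features derived from the diagrams themselves; the sketch only shows the bootstrap-and-vote structure that such ensembles share.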

Machine learning

Reverse engineering

Software engineering

Unified modeling language


M.H. Osman

Universiti Putra Malaysia

Technische Universität München

Truong Ho-Quang

Chalmers, Computer Science and Engineering, Software Engineering

Michel Chaudron

Göteborgs universitet

Proceedings - 44th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2018

396-399 8498237
9781538673829 (ISBN)

44th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2018
Prague, Czech Republic


Language Technology (Computational Linguistics)


Computer Science


