An automated approach for classifying reverse-engineered and forward-engineered UML class diagrams
Paper in proceedings, 2018

UML class diagrams are commonly used to describe the designs of systems. Such designs can be used to guide the construction of software. In practice, we have identified two main ways of using UML class diagrams: i) FwCD refers to diagrams that are hand-made as part of the forward-looking development process; ii) RECD refers to diagrams that are reverse engineered from the source code. Recently, empirical studies in Software Engineering have started looking at open source projects. This enables the automated extraction and analysis of large sets of project data. To research the effects of UML modeling in open source projects, we need a way to automatically determine how UML is used in such projects. For this, we propose an automated classifier for deciding whether a diagram is an FwCD or an RECD. We present the construction of such a classifier by means of (supervised) machine learning algorithms. As part of its construction, we analyse which features are useful in classifying FwCD and RECD. By comparing different machine learning algorithms, we find that the Random Forest algorithm is the most suitable algorithm for our purpose. We evaluate the performance of the classifier on a test set of 999 class diagrams obtained from open source projects.
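
As a rough illustration of the workflow the abstract describes (train a supervised classifier on diagram features, then compare it against an alternative on held-out diagrams), the sketch below uses only the Python standard library. It is not the authors' implementation: the feature names, the synthetic data, and the assumption that RECD diagrams tend to be larger are all hypothetical, and the "forest" is a simplified stand-in (bagged one-split decision stumps with one random feature per tree) rather than a full Random Forest.

```python
import random
from collections import Counter

# Hypothetical feature names; the abstract does not list the paper's feature set.
FEATURES = ["num_classes", "avg_attributes_per_class", "avg_operations_per_class"]


def make_synthetic_diagrams(n, rng):
    """Generate synthetic (features, label) pairs. Purely illustrative data."""
    rows = []
    for _ in range(n):
        label = rng.choice(["FwCD", "RECD"])
        # Toy assumption only: RECD diagrams are larger and more detailed.
        shift = 0.0 if label == "FwCD" else 4.0
        feats = [
            rng.gauss(8 + shift, 2.0),
            rng.gauss(3 + shift, 1.5),
            rng.gauss(4 + shift, 1.5),
        ]
        rows.append((feats, label))
    return rows


def fit_stump(rows, feature_idx):
    """Fit a one-split stump on one feature by maximizing training hits."""
    best = None
    for threshold in sorted({f[feature_idx] for f, _ in rows}):
        left = Counter(lbl for f, lbl in rows if f[feature_idx] <= threshold)
        right = Counter(lbl for f, lbl in rows if f[feature_idx] > threshold)
        left_label, left_hits = left.most_common(1)[0]
        right_label, right_hits = right.most_common(1)[0] if right else (left_label, 0)
        hits = left_hits + right_hits
        if best is None or hits > best[0]:
            best = (hits, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature_idx, threshold, left_label, right_label


def fit_forest(rows, n_trees, rng):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        feature_idx = rng.randrange(len(FEATURES))
        forest.append(fit_stump(sample, feature_idx))
    return forest


def predict_forest(forest, feats):
    """Majority vote over all stumps."""
    votes = Counter(
        left if feats[idx] <= thr else right for idx, thr, left, right in forest
    )
    return votes.most_common(1)[0][0]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(f) == lbl for f, lbl in rows) / len(rows)


rng = random.Random(0)
data = make_synthetic_diagrams(300, rng)
train, test = data[:200], data[200:]

forest = fit_forest(train, n_trees=25, rng=rng)
majority_label = Counter(lbl for _, lbl in train).most_common(1)[0][0]

forest_acc = accuracy(lambda f: predict_forest(forest, f), test)
baseline_acc = accuracy(lambda f: majority_label, test)
print(f"bagged stumps: {forest_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

The comparison against a majority-class baseline mirrors, in miniature, the idea of comparing algorithms on a held-out test set; a real study would use features extracted from actual class diagrams and an established machine learning library.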

Reverse engineering

Unified modeling language

Software engineering

Machine learning

Authors

M.H. Osman

Technische Universität München

Universiti Putra Malaysia

Truong Ho-Quang

Chalmers

Michel Chaudron

Chalmers, Computer Science and Engineering, Software Engineering, Software Engineering for Cyber Physical Systems

Proceedings - 44th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2018

396-399 8498237

44th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2018
Prague, Czech Republic

Subject categories

Language Technology (Computational Linguistics)

Software Engineering

Computer Science

DOI

10.1109/SEAA.2018.00070

More information

Last updated

2021-01-18