Automatic classification of UML Class diagrams from images
Paper in proceedings, 2014

Graphical modelling of various aspects of software and systems is a common part of software development, and UML is the de facto standard for many types of software models. To research UML, academia needs a corpus of UML models. An automated system that can classify UML class diagram images would be very beneficial for building such a corpus, since a large portion of UML class diagrams (UML CDs) is available as images on the Internet. In this study, we propose 23 image features and investigate their use for classifying UML CD images. We analyse the performance of the features and assess their contribution based on their Information Gain Attribute Evaluation scores. We study the specificity and sensitivity scores of six classification algorithms on a set of 1300 images. We found that 19 of the 23 proposed features can be considered influential predictors for classifying UML CD images. With the six algorithms, the prediction rate reaches nearly 96% correctness for UML CD images and 91% correctness for non-UML CD images.
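The evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions of sensitivity, specificity, and entropy-based information gain; it is not the authors' code, and the exact evaluator, feature discretisation, and class labels used in the study are not given in this record. The function names and the labels "CD" and "other" are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def sensitivity_specificity(y_true, y_pred, positive="CD"):
    """Return (sensitivity, specificity), treating `positive` as the UML CD class.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, not taken from the study.
y_true = ["CD", "CD", "CD", "other", "other"]
y_pred = ["CD", "CD", "other", "other", "CD"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this reading, sensitivity is the share of UML CD images that are recognised as such, and specificity is the share of non-UML CD images that are correctly rejected. A feature with an information gain close to zero tells the classifier little about the class.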

Software Engineering

UML class diagram

Feature extraction


Machine learning



Truong Ho Quang

Chalmers, Computer Science and Engineering, Software Engineering

Michel Chaudron

Göteborgs universitet

Ingimar Samúelsson

Göteborgs universitet

Jóel Hjaltason

Göteborgs universitet

B. Karasneh

Universiteit Leiden

H. Osman

Universiteit Leiden

Proceedings of the 21st Asia-Pacific Software Engineering Conference, APSEC 2014

1530-1362 (ISSN)

Vol. 1, pp. 399-406
978-1-4799-7425-2 (ISBN)


Computer Vision and Robotics (Autonomous Systems)




