Automatic classification of UML Class diagrams from images
Paper in proceedings, 2014

Graphical modelling of various aspects of software and systems is a common part of software development, and UML is the de facto standard for many types of software models. To research UML, academia needs a corpus of UML models. Because a large portion of UML class diagrams (UML CDs) is available as images on the Internet, an automated system that can classify UML CD images would be very beneficial for building such a corpus. In this study, we propose 23 image features and investigate their use for classifying UML CD images. We analyse the performance of the features and assess their contribution based on their Information Gain Attribute Evaluation scores. We study the specificity and sensitivity scores of six classification algorithms on a set of 1300 images. We found that 19 of the 23 proposed features can be considered influential predictors for classifying UML CD images. Across the six algorithms, the prediction rate reaches nearly 96% correctness for UML CD images and 91% for non-UML CD images.
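The abstract relies on two standard measures: information gain for ranking features, and sensitivity and specificity for judging classifiers. The following is a minimal sketch of how these quantities are commonly computed, not the authors' implementation. The feature name `rectangle_count`, the class labels, and the toy data are hypothetical and chosen only for illustration; the sketch also assumes the feature has already been discretised.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def sensitivity_specificity(actual, predicted, positive):
    """Return (sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 class-diagram images and 4 other images.
labels = ["CD", "CD", "CD", "CD", "other", "other", "other", "other"]
# Hypothetical discretised image feature (not one of the paper's 23 features).
rectangle_count = ["many", "many", "many", "few", "few", "few", "few", "many"]

gain = information_gain(rectangle_count, labels)

# Hypothetical classifier output for the same 8 images.
predicted = ["CD", "CD", "CD", "other", "other", "other", "other", "CD"]
sens, spec = sensitivity_specificity(labels, predicted, positive="CD")
```

In this sketch, sensitivity is the share of class diagram images correctly recognised and specificity is the share of other images correctly rejected, which matches how the two reported percentages are usually read. A feature whose information gain is close to zero tells the classifier little about the class label.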

Software Engineering

UML class diagram

Feature extraction


Machine learning



Truong Ho Quang

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Michel Chaudron

University of Gothenburg

Ingimar Samúelsson

University of Gothenburg

Jóel Hjaltason

University of Gothenburg

B. Karasneh

Leiden University

H. Osman

Leiden University

Proceedings - Asia-Pacific Software Engineering Conference, APSEC

15301362 (ISSN)

Vol. 1 399-406
978-1-4799-7425-2 (ISBN)

Subject Categories

Computer Vision and Robotics (Autonomous Systems)





More information

Latest update

1/3/2024 9