Automatic classification of UML Class diagrams from images
Paper in proceedings, 2014
- Graphical modelling of various aspects of software and systems is a common part of software development. UML is the de-facto standard for various types of software models. To be able to research UML, academia needs to have a corpus of UML models. For building such a database, an automated system that has the ability to classify UML class diagram images would be very beneficial, since a large portion of UML class diagrams (UML CDs) is available as images on the Internet. In this study, we propose 23 image-features and investigate the use of these features for the purpose of classifying UML CD images. We analyse the performance of the features and assess their contribution based on their Information Gain Attribute Evaluation scores. We study specificity and sensitivity scores of six classification algorithms on a set of 1300 images. We found that 19 out of 23 introduced features can be considered as influential predictors for classifying UML CD images. Through the six algorithms, the prediction rate achieves nearly 96% correctness for UML-CD and 91% of correctness for non-UML CD.
UML class diagram