An extensive dataset of UML models in GitHub
Paper in proceedings, 2017

The Unified Modeling Language (UML) is widely taught in academia and well accepted in industry. However, no extensive dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with metadata of the software projects to which those files belong. To this end, we have systematically mined over 12 million GitHub projects to find UML files in them. We present a semi-automated approach to collect UML stored in images, .xmi, and .uml files. We offer a dataset with over 93,000 UML diagrams from over 24,000 projects on GitHub.
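As a rough illustration of the first collection step, the sketch below sorts repository file paths into candidate model files (.xmi, .uml) and candidate images. It is not the authors' pipeline: the function names and the list of image extensions are assumptions made for this example, and an image selected this way would still need a later check to decide whether it actually shows a UML diagram.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# The abstract names .xmi and .uml explicitly.
MODEL_EXTENSIONS = {".xmi", ".uml"}
# Assumed set of image extensions; the abstract only says "images".
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp"}


def classify_candidate(path: str) -> Optional[str]:
    """Return 'model', 'image', or None based on the file extension only."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MODEL_EXTENSIONS:
        return "model"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


def collect_candidates(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths into candidate model files and candidate images."""
    candidates: Dict[str, List[str]] = {"model": [], "image": []}
    for path in paths:
        kind = classify_candidate(path)
        if kind is not None:
            candidates[kind].append(path)
    return candidates
```

An extension filter like this can only narrow the search: image candidates in particular would need manual or automated inspection afterwards, which matches the semi-automated nature of the approach described above.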

mining software repositories

dataset

modeling

GitHub

UML

Authors

G. Robles

Truong Ho Quang

Göteborgs universitet

Regina Hebig

Göteborgs universitet

Michel Chaudron

Göteborgs universitet

M.A. Fernandez

IEEE International Working Conference on Mining Software Repositories

2160-1860 (eISSN)

519-522

Areas of Advance

Information and Communication Technology

Subject Categories

Software Engineering

DOI

10.1109/MSR.2017.48