An extensive dataset of UML models in GitHub
Paper in proceedings, 2017

The Unified Modeling Language (UML) is widely taught in academia and well accepted in industry. However, no large dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with metadata on the software projects to which those files belong. To this end, we have systematically mined over 12 million GitHub projects to find UML files in them. We present a semi-automated approach to collecting UML stored in images, .xmi, and .uml files. We offer a dataset with over 93,000 UML diagrams from over 24,000 projects in GitHub.
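
As a rough illustration of the first filtering step such a mining approach might use, the sketch below sorts repository file paths into candidate model files and candidate images by file extension. It is not the authors' implementation. The function names and the list of image extensions are assumptions made for this example; the abstract names only images, .xmi, and .uml files. An image match is only a candidate, since most images in a repository are not UML diagrams, so a further automated or manual check would still be needed.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# Model extensions named in the abstract.
MODEL_EXTENSIONS = {".xmi", ".uml"}
# Assumption: the abstract says only "images"; this list is illustrative.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp"}


def classify_candidate(path: str) -> Optional[str]:
    """Return "model", "image", or None based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in MODEL_EXTENSIONS:
        return "model"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


def select_candidates(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths into candidate model files and candidate images."""
    candidates: Dict[str, List[str]] = {"model": [], "image": []}
    for path in paths:
        kind = classify_candidate(path)
        if kind is not None:
            candidates[kind].append(path)
    return candidates
```

Given a list of paths from a repository's file tree, `select_candidates` returns a dictionary with the keys `"model"` and `"image"`; paths with any other extension are dropped.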

mining software repositories

dataset

modeling

GitHub

UML

Author

G. Robles

Truong Ho Quang

University of Gothenburg

Regina Hebig

University of Gothenburg

Michel Chaudron

University of Gothenburg

M.A. Fernandez

IEEE International Working Conference on Mining Software Repositories

2160-1860 (eISSN)

519-522

Areas of Advance

Information and Communication Technology

Subject Categories

Software Engineering

DOI

10.1109/MSR.2017.48