An extensive dataset of UML models in GitHub
Paper in proceedings, 2017
The Unified Modeling Language (UML) is widely taught in academia and is well accepted in industry. However, no large dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with metadata about the software projects to which the UML files belong. To this end, we systematically mined over 12 million GitHub projects to find UML files in them. We present a semi-automated approach to collect UML models stored in images, .xmi files, and .uml files. We offer a dataset with over 93,000 UML diagrams from over 24,000 GitHub projects.
mining software repositories