Simsax: A measure of project similarity based on symbolic approximation method and software defect inflow
Artikel i vetenskaplig tidskrift, 2019
Background: Profiling software development projects, in order to compare them, find similar sub-projects or sets of activities, helps to monitor changes in software processes. Since we lack objective measures for profiling or hashing, researchers often fall back on manual assessments.
Objective: The goal of our study is to define an objective and intuitive measure of similarity between software development projects based on software defect-inflow profiles.
Method: We defined a measure of project similarity called SimSAX which is based on segmentation of defect-inflow profiles, coding them into strings (sequences of symbols) and comparing these strings to find so-called motifs. We use simulations to find and calibrate the parameters of the measure. The objects in the simulations are two different large industry projects for which we know the similarity a priori, based on the input from industry experts. Finally, we apply the measure to find similarities between five industrial and six open source projects.
Results: Our results show that the measure provides the most accurate simulated results when the compared motifs are long (32 or more weeks) and we use an alphabet of 5 or more symbols. The measure provides the possibility to calibrate for each industrial case, thus allowing to optimize the method for finding specific patterns in project similarity.
Conclusions: We conclude that our proposed measure provides a good approximation for project similarity. The industrial evaluation showed that it can provide a good starting point for finding similar periods in software development projects.