Improving Code Smell Predictions in Continuous Integration by Differentiating Organic from Cumulative Measures
Paper i proceeding, 2019
Continuous integration and deployment are enablers of quick innovation cycles of software and systems through incremental releases of a product within short periods of time. If software qualities can be predicted for the next release, quality managers can plan ahead with resource allocation for concerning issues. Cumulative metrics are observed to have much higher correlation coefficients compared to non-cumulative metrics. Given the difference in correlation coefficients of cumulative and noncumulative metrics, this study investigates the difference between metrics of these two categories concerning the correctness of predicting code smell which is internal software quality. This study considers 12 metrics from each measurement category, and 35 code smells collected from 36,217 software revisions (commits) of 242 open source Java projects. We build 8,190 predictive models and evaluate them to determine how measurement categories of predictors and targets affect model accuracies predicting code smells. To further validate our approach, we compared our results with Principal Component Analysis (PCA), a statistical procedure for dimensionality reduction. Results of the study show that within the context of continuous integration, non-cumulative metrics as predictors build better predictive models with respect to model accuracy compared to cumulative metrics. When the results are compared with models built from extracted PCA components, we found better results using our approach.
effects of measurement types
principal component analysis