The Effect of Class Noise on Continuous Test Case Selection: A Controlled Experiment on Industrial Data
Paper in conference proceedings, 2020
Continuous integration and testing produce large amounts of data about defects in code revisions, which can be used to train a predictive learner to effectively select a subset of test suites. One challenge in using predictive learners lies in the noise present in the training data, which often degrades classification performance. This study examines the impact of one type of noise, called class noise, on a learner's ability to select test cases. Understanding the impact of class noise on a learner's performance for test case selection would help testers decide on the appropriateness of different noise-handling strategies. For this purpose, we design and implement a controlled experiment on an industrial dataset to measure the impact of class noise at six different levels on the predictive performance of a learner. We measure learning performance using the Precision, Recall, F-score, and Matthews Correlation Coefficient (MCC) metrics. The results show a statistically significant relationship between class noise and the learner's performance for test case selection. In particular, a significant difference in three of the performance measures (Precision, F-score, and MCC) was found between each of the six noise levels and the 0% level, whereas a similar relationship between Recall and class noise was found only at levels above 30%. We conclude that higher class noise ratios lead to missing more tests in the predicted subset of the test suite, and increase the rate of false alarms when the class noise ratio exceeds 30%.
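To make the experimental setup concrete, the sketch below shows one common way to inject class noise at a given ratio (flipping a fraction of binary labels) and to compute the four reported metrics. This is an illustrative sketch only, not the paper's actual pipeline; the function names, the synthetic labels, and the random seed are assumptions introduced here for demonstration.

```python
import math
import random


def inject_class_noise(labels, ratio, seed=0):
    """Flip a fraction `ratio` of binary (0/1) labels to simulate class noise."""
    rng = random.Random(seed)
    noisy = list(labels)
    k = int(round(ratio * len(noisy)))
    for i in rng.sample(range(len(noisy)), k):
        noisy[i] = 1 - noisy[i]  # flip the class label
    return noisy


def binary_metrics(y_true, y_pred):
    """Precision, Recall, F-score, and MCC from the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical setup: evaluate a fixed predictor against increasingly
    # noisy ground truth, mirroring the paper's six noise levels.
    truth = [1, 0] * 50
    preds = list(truth)  # a "perfect" learner, for illustration
    for ratio in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        noisy_truth = inject_class_noise(truth, ratio, seed=42)
        print(ratio, binary_metrics(noisy_truth, preds))
```

Measuring the metrics against noisy labels, as above, shows how a fixed set of predictions appears to degrade as the label noise ratio grows, which is the kind of effect the experiment quantifies.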
Controlled experiment · Class noise · Test case selection · Continuous integration