Improving Data Quality for Regression Test Selection by Reducing Annotation Noise
Paper in proceedings, 2020

Big data and machine learning models have been increasingly used to support software engineering processes and practices. One example is the use of machine learning models to improve test case selection in continuous integration. However, one of the challenges in building such models is the identification and reduction of noise that often comes with large datasets. In this paper, we present a noise reduction approach that deals with the problem of contradictory training entries. We empirically evaluate the effectiveness of the approach in the context of selective regression testing. For this purpose, we use a curated training set as input to a tree-based machine learning ensemble and compare the classification precision, recall, and F-score against a non-curated set. Our study shows that applying the noise reduction approach to the training instances improves prediction, with gains of 37% in precision, 70% in recall, and 59% in F-score.
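The abstract describes filtering contradictory training entries, i.e., entries whose feature vectors are identical but whose labels disagree, before training the ensemble. The following is a minimal sketch of that filtering idea; the function name and the toy data are illustrative and not taken from the paper.

```python
# Sketch of contradiction-based noise filtering: any feature vector that
# appears in the training set with more than one distinct label is treated
# as annotation noise, and all of its entries are dropped before training.
from collections import defaultdict

def remove_contradictions(features, labels):
    """Drop every entry whose feature vector also occurs with a different label."""
    labels_per_vector = defaultdict(set)
    for f, y in zip(features, labels):
        labels_per_vector[tuple(f)].add(y)
    kept = [(f, y) for f, y in zip(features, labels)
            if len(labels_per_vector[tuple(f)]) == 1]
    return [f for f, _ in kept], [y for _, y in kept]

# Hypothetical data: the two identical [1, 0] rows carry conflicting labels.
X = [[1, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 0, 1, 0]
X_clean, y_clean = remove_contradictions(X, y)
# X_clean == [[0, 1], [1, 1]], y_clean == [1, 0]
```

The curated set returned by such a filter would then be fed to a tree-based ensemble and compared against the non-curated set on precision, recall, and F-score, as the study does.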

Keywords

Regression Testing

Machine Learning Models

Annotation Noise

Author

Khaled Al Sabbagh

University of Gothenburg

Miroslaw Staron

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers), Software Engineering for Cyber Physical Systems

Regina Hebig

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers), Software Engineering for Testing, Requirements, Innovation and Psychology

Wilhelm Meding

Ericsson

Proceedings - 46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020

pp. 191-194, article no. 9226358

46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020
Kranj, Slovenia

Subject Categories

Other Computer and Information Science

Language Technology (Computational Linguistics)

Bioinformatics (Computational Biology)

DOI

10.1109/SEAA51224.2020.00042

More information

Latest update

8/18/2021