Improving Data Quality for Regression Test Selection by Reducing Annotation Noise
Paper in proceedings, 2020

Big data and machine learning models are increasingly used to support software engineering processes and practices. One example is the use of machine learning models to improve test case selection in continuous integration. However, one of the challenges in building such models is identifying and reducing the noise that often accompanies large data sets. In this paper, we present a noise reduction approach that addresses the problem of contradictory training entries. We empirically evaluate the effectiveness of the approach in the context of selective regression testing. For this purpose, we use a curated training set as input to a tree-based machine learning ensemble and compare its classification precision, recall, and f-score against those obtained with a non-curated set. Our study shows that applying the noise reduction approach to the training instances gives better prediction results, with an improvement of 37% in precision, 70% in recall, and 59% in f-score.
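As a minimal sketch of what "contradictory training entries" can mean, the following Python snippet treats two entries as contradictory when they share the same feature vector but carry different labels, and drops every entry involved in such a conflict. This definition, the drop-all policy, the function names, and the toy data are assumptions made for illustration; the abstract does not specify the authors' actual curation rule or feature encoding. A small helper for precision, recall, and f-score is included because those are the metrics the abstract reports.

```python
from collections import defaultdict


def drop_contradictory(entries):
    """Keep only entries whose feature vector maps to exactly one label.

    `entries` is a list of (features, label) pairs. This is one possible
    (assumed) curation policy, not necessarily the one used in the paper.
    """
    labels_seen = defaultdict(set)
    for features, label in entries:
        labels_seen[tuple(features)].add(label)
    return [
        (features, label)
        for features, label in entries
        if len(labels_seen[tuple(features)]) == 1
    ]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: features stand in for an encoded code change,
# labels for the test verdict observed after that change.
entries = [
    ((1, 0), "fail"),
    ((1, 0), "pass"),  # same features as above, different label
    ((0, 1), "pass"),
    ((0, 1), "pass"),  # duplicate with a consistent label is kept
    ((1, 1), "fail"),
]
curated = drop_contradictory(entries)
```

In this sketch the curated list would then be passed to a classifier in place of the raw entries, and `precision_recall_f1` would be applied to the predictions of both models to compare them.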

Machine Learning Models

Regression Testing

Annotation Noise

Authors

Khaled Al Sabbagh

Göteborgs universitet

Miroslaw Staron

Göteborgs universitet

Regina Hebig

Göteborgs universitet

Wilhelm Meding

Ericsson AB

Proceedings - 46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020

191-194 9226358

46th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020
Kranj, Slovenia,

Subject categories

Other Computer and Information Science

Language Technology (Computational Linguistics)

Bioinformatics (Computational Biology)

DOI

10.1109/SEAA51224.2020.00042

More information

Last updated

2022-01-11