Improving test case selection by handling class and attribute noise
Journal article, 2022

Big data and machine learning models are increasingly used to support software engineering processes and practices. One example is the use of machine learning models to improve test case selection in continuous integration. However, one of the challenges in building such models is the large volume of noise present in the data, which impedes their predictive performance. In this paper, we address this issue by studying the effect of two types of noise, class noise and attribute noise, on the predictive performance of a test selection model. We first analyze the effect of class noise using an approach that relies on domain knowledge to relabel contradictory entries and remove duplicate ones. We then use an existing approach from the literature to study experimentally the effect of attribute noise removal on learning. The results show that the best learning is achieved when a model is trained on data cleaned of class noise only, irrespective of attribute noise. Specifically, the model achieved 81% precision, 87% recall, and 84% F-score, compared with 44% precision, 17% recall, and 25% F-score for a model built on uncleaned data. Finally, no causal relationship could be established between attribute noise removal and the learning of a model for test case selection. © 2021 The Author(s). Published by Elsevier Inc.
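The class-noise handling described in the abstract (relabeling contradictory entries and removing duplicates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `clean_class_noise`, the `(features, label)` entry format, and the default rule that resolves a contradiction to "fail" are all assumptions made for illustration. The paper states only that its relabeling relies on domain knowledge.

```python
from collections import defaultdict


def resolve_to_fail(labels):
    # Hypothetical relabeling rule (an assumption, not taken from the paper):
    # if identical feature vectors carry conflicting verdicts, keep "fail".
    return "fail" if "fail" in labels else "pass"


def clean_class_noise(entries, resolve=resolve_to_fail):
    """Return entries with duplicates removed and contradictions relabeled.

    entries: iterable of (features, label) pairs, where features is a
    sequence of hashable attribute values and label is a string.
    """
    labels_by_features = defaultdict(set)
    first_seen_order = []
    for features, label in entries:
        key = tuple(features)
        if key not in labels_by_features:
            first_seen_order.append(key)
        labels_by_features[key].add(label)

    cleaned = []
    for key in first_seen_order:
        labels = labels_by_features[key]
        if len(labels) == 1:
            # Exact duplicates collapse into a single entry.
            label = next(iter(labels))
        else:
            # Contradictory entries: same features, different labels.
            label = resolve(labels)
        cleaned.append((key, label))
    return cleaned
```

For example, passing two entries with the same features but the labels "pass" and "fail" yields one entry labeled "fail" under the assumed rule, while two identical entries collapse into one. A different domain rule can be supplied through the `resolve` parameter.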

Authors

Khaled Al Sabbagh

Göteborgs universitet

Software Engineering 1

Miroslaw Staron

Chalmers, Data- och informationsteknik, Software Engineering

Göteborgs universitet

Regina Hebig

Göteborgs universitet

Software Engineering 1

Journal of Systems and Software

0164-1212 (ISSN)

Vol. 183 111093

Subject categories (SSIF 2025)

Computer Sciences

DOI

10.1016/j.jss.2021.111093

More information

Last updated

2025-06-27