The Impact of Class Noise-handling on the Effectiveness of Machine Learning-based Methods for Build Outcome and Code Change Request Predictions

Khaled Al Sabbagh; Miroslaw Staron; Regina Hebig

doi:10.1145/3764864

Journal article, 2026

Machine learning-based methods are increasingly used to optimize build processes and accelerate the integration of software code. These methods leverage large volumes of historical code changes to train models to predict and prevent issues in the codebase that could delay code integration and feature delivery to end-users. The objective of this study is to examine how handling class noise in software code changes collected from Continuous Integration (CI) systems affects the performance of machine learning models that predict the execution outcome of CI builds and negative code reviews. We conduct a series of computational experiments using data from 110 Java open-source projects, examining the effectiveness of two removal-based statistical techniques, Majority Filter (MF) and Consensus Filter (CF), and two corrective techniques, Domain Knowledge-based (DB) and CleanLab. Our results show that removal-based techniques significantly improve model predictive performance in both build outcome and negative code review prediction tasks. For build outcome prediction, applying MF increased the F1-score from 82% to 97%, and MCC from 0.13 to 0.58. For negative code review prediction, MF improved the F1-score from 17% to 53%, and MCC from −0.03 to 0.57. The DB technique was effective primarily in the context of code review comments and less so for build outcome prediction. While CleanLab yielded more consistent predictions, its overall impact on model performance was more moderate than that of the removal-based techniques. Additionally, our findings show that hyperparameter tuning, applied independently or in combination with CleanLab, can further improve model performance; however, these gains did not surpass those achieved by removal-based techniques alone.
We conclude that applying removal-based techniques to the training data of code changes is necessary to improve the prediction of build outcomes and negative code review comments.
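The abstract does not specify how the two removal-based filters were configured, so the following is only a minimal sketch of the decision rules as they are commonly described: given out-of-fold predictions from several classifiers, a majority filter flags an instance when more than half of the classifiers disagree with its recorded label, and a consensus filter flags it only when all of them disagree. The function name `noisy_indices`, the `scheme` parameter, and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def noisy_indices(
    labels: Sequence[int],
    predictions: Sequence[Sequence[int]],
    scheme: str = "majority",
) -> List[int]:
    """Return indices of instances flagged as class noise.

    labels      -- recorded (possibly noisy) class labels
    predictions -- one sequence of out-of-fold predictions per classifier
    scheme      -- "majority" (more than half disagree) or
                   "consensus" (every classifier disagrees)
    """
    if scheme not in ("majority", "consensus"):
        raise ValueError("scheme must be 'majority' or 'consensus'")
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")

    n_classifiers = len(predictions)
    flagged = []
    for i, label in enumerate(labels):
        # Count how many classifiers disagree with the recorded label.
        errors = sum(1 for preds in predictions if preds[i] != label)
        if scheme == "majority":
            is_noisy = errors > n_classifiers / 2
        else:
            is_noisy = errors == n_classifiers
        if is_noisy:
            flagged.append(i)
    return flagged


# Toy data (assumed for illustration): 1 = failed build, 0 = passed build.
labels = [1, 0, 1, 0, 1]
predictions = [
    [1, 1, 0, 0, 1],  # hypothetical classifier A
    [1, 1, 0, 0, 0],  # hypothetical classifier B
    [1, 1, 1, 0, 1],  # hypothetical classifier C
]

majority_flagged = noisy_indices(labels, predictions, "majority")
consensus_flagged = noisy_indices(labels, predictions, "consensus")

# Removal-based handling: drop the flagged instances before training.
cleaned_labels = [y for i, y in enumerate(labels) if i not in set(majority_flagged)]
```

In this toy example the consensus rule flags a subset of what the majority rule flags, which reflects the general trade-off: consensus filtering is more conservative and removes fewer instances, while majority filtering removes more at the risk of discarding correctly labeled data.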

Author

Khaled Al Sabbagh

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Miroslaw Staron

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)

Regina Hebig

University of Rostock

ACM Transactions on Software Engineering and Methodology

1049-331X (ISSN), 1557-7392 (eISSN)

Vol. 35, No. 6, 147

Subject Categories (SSIF 2025)

Software Engineering

Computer Sciences

Computer Systems

DOI

10.1145/3764864

More information

Latest update

6/22/2026
