Selective Regression Testing based on Big Data: Comparing Feature Extraction Techniques
Paper in proceeding, 2020

Regression testing is a necessary activity in continuous integration (CI) since it provides confidence that modified parts of the system are correct at each integration cycle. CI provides large volumes of data which can be used to support regression testing activities. By using machine learning (ML), patterns about faulty changes in the modified program can be induced, allowing test orchestrators to make inferences about test cases that need to be executed at each CI cycle. However, one challenge in using learning models lies in finding a suitable way to characterize source code changes while preserving important information. In this paper, we empirically evaluate the effect of three feature extraction (FE) algorithms on the performance of an existing ML-based selective regression testing technique. We designed and performed an experiment to investigate the effect of Bag of Words (BoW), Word Embeddings (WE), and content-based feature extraction (CBF). We used stratified cross validation on the feature spaces generated by the three FE techniques and evaluated the performance of three machine learning models using the precision and recall metrics. The results from this experiment showed a significant difference between the models' precision and recall scores, suggesting that the BoW-fed model outperforms the other two models with respect to precision, whereas the CBF-fed model outperforms the rest with respect to recall.
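The three building blocks named in the abstract (BoW feature extraction, stratified cross validation, and precision/recall scoring) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the tokenizer, the function names `bag_of_words`, `stratified_folds` and `precision_recall`, and the convention that label 1 marks a change linked to a test failure are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def bag_of_words(lines):
    """Turn changed source lines into token-count vectors over a shared vocabulary.

    Assumption: identifiers and keywords are the tokens; the paper's own
    tokenization is not described in the abstract.
    """
    tokenized = [re.findall(r"[A-Za-z_]\w*", line) for line in lines]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def stratified_folds(labels, k):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the positive class (hypothetically: failing change)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    vocab, vectors = bag_of_words(["int x = foo(x);", "return bar;"])
    print(vocab, vectors)
    print(stratified_folds([1, 1, 0, 0, 0, 0], k=2))
    print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a full evaluation, each fold would serve once as the held-out set while a model is trained on the remaining folds, and precision and recall would be computed per fold; the classifier itself is left out here because the abstract does not name it.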

Machine Learning

Feature Extraction

Regression Testing

Continuous Integration


Khaled Al Sabbagh

University of Gothenburg

Miroslaw Staron

University of Gothenburg

Miroslaw Ochodek

Poznan University of Technology

Regina Hebig

University of Gothenburg

Wilhelm Meding


IEEE Software

0740-7459 (ISSN) 1937-4194 (eISSN)


International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
Porto, Portugal

Subject Categories

Software Engineering

Information Science

Computer Science


