Selective Regression Testing based on Big Data: Comparing Feature Extraction Techniques
Paper in proceeding, 2020

Regression testing is a necessary activity in continuous integration (CI) since it provides confidence that modified parts of the system are correct at each integration cycle. CI produces large volumes of data that can be used to support regression testing activities. By using machine learning (ML), patterns of faulty changes in the modified program can be induced, allowing test orchestrators to infer which test cases need to be executed at each CI cycle. However, one challenge in using learning models lies in finding a suitable way of characterizing source code changes while preserving important information. In this paper, we empirically evaluate the effect of three feature extraction (FE) algorithms on the performance of an existing ML-based selective regression testing technique. We designed and performed an experiment to investigate the effect of Bag of Words (BoW), Word Embeddings (WE), and content-based feature extraction (CBF). We used stratified cross-validation on the feature space generated by each of the three FE techniques and evaluated the performance of three machine learning models using the precision and recall metrics. The results showed a significant difference between the models' precision and recall scores, suggesting that the BoW-fed model outperforms the other two models with respect to precision, whereas the CBF-fed model outperforms the rest with respect to recall.
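To make the terms in the abstract concrete, the following is a minimal, standard-library-only sketch of three ingredients it names: Bag of Words feature extraction over code-change text, stratified fold assignment, and the precision and recall metrics. It is not the authors' implementation. The function names (`bag_of_words`, `stratified_folds`, `precision_recall`), the identifier-based tokenizer, and the convention that label 1 means "test fails" are all assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(snippets):
    """Turn text snippets into count vectors over a shared vocabulary.

    Tokenization is an assumption: only identifier-like tokens are kept.
    """
    tokenized = [re.findall(r"[A-Za-z_]\w*", s) for s in snippets]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical changed lines of code, used only as sample input.
vocab, vectors = bag_of_words(["int x = y + 1;", "x = x + y;"])
folds = stratified_folds([1, 1, 0, 0, 0, 0], k=2)
precision, recall = precision_recall([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
```

In this reading, precision is the share of selected test cases that actually fail, and recall is the share of failing test cases that were selected, which is why the two metrics can favor different feature extraction techniques.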

Machine Learning

Feature Extraction

Regression Testing

Continuous Integration

Authors

Khaled Al Sabbagh

Göteborgs universitet

Miroslaw Staron

Göteborgs universitet

Miroslaw Ochodek

Politechnika Poznanska

Regina Hebig

Göteborgs universitet

Wilhelm Meding

Ericsson AB

IEEE Software

0740-7459 (ISSN) 1937-4194 (eISSN)

322-329

International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
Porto, Portugal,

Subject categories

Software Engineering

Information Science

Computer Science

DOI

10.1109/ICSTW50294.2020.00058

More information

Latest update

2022-04-21