Selective Regression Testing based on Big Data: Comparing Feature Extraction Techniques
Paper i proceeding, 2020
Regression testing is a necessary activity in continuous integration (CI) since it provides confidence that modified parts of the system are correct at each integration cycle. CI provides large volumes of data which can be used to support regression testing activities. By using machine learning, patterns about faulty changes in the modified program can be induced, allowing test orchestrators to make inferences about test cases that need to be executed at each CI cycle. However, one challenge in using learning models lies in finding a suitable way for characterizing source code changes and preserving important information. In this paper, we empirically evaluate the effect of three feature extraction algorithms on the performance of an existing ML-based selective regression testing technique. We designed and performed an experiment to empirically investigate the effect of Bag of Words (BoW), Word Embeddings (WE), and content-based feature extraction (CBF). We used stratified cross validation on the space of features generated by the three FE techniques and evaluated the performance of three machine learning models using the precision and recall metrics. The results from this experiment showed a significant difference between the models' precision and recall scores, suggesting that the BoW-fed model outperforms the other two models with respect to precision, whereas a CBF-fed model outperforms the rest with respect to recall.