Using Machine Learning to Design a Flexible LOC Counter
Paper in proceedings, 2017

The result of counting the size of a program in Lines of Code (LOC) depends on the counting rules, i.e., the definition of which lines should be counted. In most measurement tools, these rules are hard-coded, and users do not know which lines were counted and which were not. The goal of our research is to investigate how machine learning can be used to teach a measurement tool which lines should be counted and which should not. In particular, we want to identify which parameters of the learning algorithm can be used to classify lines to be counted. Our research follows the design science research methodology: we construct a measurement tool based on machine learning and evaluate it on open source programs. To build the training set, we ask industry professionals to classify which lines should be counted. The results show that classifying lines as counted or not counted achieves an average accuracy between 0.90 and 0.99 measured as Matthews Correlation Coefficient (MCC), and between 95% and nearly 100% measured as the percentage of correctly classified lines. Based on these results, we conclude that using machine learning algorithms as the core of modern measurement instruments has large potential and should be explored further.
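The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary labeling per line (count / do not count). The label lists and function names below are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
import math

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels.

    True means "this line should be counted"; False means it should not.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified lines."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical labels: what a professional marked vs. what a classifier predicted.
actual    = [True, True, True,  False, False, True, False, True]
predicted = [True, True, False, False, False, True, True,  True]

counts = confusion_counts(actual, predicted)
print("TP, TN, FP, FN:", counts)
print("accuracy:", accuracy(*counts))
print("MCC:", round(mcc(*counts), 4))
```

Unlike the percentage of correctly classified lines, MCC takes all four cells of the confusion matrix into account, so it is less flattering when one class (for example, countable lines) dominates the data set. This may explain why the abstract reports both measures.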

software size estimation

Authors

M. Ochodek

Poznan University of Technology

Miroslaw Staron

University of Gothenburg

D. Bargowski

Poznan University of Technology

Wilhelm Meding

Ericsson

Regina Hebig

University of Gothenburg

2017 IEEE International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)

14-20
978-1-5090-6597-4 (ISBN)

Subject Categories (SSIF 2011)

Language Technology (Computational Linguistics)

DOI

10.1109/MALTESQUE.2017.7882011

More information

Latest update

8/12/2022