Mining Task-Specific Lines of Code Counters
Journal article, 2023
Method: We use Design Science Research as the research methodology to build and validate a generator of task-specific LOC measures and their counters. The generated LOC counters take the form of binary decision trees inferred from historical data using Genetic Programming. The proposed tool was validated on three tasks: mining LOC measures that proxy for code readability, the number of assertions in unit tests, and code-review duration.
Results: Task-specific LOC measures showed a "strong" to "very strong" negative correlation with the code-readability score (Kendall's $\tau$ ranging from -0.83 to -0.76), compared to a "weak" to "strong" negative correlation for the best among the standard LOC measures ($\tau$ ranging from -0.36 to -0.13). For proxying for the number of assertions in unit tests, correlation coefficients were also higher for task-specific LOC measures, by ca. 11% to 21% ($\tau$ ranging from 0.31 to 0.34). Finally, task-specific LOC measures showed a stronger correlation with code-review duration than the best among the standard LOC measures ($\tau$ = 0.31, 0.36, and 0.37 compared to 0.11, 0.08, and 0.16, respectively).
Conclusions: Our study shows that it is possible to mine task-specific LOC counters from historical datasets using Genetic Programming. Task-specific LOC measures obtained this way show stronger correlations with the variables they proxy for than standard LOC measures do.
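The abstract describes the generated counters only as binary decision trees, inferred with Genetic Programming and evaluated by Kendall's $\tau$ against a target variable. The following Python sketch illustrates those two ideas under stated assumptions. The line features (`is_blank`, `is_comment`, `has_assert`), the `Node` class, the `count_lines` and `kendall_tau_b` helpers, and the sample snippets and readability scores are all hypothetical. Each leaf is assumed to decide whether a line is counted. $\tau$ is computed as the tau-b variant, which the abstract does not specify. The Genetic Programming search that would infer the tree is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical per-line features; the paper's actual feature set is not listed in the abstract.
FEATURES: Dict[str, Callable[[str], bool]] = {
    "is_blank": lambda line: line.strip() == "",
    "is_comment": lambda line: line.lstrip().startswith(("#", "//")),
    "has_assert": lambda line: "assert" in line,
}


@dataclass
class Node:
    """Hypothetical binary decision-tree node: inner node if `feature` is set, else a leaf."""
    feature: Optional[str] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    counts: bool = False  # leaf decision (assumed): include the line in the LOC count?

    def decide(self, line: str) -> bool:
        if self.feature is None:
            return self.counts
        branch = self.if_true if FEATURES[self.feature](line) else self.if_false
        return branch.decide(line)


def count_lines(tree: Node, source: str) -> int:
    """Task-specific LOC: number of lines the tree decides to count."""
    return sum(1 for line in source.splitlines() if tree.decide(line))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b via O(n^2) pair counting, with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical tree: count lines that are neither blank nor comments.
tree = Node(
    feature="is_blank",
    if_true=Node(counts=False),
    if_false=Node(
        feature="is_comment",
        if_true=Node(counts=False),
        if_false=Node(counts=True),
    ),
)

snippets = ["x = 1\n\n# comment\ny = 2", "a = 1", "def f():\n    return 1\n\n\n"]
readability = [0.4, 0.9, 0.5]  # hypothetical target scores, not data from the paper

counts = [count_lines(tree, s) for s in snippets]  # [2, 1, 2]
tau = kendall_tau_b(counts, readability)           # about -0.816
print(counts, round(tau, 3))
```

Because Kendall's $\tau$ compares only orderings, a counter scores well when its counts rank code fragments the same way the target variable does, regardless of the counts' absolute scale. In practice, `scipy.stats.kendalltau` (tau-b by default) could replace the hand-written helper.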
Authors
Miroslaw Ochodek
Poznan University of Technology
Krzysztof Durczak
Poznan University of Technology
Jerzy Nawrocki
Poznan University of Technology
Miroslaw Staron
University of Gothenburg
Chalmers, Computer Science and Engineering (Chalmers), Software Engineering (Chalmers)
IEEE Access
2169-3536 (ISSN), 2169-3536 (eISSN)
Vol. 11, pp. 100218-100233
Subject Categories (SSIF 2025)
Software Engineering
Computer Sciences
DOI
10.1109/ACCESS.2023.3314572