Correlations of software code metrics: An empirical study
Paper in proceeding, 2017
Background: The increasing up-trend of software size brings about challenges related to release planning and maintainability. Foreseeing the growth of software metrics can assist in taking proactive decisions regarding different areas where software metrics play vital roles. For example, source code metrics are used to automatically calculate technical debt related to code quality which may indicate how maintainable a software is. Thus, predicting such metrics can give us an indication of technical debt in the future releases of software. Objective: Estimation or prediction of software metrics can be performed more meaningfully if the relationships between different domains of metrics and relationships between the metrics and different domains are well understood. To understand such relationships, this empirical study has collected 25 metrics classified into four domains from 9572 software revisions of 20 open source projects from 8 well-known companies. Results: We found software size related metrics are most correlated among themselves and with metrics from other domains. Complexity and documentation related metrics are more correlated with size metrics than themselves. Metrics in the duplications domain are observed to be more correlated to themselves on a domain-level. However, a metric to domain level relationship exploration reveals that metrics with most strong correlations are in fact connected to size metrics. The Overall correlation ranking of duplication metrics are least among all domains and metrics. Contribution: Knowledge earned from this research will help to understand inherent relationships between metrics and domains. This knowledge together with metric-level relationships will allow building better predictive models for software code metrics.
Software Code Metrics
Correlation of Metrics
Spearman’s Rank Correlation
Software Engineering