Analyzing defect inflow distribution and applying Bayesian inference method for software defect prediction in large software projects
Journal article, 2016
Tracking and predicting quality and reliability is a major challenge in large and distributed software development projects. A number of standard distributions have been successfully used in reliability engineering theory and practice, common among these for modeling software defect inflow being exponential, Weibull, beta and Non-Homogeneous Poisson Process (NHPP). Although standard distribution models have been recognized in reliability engineering practice, their ability to fit defect data from proprietary and OSS software projects is not well understood. Lack of knowledge about underlying defect inflow distribution also leads to difficulty in applying Bayesian based inference methods for software defect prediction. In this paper we explore the defect inflow distribution of total of fourteen large software projects/release from two industrial domain and open source community. We evaluate six standard distributions for their ability to fit the defect inflow data and also assess which information criterion is practical for selecting the distribution with best fit. Our results show that beta distribution provides the best fit to the defect inflow data for all industrial projects as well as majority of OSS projects studied. In the paper we also evaluate how information about defect inflow distribution from historical projects is applied for modeling the prior beliefs/experience in Bayesian analysis which is useful for making software defect predictions early during the software project lifecycle.