TY - GEN
T1 - An empirical study of just-in-time defect prediction using cross-project models
AU - Fukushima, Takafumi
AU - Kamei, Yasutaka
AU - McIntosh, Shane
AU - Yamashita, Kazuhiro
AU - Ubayashi, Naoyasu
N1 - Publisher Copyright:
Copyright 2014 ACM.
PY - 2014/5/31
Y1 - 2014/5/31
AB - Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction, is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, similar to traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this flaw in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from older projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high-performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected.
UR - http://www.scopus.com/inward/record.url?scp=84938794114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938794114&partnerID=8YFLogxK
U2 - 10.1145/2597073.2597075
DO - 10.1145/2597073.2597075
M3 - Conference contribution
AN - SCOPUS:84938794114
T3 - 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings
SP - 172
EP - 181
BT - 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings
PB - Association for Computing Machinery
T2 - 11th Working Conference on Mining Software Repositories, MSR 2014
Y2 - 31 May 2014 through 1 June 2014
ER -