TY - JOUR
T1 - Automated duplicate bug report detection using multi-factor analysis
AU - Zou, Jie
AU - Xu, Ling
AU - Yang, Mengning
AU - Zhang, Xiaohong
AU - Zeng, Jun
AU - Hirokawa, Sachio
N1 - Funding Information:
The work described in this paper was partially supported by the National Natural Science Foundation of China (Grant Nos. 91118005, 61173131), the Chongqing Graduate Student Research Innovation Project (Grant No. CYS15022), the China Postdoctoral Science Foundation (Grant No. 2014M560704), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Fundamental Research Funds for the Central Universities (Project No. 2015CDJXY).
Publisher Copyright:
© 2016 The Institute of Electronics, Information and Communication Engineers.
PY - 2016/7
Y1 - 2016/7
AB - Bug reports written in natural language are often vast in volume, ambiguous, and poorly written, which makes duplicate bug report detection challenging. Current automatic duplicate bug report detection techniques have mainly focused on textual information and ignored other useful factors. To improve detection accuracy, we propose a new approach called the LNG (LDA and N-gram) model, which takes advantage of the topic model LDA and the word-based model N-gram. The LNG model considers multiple factors that potentially affect detection accuracy, including textual information, semantic correlation, word order, contextual connections, and categorial information. In addition, the N-gram model adopted in LNG is improved by modifying its similarity algorithm. The experiment is conducted on more than 230,000 real bug reports from the Eclipse project. For the evaluation, we propose a new evaluation metric, the exact-accuracy (EA) rate, which can be used to better understand the performance of duplicate detection. The evaluation results show that the recall rate, precision rate, and EA rate of the proposed method are all higher than when LDA and N-gram are used separately. In addition, the recall rate is improved by 2.96%-10.53% compared to the state-of-the-art approach DBTM.
UR - http://www.scopus.com/inward/record.url?scp=84976906515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976906515&partnerID=8YFLogxK
U2 - 10.1587/transinf.2016EDP7052
DO - 10.1587/transinf.2016EDP7052
M3 - Article
AN - SCOPUS:84976906515
SN - 0916-8532
VL - E99D
SP - 1762
EP - 1775
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 7
ER -