Cross-project Defect Prediction via ASTToken2Vec and BLSTM-based Neural Network

Hao Li, Xiaohong Li, Xiang Chen, Xiaofei Xie, Yanzhou Mu, Zhiyong Feng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)


Cross-project defect prediction (CPDP), as a means of focusing the quality assurance effort of software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach based on deep learning. In particular, we model each program module as a simplified abstract syntax tree (S-AST). For each node in the S-AST, only the project-independent node type is retained, and other project-specific information (such as variable and method names) is ignored, so that the modeling method is project-independent and suitable for CPDP. We then extract token sequences from the program modules modeled as S-ASTs. In addition, to construct meaningful vector representations for the token sequences, we propose a novel unsupervised embedding method, ASTToken2Vec, which learns semantic information from the natural structure of the S-AST. Finally, we use a BLSTM (bi-directional long short-term memory) based neural network to automatically learn semantic features from the vectorized token sequences and construct CPDP models. In our empirical studies, 10 real large-scale open-source Java projects are chosen as empirical subjects. The final results show that our proposed CPDP approach performs significantly better than five state-of-the-art CPDP baselines in terms of AUC.
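
The S-AST modeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper targets Java, whereas this sketch uses Python's standard `ast` module as a stand-in parser, and the function name `node_type_tokens`, the pre-order traversal, and the choice to keep every node type are all assumptions made for illustration. It shows only the core idea that keeping node types and discarding identifiers yields the same token sequence for structurally identical code from different projects.

```python
import ast
from typing import List


def node_type_tokens(source: str) -> List[str]:
    """Return the pre-order sequence of AST node type names for `source`.

    Hypothetical helper: identifiers, literals, and other project-specific
    details are dropped; only the node type of each AST node is kept.
    """
    tree = ast.parse(source)
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Keep only the node type (e.g. "FunctionDef"), never the name.
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return tokens


# Two structurally identical functions that differ only in their names.
snippet_a = "def area(w, h):\n    return w * h\n"
snippet_b = "def scale(x, k):\n    return x * k\n"

tokens_a = node_type_tokens(snippet_a)
tokens_b = node_type_tokens(snippet_b)

# Both snippets map to the same project-independent token sequence.
print(tokens_a == tokens_b)
```

In a full pipeline along the lines of the abstract, sequences like these would then be mapped to vectors by an embedding method and passed to a sequence model; neither of those steps is sketched here.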

Original language: English
Title of host publication: 2019 International Joint Conference on Neural Networks, IJCNN 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728119854
Publication status: Published - Jul 2019
Externally published: Yes
Event: 2019 International Joint Conference on Neural Networks, IJCNN 2019 - Budapest, Hungary
Duration: Jul 14 2019 to Jul 19 2019

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks


Conference: 2019 International Joint Conference on Neural Networks, IJCNN 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

