TY - GEN
T1 - Audee
T2 - 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
AU - Guo, Qianyu
AU - Xie, Xiaofei
AU - Li, Yi
AU - Zhang, Xiaoyu
AU - Liu, Yang
AU - Li, Xiaohong
AU - Shen, Chao
N1 - Funding Information:
We thank the anonymous reviewers for their comprehensive feedback. This work was partly supported by the National Science Foundation of China (No. 61872262, 61572349). It was also sponsored by the Singapore Ministry of Education Academic Research Fund Tier 1 (Award No. 2018-T1-002-069), the National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program (Award No. NRF2018NCR-NCR005-0001), the Singapore National Research Foundation under NCR Award Number NSOE003-0001 and NRF Investigatorship NRFI06-2020-0022. We also gratefully acknowledge the support of NVIDIA AI Tech Center (NVAITC) to our research.
Publisher Copyright:
© 2020 ACM.
PY - 2020/9
Y1 - 2020/9
N2 - Deep learning (DL) has been applied widely, and the quality of DL systems becomes crucial, especially for safety-critical applications. Existing work mainly focuses on the quality analysis of DL models, but lacks attention to the underlying frameworks on which all DL models depend. In this work, we propose Audee, a novel approach for testing DL frameworks and localizing bugs. Audee adopts a search-based approach and implements three different mutation strategies to generate diverse test cases by exploring combinations of model structures, parameters, weights and inputs. Audee is able to detect three types of bugs: logical bugs, crashes and Not-a-Number (NaN) errors. In particular, for logical bugs, Audee adopts a cross-reference check to detect behavioural inconsistencies across multiple frameworks (e.g., TensorFlow and PyTorch), which may indicate potential bugs in their implementations. For NaN errors, Audee adopts a heuristic-based approach to generate DNNs that tend to output outliers (i.e., too large or small values), and these values are likely to produce NaN. Furthermore, Audee leverages a causal-testing based technique to localize layers as well as parameters that cause inconsistencies or bugs. To evaluate the effectiveness of our approach, we applied Audee on testing four DL frameworks, i.e., TensorFlow, PyTorch, CNTK, and Theano. We generate a large number of DNNs which cover 25 widely-used APIs in the four frameworks. The results demonstrate that Audee is effective in detecting inconsistencies, crashes and NaN errors. In total, 26 unique unknown bugs were discovered, and 7 of them have already been confirmed or fixed by the developers.
AB - Deep learning (DL) has been applied widely, and the quality of DL systems becomes crucial, especially for safety-critical applications. Existing work mainly focuses on the quality analysis of DL models, but lacks attention to the underlying frameworks on which all DL models depend. In this work, we propose Audee, a novel approach for testing DL frameworks and localizing bugs. Audee adopts a search-based approach and implements three different mutation strategies to generate diverse test cases by exploring combinations of model structures, parameters, weights and inputs. Audee is able to detect three types of bugs: logical bugs, crashes and Not-a-Number (NaN) errors. In particular, for logical bugs, Audee adopts a cross-reference check to detect behavioural inconsistencies across multiple frameworks (e.g., TensorFlow and PyTorch), which may indicate potential bugs in their implementations. For NaN errors, Audee adopts a heuristic-based approach to generate DNNs that tend to output outliers (i.e., too large or small values), and these values are likely to produce NaN. Furthermore, Audee leverages a causal-testing based technique to localize layers as well as parameters that cause inconsistencies or bugs. To evaluate the effectiveness of our approach, we applied Audee on testing four DL frameworks, i.e., TensorFlow, PyTorch, CNTK, and Theano. We generate a large number of DNNs which cover 25 widely-used APIs in the four frameworks. The results demonstrate that Audee is effective in detecting inconsistencies, crashes and NaN errors. In total, 26 unique unknown bugs were discovered, and 7 of them have already been confirmed or fixed by the developers.
UR - http://www.scopus.com/inward/record.url?scp=85099256881&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099256881&partnerID=8YFLogxK
U2 - 10.1145/3324884.3416571
DO - 10.1145/3324884.3416571
M3 - Conference contribution
AN - SCOPUS:85099256881
T3 - Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
SP - 486
EP - 498
BT - Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 September 2020 through 25 September 2020
ER -