TY - GEN
T1 - Cats Are Not Fish: Deep Learning Testing Calls for Out-Of-Distribution Awareness
T2 - 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
AU - Berend, David
AU - Xie, Xiaofei
AU - Ma, Lei
AU - Zhou, Lingjun
AU - Liu, Yang
AU - Xu, Chi
AU - Zhao, Jianjun
N1 - Funding Information:
This work was supported by Singapore Ministry of Education Academic Research Fund Tier 1 (Award No. 2018-T1-002-069), the National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program (Award No. NRF2018 NCR-NCR005-0001), the Singapore National Research Foundation under NCR Award Number NSOE003-0001 and NRF Investigatorship NRFI06-2020-0022. It was also supported by JSPS KAKENHI Grant Nos. 20H04168, 19K24348, 19H04086, and JST-Mirai Program Grant No. JPMJMI18BB, Japan. We also gratefully acknowledge the support of the NVIDIA AI Tech Center (NVAITC) to our research.
Publisher Copyright:
© 2020 ACM.
PY - 2020/9
Y1 - 2020/9
N2 - As Deep Learning (DL) is continuously adopted in many industrial applications, its quality and reliability start to raise concerns. Similar to the traditional software development process, testing the DL software to uncover its defects at an early stage is an effective way to reduce risks after deployment. According to the fundamental assumption of deep learning, the DL software does not provide statistical guarantees and has limited capability in handling data that falls outside of its learned distribution, i.e., out-of-distribution (OOD) data. Although recent progress has been made in designing novel testing techniques for DL software, which can detect thousands of errors, the current state-of-the-art DL testing techniques usually do not take the distribution of generated test data into consideration. It is therefore hard to judge whether the 'identified errors' are indeed meaningful errors to the DL application (i.e., due to quality issues of the model) or outliers that cannot be handled by the current model (i.e., due to the lack of training data). To fill this gap, we take the first step and conduct a large-scale empirical study, with a total of 451 experiment configurations, 42 deep neural networks (DNNs) and 1.2 million test data instances, to investigate and characterize the impact of OOD-awareness on DL testing. We further analyze the consequences when DL systems go into production by evaluating the effectiveness of adversarial retraining with distribution-aware errors. The results confirm that introducing data distribution awareness in both testing and enhancement phases outperforms distribution-unaware retraining by up to 21.5%.
AB - As Deep Learning (DL) is continuously adopted in many industrial applications, its quality and reliability start to raise concerns. Similar to the traditional software development process, testing the DL software to uncover its defects at an early stage is an effective way to reduce risks after deployment. According to the fundamental assumption of deep learning, the DL software does not provide statistical guarantees and has limited capability in handling data that falls outside of its learned distribution, i.e., out-of-distribution (OOD) data. Although recent progress has been made in designing novel testing techniques for DL software, which can detect thousands of errors, the current state-of-the-art DL testing techniques usually do not take the distribution of generated test data into consideration. It is therefore hard to judge whether the 'identified errors' are indeed meaningful errors to the DL application (i.e., due to quality issues of the model) or outliers that cannot be handled by the current model (i.e., due to the lack of training data). To fill this gap, we take the first step and conduct a large-scale empirical study, with a total of 451 experiment configurations, 42 deep neural networks (DNNs) and 1.2 million test data instances, to investigate and characterize the impact of OOD-awareness on DL testing. We further analyze the consequences when DL systems go into production by evaluating the effectiveness of adversarial retraining with distribution-aware errors. The results confirm that introducing data distribution awareness in both testing and enhancement phases outperforms distribution-unaware retraining by up to 21.5%.
UR - http://www.scopus.com/inward/record.url?scp=85098369646&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098369646&partnerID=8YFLogxK
U2 - 10.1145/3324884.3416609
DO - 10.1145/3324884.3416609
M3 - Conference contribution
AN - SCOPUS:85098369646
T3 - Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
SP - 1041
EP - 1052
BT - Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 September 2020 through 25 September 2020
ER -