TY - JOUR
T1 - Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated with Targeted Email Attacks
AU - Sun, Bo
AU - Ban, Tao
AU - Han, Chansu
AU - Takahashi, Takeshi
AU - Yoshioka, Katsunari
AU - Takeuchi, Jun'ichi
AU - Sarrafzadeh, Abdolhossein
AU - Qiu, Meikang
AU - Inoue, Daisuke
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack uses a sophisticated email message to persuade a victim to perform a specific, seemingly innocuous action, such as opening a link or an attachment and downloading and installing a software program. To carry out such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result, obtained on a collection of 200 Chinese decoy documents, was an accuracy of 97.5%, an F-measure of 97.9%, and a low false positive rate (FPR) of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.
KW - targeted email attack
KW - decoy document
KW - machine learning
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85107177958&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107177958&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3082000
DO - 10.1109/ACCESS.2021.3082000
M3 - Article
AN - SCOPUS:85107177958
SN - 2169-3536
VL - 9
SP - 87962
EP - 87971
JO - IEEE Access
JF - IEEE Access
M1 - 9435284
ER -