TY - JOUR
T1 - Ensemble anomaly detection from multi-resolution trajectory features
AU - Ando, Shin
AU - Thanomphongphan, Theerasak
AU - Seki, Yoichi
AU - Suzuki, Einoshin
N1 - Funding Information:
The authors would like to thank the handling editor and the anonymous reviewers for their valuable comments and suggestions. Parts of this study are supported by the Strategic International Cooperative Program funded by Japan Science and Technology Agency and the Grant-in-Aid for Scientific Research on Fundamental Research (B) 21300053, (B) 25280085, and (C) 22500119 by the Japanese Ministry of Education, Culture, Sports, Science and Technology.
Publisher Copyright:
© 2013, The Author(s).
PY - 2013/1
Y1 - 2013/1
N2 - The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.
AB - The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.
UR - http://www.scopus.com/inward/record.url?scp=84921067372&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84921067372&partnerID=8YFLogxK
U2 - 10.1007/s10618-013-0334-x
DO - 10.1007/s10618-013-0334-x
M3 - Article
AN - SCOPUS:84921067372
SN - 1384-5810
VL - 29
SP - 39
EP - 83
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 1
ER -