Cross-Modal Self-Supervised Feature Extraction for Anomaly Detection in Human Monitoring

Jose Alejandro Avellaneda, Tetsu Matsukawa, Einoshin Suzuki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes extracting cross-modal self-supervised features to detect anomalies in human monitoring. Our previous work, which used deep captioning in addition to monitoring images, was successful. However, its use of unimodally trained image and text features is deficient in capturing contextual information across the modalities. We devise a self-supervised method that creates cross-modal features by maximizing the mutual information between the two modalities in a common subspace. This allows the features to capture different complex distributions between the modalities, improving the detection performance of clustering methods. Extensive experimental results show improvements in both AUC and AUPRC scores over the best baselines on two real-world datasets. On the two datasets, the AUC improved from 0.895 to 0.969 and from 0.97 to 0.98, and the AUPRC improved from 0.681 to 0.850 and from 0.840 to 0.894.
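The abstract does not say which mutual-information estimator the authors use. As an illustration only, the sketch below uses the InfoNCE contrastive objective, a common lower bound on mutual information, and should not be read as the paper's method. The names `project`, `info_nce_loss`, `mi_lower_bound`, the projection matrices, and the `temperature` value are all hypothetical. Paired image and text features are mapped into a common subspace, and the loss is low when each image embedding is most similar to its own caption embedding.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(W, v):
    """Linear map into the common subspace; W is a list of rows (assumed, not from the paper)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def info_nce_loss(image_embs, text_embs, temperature=0.1):
    """Image-to-text InfoNCE loss over a batch of paired embeddings.

    Pair i (image_embs[i], text_embs[i]) is the positive; every other
    text in the batch serves as a negative for image i.
    """
    n = len(image_embs)
    z_img = [l2_normalize(v) for v in image_embs]
    z_txt = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i in range(n):
        # Cosine similarities between image i and every text, scaled by temperature.
        logits = [
            sum(a * b for a, b in zip(z_img[i], z_txt[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching text as the correct class.
        total += log_sum - logits[i]
    return total / n


def mi_lower_bound(image_embs, text_embs, temperature=0.1):
    """InfoNCE bound: I(image; text) >= log(N) - loss, for batch size N."""
    n = len(image_embs)
    return math.log(n) - info_nce_loss(image_embs, text_embs, temperature)


if __name__ == "__main__":
    # Toy features: 3-D "image" vectors and 2-D "text" vectors (made-up values).
    images = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
    texts = [[0.9, 0.1], [0.2, 0.8]]
    W_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical image projection
    W_txt = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical text projection
    z_i = [project(W_img, v) for v in images]
    z_t = [project(W_txt, v) for v in texts]
    print("loss:", info_nce_loss(z_i, z_t))
    print("MI lower bound:", mi_lower_bound(z_i, z_t))
```

In a trained model the projections would be learned by minimizing this loss, and the resulting common-subspace embeddings would then be passed to a clustering-based anomaly detector. This sketch covers only the image-to-text direction; a symmetric variant would average it with the text-to-image loss.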

Original language: English
Title of host publication: 2023 IEEE 19th International Conference on Automation Science and Engineering, CASE 2023
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350320695
Publication status: Published - 2023
Event: 19th IEEE International Conference on Automation Science and Engineering, CASE 2023 - Auckland, New Zealand
Duration: Aug 26 2023 to Aug 30 2023

Publication series

Name: IEEE International Conference on Automation Science and Engineering
Volume: 2023-August
ISSN (Print): 2161-8070
ISSN (Electronic): 2161-8089

Conference

Conference: 19th IEEE International Conference on Automation Science and Engineering, CASE 2023
Country/Territory: New Zealand
City: Auckland
Period: 8/26/23 to 8/30/23

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
