STemGAN: spatio-temporal generative adversarial network for video anomaly detection

Automatic detection and interpretation of abnormal events have become crucial tasks in large-scale video surveillance systems. The challenge arises from the lack of a clear definition of abnormality, which restricts the use of supervised methods. To this end, we propose a novel unsupervised anomaly detection method, the Spatio-Temporal Generative Adversarial Network (STemGAN). The framework consists of a generator and a discriminator that learn from the video context, exploiting both spatial and temporal information to predict future frames. The generator follows an autoencoder (AE) architecture, with a dual-stream encoder that extracts appearance and motion information and a decoder with a Channel Attention (CA) module that focuses on dynamic foreground features. In addition, we provide a transfer-learning method that enhances the generalizability of STemGAN. We compare the performance of our approach with existing state-of-the-art approaches on benchmark anomaly detection (AD) datasets using the standard evaluation metrics AUC (Area Under the Curve) and EER (Equal Error Rate). The empirical results show that STemGAN outperforms existing state-of-the-art methods, achieving AUC scores of 97.5% on UCSD Ped2, 86.0% on CUHK Avenue, 90.4% on Subway Entrance, and 95.2% on Subway Exit.
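The abstract evaluates with frame-level AUC and EER, which can be computed directly from per-frame anomaly scores and ground-truth labels. As a minimal sketch (the function names and the threshold-sweep formulation are illustrative, not taken from the paper):

```python
def roc_points(scores, labels):
    """Sweep a threshold over anomaly scores to trace the ROC curve.
    scores: higher = more anomalous; labels: 1 = anomalous frame."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR), starting at the origin
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def eer(points):
    """Approximate Equal Error Rate: FPR at the curve point
    closest to the line FPR = 1 - TPR (no interpolation)."""
    return min(points, key=lambda p: abs(p[0] - (1 - p[1])))[0]
```

With perfectly separated scores, e.g. `roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])`, the AUC is 1.0 and the EER is 0.0; the reported 97.5% AUC on UCSD Ped2 corresponds to near-perfect ranking of anomalous over normal frames.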


Results from the Paper


| Task              | Dataset     | Model   | Metric Name | Metric Value | Global Rank |
|-------------------|-------------|---------|-------------|--------------|-------------|
| Anomaly Detection | CUHK Avenue | STemGAN | AUC         | 86.0         | # 24        |
| Anomaly Detection | UCSD Ped2   | STemGAN | AUC         | 97.5         | # 5         |
