Learning Shape-Motion Representations from Geometric Algebra Spatio-Temporal Model for Skeleton-Based Action Recognition

Skeleton-based action recognition has been widely applied in intelligent video surveillance and human behavior analysis. Previous works have successfully applied Convolutional Neural Networks (CNNs) to learn the spatio-temporal characteristics of skeleton sequences. However, they focus only on the coordinates of isolated joints, ignoring the spatial relationships between joints, and learn motion representations only implicitly. To address these problems, we propose an effective method for learning comprehensive representations from skeleton sequences using Geometric Algebra. First, a frontal-orientation-based spatio-temporal model is constructed to represent the spatial configuration and temporal dynamics of skeleton sequences, making the representation robust to view variations. Then, mutually compensating shape-motion representations are learned to describe skeleton actions comprehensively. Finally, a multi-stream CNN model is applied to extract and fuse deep features from the complementary shape-motion representations. Experimental results on the NTU RGB+D and Northwestern-UCLA datasets consistently verify the superiority of our method.
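The complementary shape-motion idea in the abstract can be sketched in a minimal, hypothetical form: shape as joint positions relative to a reference joint (spatial configuration within a frame) and motion as frame-to-frame joint displacements (temporal dynamics). This is only an illustration of the general concept; the paper's actual frontal-orientation Geometric Algebra construction is more involved, and the function and variable names below are assumptions, not the authors' implementation.

```python
import numpy as np

def shape_motion_features(seq):
    """Illustrative shape/motion descriptors for a skeleton sequence.

    seq: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates.
    Returns:
      shape:  (T, J, 3)   joint positions relative to a reference joint,
                          capturing per-frame spatial configuration.
      motion: (T-1, J, 3) frame-to-frame joint displacements,
                          capturing temporal dynamics explicitly.
    """
    seq = np.asarray(seq, dtype=float)
    root = seq[:, :1, :]           # reference joint (e.g. spine base), kept per frame
    shape = seq - root             # spatial relationships instead of raw coordinates
    motion = np.diff(seq, axis=0)  # explicit motion rather than implicit learning
    return shape, motion

# Usage: a toy sequence of 4 frames with 5 joints.
seq = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
shape, motion = shape_motion_features(seq)
print(shape.shape, motion.shape)  # (4, 5, 3) (3, 5, 3)
```

In a multi-stream setup of the kind the abstract describes, each descriptor would feed its own CNN stream, with features fused before classification.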


Datasets


Task                               Dataset    Model     Metric         Value   Global Rank
Skeleton Based Action Recognition  NTU RGB+D  FO-GASTM  Accuracy (CV)  90.05   #81
Skeleton Based Action Recognition  NTU RGB+D  FO-GASTM  Accuracy (CS)  82.83   #88

Methods


No methods listed for this paper.