Markov Game Video Augmentation for Action Segmentation

ICCV 2023 · Nicolas Aziere, Sinisa Todorovic

This paper addresses data augmentation for action segmentation. Our key novelty is that we augment the original training videos in the deep feature space, not in the visual spatiotemporal domain as done in previous work. For augmentation, we modify the original deep features of video frames such that the resulting embeddings fall closer to the class decision boundaries. We also edit the action sequences of the original training videos (a.k.a. transcripts) by inserting, deleting, and replacing actions such that the resulting transcripts are close in edit distance to the ground-truth transcripts. For our data augmentation, we resort to reinforcement learning instead of the more common supervised learning, since we do not have access to reliable oracles that would provide supervision about the optimal data modifications in the deep feature space. For modifying frame embeddings, we use a meta-model formulated as a Markov Game with multiple self-interested agents. New transcripts are generated using a fast, parameter-free Monte Carlo tree search. Our experiments show that the proposed data augmentation of the Breakfast, GTEA, and 50Salads datasets leads to significant performance gains for several state-of-the-art action segmenters.
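To make the transcript-editing idea concrete, here is a minimal Python sketch of insert, delete, and replace edits whose result stays within a bounded edit distance of the original transcript. It is an illustration under stated assumptions, not the paper's method: edits are sampled uniformly at random rather than chosen by Monte Carlo tree search, and the function names (`edit_distance`, `random_edit`, `augment`), the `n_edits` parameter, and the action labels are all hypothetical.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # replace x with y, or match
            ))
        prev = curr
    return prev[-1]


def random_edit(transcript, vocabulary, rng):
    """Apply one random insert, delete, or replace (assumption: uniform sampling)."""
    t = list(transcript)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not t:
        # An empty transcript can only be edited by insertion.
        t.insert(rng.randrange(len(t) + 1), rng.choice(vocabulary))
    elif op == "delete":
        del t[rng.randrange(len(t))]
    else:
        t[rng.randrange(len(t))] = rng.choice(vocabulary)
    return t


def augment(transcript, vocabulary, n_edits=2, seed=0):
    """Return a transcript at most n_edits away from the original in edit distance."""
    rng = random.Random(seed)
    t = list(transcript)
    for _ in range(n_edits):
        t = random_edit(t, vocabulary, rng)
    return t


# Hypothetical action labels, for illustration only.
original = ["take_cup", "pour_coffee", "pour_milk", "stir"]
vocabulary = ["take_cup", "pour_coffee", "pour_milk", "stir", "add_sugar"]
augmented = augment(original, vocabulary, n_edits=2, seed=0)
```

Because each step applies a single edit operation, `edit_distance(augmented, original)` can never exceed `n_edits`, which is the "close in edit distance" property the abstract describes. A search-based generator would replace the uniform sampling in `random_edit` with a guided choice of edits.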
