MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

Large-scale Vision-Language (VL) models have shown tremendous success in aligning representations between visual and text modalities. This enables remarkable progress in zero-shot recognition, image generation & editing, and many other exciting tasks. However, VL models tend to over-represent objects while paying much less attention to verbs, and require additional tuning on video data for best zero-shot action recognition performance. While previous work relied on large-scale, fully-annotated data, in this work we propose an unsupervised approach. We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary. To this end, we leverage Large Language Models and VL models to build a text bag for each unlabeled video via matching, text expansion and captioning. We use those bags in a Multiple Instance Learning setup to adapt an image-text backbone to video data. Although finetuned on unlabeled video data, our resulting models demonstrate high transferability to numerous unseen zero-shot downstream tasks, improving the base VL model performance by up to 14%, and even comparing favorably to fully-supervised baselines in both zero-shot and few-shot video recognition transfer. Code is available at https://github.com/wlin-at/MAXI.
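To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of the two core steps described in the abstract: assembling a text bag for an unlabeled video (matching against the unpaired action dictionary, LLM text expansion, and captioning) and training with a Multiple Instance Learning (MIL) objective. The bag construction details, the MIL-NCE-style loss, and all function names here are simplifying assumptions; see the repository above for the actual implementation.

```python
# Hypothetical sketch of MAXI-style text-bag construction and MIL training loss.
import torch
import torch.nn.functional as F


def build_text_bag(video_emb: torch.Tensor,
                   dictionary_embs: torch.Tensor,
                   dictionary: list[str],
                   expansions: dict[str, list[str]],
                   caption: str) -> list[str]:
    """Assemble a text bag for one unlabeled video.

    video_emb:        (D,) video embedding from a frozen VL model.
    dictionary_embs:  (K, D) text embeddings of an unpaired action dictionary.
    dictionary:       the K action names.
    expansions:       LLM-generated paraphrases per action name (assumed given).
    caption:          a caption from an off-the-shelf captioning model.
    """
    # 1) Matching: pick the dictionary action closest to the video embedding.
    sims = F.normalize(video_emb, dim=-1) @ F.normalize(dictionary_embs, dim=-1).T
    matched = dictionary[int(sims.argmax())]
    # 2) Text expansion: add LLM paraphrases of the matched action.
    bag = [matched] + expansions.get(matched, [])
    # 3) Captioning: add the caption as another instance of the bag.
    bag.append(caption)
    return bag


def mil_nce_loss(video_embs: torch.Tensor, bag_embs: list[torch.Tensor],
                 temperature: float = 0.07) -> torch.Tensor:
    """MIL-style contrastive loss: each video should match at least one text
    in its own bag and none in the other videos' bags.

    video_embs: (B, D) video embeddings.
    bag_embs:   list of B tensors, each (N_i, D), the encoded text bags.
    """
    video_embs = F.normalize(video_embs, dim=-1)
    all_texts = F.normalize(torch.cat(bag_embs, dim=0), dim=-1)   # (sum N_i, D)
    logits = video_embs @ all_texts.T / temperature               # (B, sum N_i)

    # Mask marking which texts belong to which video's bag.
    mask = torch.zeros_like(logits, dtype=torch.bool)
    start = 0
    for i, bag in enumerate(bag_embs):
        mask[i, start:start + bag.shape[0]] = True
        start += bag.shape[0]

    # MIL-NCE: maximize the summed probability mass on in-bag texts.
    log_prob = logits.log_softmax(dim=-1)
    in_bag = torch.logsumexp(log_prob.masked_fill(~mask, float('-inf')), dim=-1)
    return -in_bag.mean()
```

In this sketch the bag aggregation is a log-sum-exp over in-bag similarities, so noisy bag instances (e.g., a poor caption) are down-weighted as long as at least one instance matches the video well; the actual aggregation and loss used in the paper may differ.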

ICCV 2023
Task                         | Dataset  | Model | Metric         | Value | Global Rank
Zero-Shot Action Recognition | Charades | MAXI  | mAP            | 23.8  | #3
Zero-Shot Action Recognition | HMDB51   | MAXI  | Top-1 Accuracy | 52.3  | #9
Zero-Shot Action Recognition | Kinetics | MAXI  | Top-1 Accuracy | 71.6  | #3
Zero-Shot Action Recognition | UCF101   | MAXI  | Top-1 Accuracy | 78.2  | #10
