Zero-Shot Image Classification
43 papers with code • 3 benchmarks • 4 datasets
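In practice, zero-shot image classification scores an image against natural-language class descriptions with a pretrained vision-language model. A minimal sketch using the Hugging Face transformers CLIP API (checkpoint, image path, and labels are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Any CLIP checkpoint works the same way; this one is illustrative.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # illustrative path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a platypus"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# One row of image-text similarity logits; softmax gives class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```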
Most implemented papers
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
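The core objective in ALIGN-style training is a symmetric contrastive (InfoNCE) loss that aligns matched image and text embeddings. A minimal sketch; the two encoders and the batch of matched pairs are assumed to exist:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.
    image_emb, text_emb: (batch, dim) outputs of the two (assumed) encoders;
    matching pairs share a row index."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each image should match its own caption, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```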
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.
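The distillation idea can be sketched generically: detector region embeddings are regressed toward frozen CLIP image embeddings of the cropped proposals, and classified against CLIP text embeddings. All names below are illustrative, not ViLD's actual code:

```python
import torch
import torch.nn.functional as F

def vild_style_losses(region_emb, clip_region_emb, text_emb, labels, tau=0.01):
    """region_emb: (R, d) trainable detector region embeddings.
    clip_region_emb: (R, d) frozen CLIP image embeddings of cropped proposals.
    text_emb: (C, d) frozen CLIP text embeddings of base-category names.
    labels: (R,) base-category targets for the labeled proposals."""
    region_emb = F.normalize(region_emb, dim=-1)
    # Distillation: pull region embeddings toward CLIP's image embeddings,
    # which is what lets novel categories be recognized at test time.
    distill = F.l1_loss(region_emb, F.normalize(clip_region_emb, dim=-1))
    # Classification against the text embeddings of the base categories.
    logits = region_emb @ F.normalize(text_emb, dim=-1).t() / tau
    cls = F.cross_entropy(logits, labels)
    return distill, cls
```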
LiT: Zero-Shot Transfer with Locked-image text Tuning
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.
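In a LiT-style setup the pretrained image tower is frozen ("locked") and only the text tower is trained with the contrastive objective. A sketch, assuming two PyTorch encoder modules, a dataloader of matched pairs, and the contrastive_loss from the sketch above:

```python
import torch

# Assumed: image_encoder is pretrained; text_encoder may be pretrained or fresh.
for p in image_encoder.parameters():
    p.requires_grad_(False)  # lock the image tower
image_encoder.eval()

optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

for images, texts in loader:  # assumed dataloader of matched image-text pairs
    with torch.no_grad():
        img_emb = image_encoder(images)  # frozen features
    txt_emb = text_encoder(texts)        # only these receive gradients
    loss = contrastive_loss(img_emb, txt_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```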
Reproducible scaling laws for contrastive language-image learning
To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
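The OpenCLIP models trained in this study are publicly downloadable; the snippet below follows the open_clip README pattern for zero-shot scoring (the checkpoint tag is one of the published LAION-2B ones, and the image path and labels are illustrative):

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
```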
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation works on pixels, while CLIP works on whole images.
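A two-stage baseline of this kind can be sketched generically: a class-agnostic proposal model yields masks, each masked region is cropped, and CLIP classifies the crop against the open vocabulary. This is a sketch of the general recipe, with assumed inputs, not the paper's exact pipeline:

```python
import torch
from torchvision.transforms.functional import to_pil_image

def classify_masks(image, masks, clip_model, preprocess, text_features):
    """image: (3, H, W) float tensor in [0, 1]; masks: list of (H, W) boolean
    tensors from a class-agnostic mask proposal network (assumed upstream).
    text_features: (C, d) normalized CLIP text embeddings of class names."""
    labels = []
    for mask in masks:
        # Blank out everything outside the mask, then crop its bounding box.
        ys, xs = torch.nonzero(mask, as_tuple=True)
        crop = (image * mask.unsqueeze(0))[
            :, ys.min().item():ys.max().item() + 1,
               xs.min().item():xs.max().item() + 1]
        pixels = preprocess(to_pil_image(crop)).unsqueeze(0)
        with torch.no_grad():
            feat = clip_model.encode_image(pixels)
        feat = feat / feat.norm(dim=-1, keepdim=True)
        labels.append((feat @ text_features.T).argmax(-1).item())
    return labels
```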
DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further sharpen the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) proposed a multi-task learning policy that jointly considers these objectives.
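The attribute-level contrastive idea can be illustrated with a supervised contrastive loss in which images sharing an attribute are positives. This is a generic simplification under assumed inputs, not DUET's exact formulation:

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(feats, attr_labels, temperature=0.1):
    """feats: (N, d) image features; attr_labels: (N,) attribute ids.
    Samples sharing an attribute are positives, all others negatives."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / temperature
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(eye, -1e9)  # exclude self-pairs
    pos = (attr_labels.unsqueeze(0) == attr_labels.unsqueeze(1)) & ~eye
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Maximize the average log-probability of attribute-sharing pairs.
    per_anchor = (log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()
```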
What does a platypus look like? Generating customized prompts for zero-shot image classification
Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference.
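CuPL replaces hand-written prompt templates with LLM-generated class descriptions; the classifier weight for each class is the average of its prompt embeddings. A sketch assuming the open_clip model and tokenizer from the earlier snippet, with hand-written prompts standing in for LLM output:

```python
import torch

def build_zero_shot_classifier(model, tokenizer, prompts_per_class):
    """prompts_per_class: dict mapping class name -> list of natural-language
    descriptions. Returns (C, d) normalized classifier weights."""
    weights = []
    with torch.no_grad():
        for name, prompts in prompts_per_class.items():
            emb = model.encode_text(tokenizer(prompts))
            emb = emb / emb.norm(dim=-1, keepdim=True)
            mean = emb.mean(dim=0)  # ensemble the prompt embeddings
            weights.append(mean / mean.norm())
    return torch.stack(weights)

# Illustrative usage (prompts are stand-ins for LLM-generated descriptions):
# prompts = {"platypus": ["a photo of a platypus, a duck-billed mammal",
#                         "a furry animal with a flat bill and webbed feet"],
#            "beaver":   ["a photo of a beaver with a broad flat tail"]}
# W = build_zero_shot_classifier(model, tokenizer, prompts)
# logits = image_features @ W.T
```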
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
However, the compositional reasoning abilities of existing VLMs remain subpar.
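A common way to add compositional hard negatives is to perturb each caption (e.g., shuffle its words) and append the perturbed embedding as an extra negative column in the contrastive logits. A generic sketch with assumed encoder outputs, not this paper's exact loss:

```python
import random
import torch
import torch.nn.functional as F

def shuffle_words(caption):
    # Intra-modal hard negative: same words, broken composition.
    words = caption.split()
    random.shuffle(words)
    return " ".join(words)

def loss_with_hard_negatives(img_emb, txt_emb, hard_emb, temperature=0.07):
    """img_emb, txt_emb: (B, d) matched pairs; hard_emb: (B, d) embeddings
    of the shuffled captions. All from assumed encoders."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    hard_emb = F.normalize(hard_emb, dim=-1)
    logits = img_emb @ txt_emb.t()                     # (B, B) in-batch negatives
    hard = (img_emb * hard_emb).sum(-1, keepdim=True)  # (B, 1) hard negatives
    logits = torch.cat([logits, hard], dim=1) / temperature
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return F.cross_entropy(logits, targets)
```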
Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning
We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBMs).
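The Gumbel trick here refers to differentiable discrete concept selection via Gumbel-Softmax, which PyTorch exposes as F.gumbel_softmax. A generic concept-bottleneck sketch under that assumption, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConceptBottleneck(nn.Module):
    """Image features -> sparse binary concept activations -> linear classifier.
    Hard Gumbel-Softmax keeps the forward pass discrete while gradients
    flow through the soft relaxation (straight-through estimator)."""

    def __init__(self, feat_dim, n_concepts, n_classes, tau=1.0):
        super().__init__()
        # Two logits per concept: "off" vs "on".
        self.concept_logits = nn.Linear(feat_dim, n_concepts * 2)
        self.classifier = nn.Linear(n_concepts, n_classes)
        self.tau = tau

    def forward(self, feats):
        logits = self.concept_logits(feats).view(len(feats), -1, 2)  # (B, K, 2)
        gates = F.gumbel_softmax(logits, tau=self.tau, hard=True)    # one-hot per concept
        concepts = gates[..., 1]  # (B, K) sparse 0/1 activations, interpretable
        return self.classifier(concepts), concepts

# feats = torch.randn(4, 512)  # e.g., CLIP image features
# model = SparseConceptBottleneck(512, n_concepts=64, n_classes=10)
# logits, concepts = model(feats)
```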