Zero-Shot Image Classification
43 papers with code • 3 benchmarks • 4 datasets
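In practice, zero-shot image classification scores an image against natural-language class descriptions with a pretrained vision-language model. A minimal sketch using the Hugging Face transformers CLIP API (checkpoint, image path, and labels are illustrative):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Any CLIP checkpoint works the same way; this one is illustrative.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # illustrative path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a platypus"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# One row of image-text similarity logits; softmax gives class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```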
Most implemented papers
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
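The core objective in ALIGN-style training is a symmetric contrastive (InfoNCE) loss that aligns matched image and text embeddings. A minimal sketch; the two encoders and the batch of matched pairs are assumed to exist:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.
    image_emb, text_emb: (batch, dim) outputs of the two (assumed) encoders;
    matching pairs share a row index."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each image should match its own caption, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```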
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.
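The distillation idea can be sketched generically: detector region embeddings are regressed toward frozen CLIP image embeddings of the cropped proposals, and classified against CLIP text embeddings. All names below are illustrative, not ViLD's actual code:

```python
import torch
import torch.nn.functional as F

def vild_style_losses(region_emb, clip_region_emb, text_emb, labels, tau=0.01):
    """region_emb: (R, d) trainable detector region embeddings.
    clip_region_emb: (R, d) frozen CLIP image embeddings of cropped proposals.
    text_emb: (C, d) frozen CLIP text embeddings of base-category names.
    labels: (R,) base-category targets for the labeled proposals."""
    region_emb = F.normalize(region_emb, dim=-1)
    # Distillation: pull region embeddings toward CLIP's image embeddings,
    # which is what lets novel categories be recognized at test time.
    distill = F.l1_loss(region_emb, F.normalize(clip_region_emb, dim=-1))
    # Classification against the text embeddings of the base categories.
    logits = region_emb @ F.normalize(text_emb, dim=-1).t() / tau
    cls = F.cross_entropy(logits, labels)
    return distill, cls
```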
LiT: Zero-Shot Transfer with Locked-image text Tuning
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.
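In a LiT-style setup the pretrained image tower is frozen ("locked") and only the text tower is trained with the contrastive objective. A sketch, assuming two PyTorch encoder modules, a dataloader of matched pairs, and the contrastive_loss from the sketch above:

```python
import torch

# Assumed: image_encoder is pretrained; text_encoder may be pretrained or fresh.
for p in image_encoder.parameters():
    p.requires_grad_(False)  # lock the image tower
image_encoder.eval()

optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

for images, texts in loader:  # assumed dataloader of matched image-text pairs
    with torch.no_grad():
        img_emb = image_encoder(images)  # frozen features
    txt_emb = text_encoder(texts)        # only these receive gradients
    loss = contrastive_loss(img_emb, txt_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```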
Reproducible scaling laws for contrastive language-image learning
To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
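The OpenCLIP models trained in this study are publicly downloadable; the snippet below follows the open_clip README pattern for zero-shot scoring (the checkpoint tag is one of the published LAION-2B ones, and the image path and labels are illustrative):

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
```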
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation works on pixels, while CLIP works on whole images.
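A two-stage baseline of this kind can be sketched generically: a class-agnostic proposal model yields masks, each masked region is cropped, and CLIP classifies the crop against the open vocabulary. This is a sketch of the general recipe, with assumed inputs, not the paper's exact pipeline:

```python
import torch
from torchvision.transforms.functional import to_pil_image

def classify_masks(image, masks, clip_model, preprocess, text_features):
    """image: (3, H, W) float tensor in [0, 1]; masks: list of (H, W) boolean
    tensors from a class-agnostic mask proposal network (assumed upstream).
    text_features: (C, d) normalized CLIP text embeddings of class names."""
    labels = []
    for mask in masks:
        # Blank out everything outside the mask, then crop its bounding box.
        ys, xs = torch.nonzero(mask, as_tuple=True)
        crop = (image * mask.unsqueeze(0))[
            :, ys.min().item():ys.max().item() + 1,
               xs.min().item():xs.max().item() + 1]
        pixels = preprocess(to_pil_image(crop)).unsqueeze(0)
        with torch.no_grad():
            feat = clip_model.encode_image(pixels)
        feat = feat / feat.norm(dim=-1, keepdim=True)
        labels.append((feat @ text_features.T).argmax(-1).item())
    return labels
```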
DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further sharpen the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) proposed a multi-task learning policy that jointly considers these objectives.
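The attribute-level contrastive idea can be illustrated with a supervised contrastive loss in which images sharing an attribute are positives. This is a generic simplification under assumed inputs, not DUET's exact formulation:

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(feats, attr_labels, temperature=0.1):
    """feats: (N, d) image features; attr_labels: (N,) attribute ids.
    Samples sharing an attribute are positives, all others negatives."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / temperature
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(eye, -1e9)  # exclude self-pairs
    pos = (attr_labels.unsqueeze(0) == attr_labels.unsqueeze(1)) & ~eye
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # Maximize the average log-probability of attribute-sharing pairs.
    per_anchor = (log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()
```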
What does a platypus look like? Generating customized prompts for zero-shot image classification
Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference.
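CuPL replaces hand-written prompt templates with LLM-generated class descriptions; the classifier weight for each class is the average of its prompt embeddings. A sketch assuming the open_clip model and tokenizer from the earlier snippet, with hand-written prompts standing in for LLM output:

```python
import torch

def build_zero_shot_classifier(model, tokenizer, prompts_per_class):
    """prompts_per_class: dict mapping class name -> list of natural-language
    descriptions. Returns (C, d) normalized classifier weights."""
    weights = []
    with torch.no_grad():
        for name, prompts in prompts_per_class.items():
            emb = model.encode_text(tokenizer(prompts))
            emb = emb / emb.norm(dim=-1, keepdim=True)
            mean = emb.mean(dim=0)  # ensemble the prompt embeddings
            weights.append(mean / mean.norm())
    return torch.stack(weights)

# Illustrative usage (prompts are stand-ins for LLM-generated descriptions):
# prompts = {"platypus": ["a photo of a platypus, a duck-billed mammal",
#                         "a furry animal with a flat bill and webbed feet"],
#            "beaver":   ["a photo of a beaver with a broad flat tail"]}
# W = build_zero_shot_classifier(model, tokenizer, prompts)
# logits = image_features @ W.T
```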
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
However, the compositional reasoning abilities of existing VLMs remain subpar.
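A common way to add compositional hard negatives is to perturb each caption (e.g., shuffle its words) and append the perturbed embedding as an extra negative column in the contrastive logits. A generic sketch with assumed encoder outputs, not this paper's exact loss:

```python
import random
import torch
import torch.nn.functional as F

def shuffle_words(caption):
    # Intra-modal hard negative: same words, broken composition.
    words = caption.split()
    random.shuffle(words)
    return " ".join(words)

def loss_with_hard_negatives(img_emb, txt_emb, hard_emb, temperature=0.07):
    """img_emb, txt_emb: (B, d) matched pairs; hard_emb: (B, d) embeddings
    of the shuffled captions. All from assumed encoders."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    hard_emb = F.normalize(hard_emb, dim=-1)
    logits = img_emb @ txt_emb.t()                     # (B, B) in-batch negatives
    hard = (img_emb * hard_emb).sum(-1, keepdim=True)  # (B, 1) hard negatives
    logits = torch.cat([logits, hard], dim=1) / temperature
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return F.cross_entropy(logits, targets)
```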
Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning
We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBMs).
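The Gumbel trick here refers to differentiable discrete concept selection via Gumbel-Softmax, which PyTorch exposes as F.gumbel_softmax. A generic concept-bottleneck sketch under that assumption, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConceptBottleneck(nn.Module):
    """Image features -> sparse binary concept activations -> linear classifier.
    Hard Gumbel-Softmax keeps the forward pass discrete while gradients
    flow through the soft relaxation (straight-through estimator)."""

    def __init__(self, feat_dim, n_concepts, n_classes, tau=1.0):
        super().__init__()
        # Two logits per concept: "off" vs "on".
        self.concept_logits = nn.Linear(feat_dim, n_concepts * 2)
        self.classifier = nn.Linear(n_concepts, n_classes)
        self.tau = tau

    def forward(self, feats):
        logits = self.concept_logits(feats).view(len(feats), -1, 2)  # (B, K, 2)
        gates = F.gumbel_softmax(logits, tau=self.tau, hard=True)    # one-hot per concept
        concepts = gates[..., 1]  # (B, K) sparse 0/1 activations, interpretable
        return self.classifier(concepts), concepts

# feats = torch.randn(4, 512)  # e.g., CLIP image features
# model = SparseConceptBottleneck(512, n_concepts=64, n_classes=10)
# logits, concepts = model(feats)
```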