Search Results for author: Anthony Meng Huat Tiong

Found 6 papers, 4 papers with code

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

1 code implementation • 3 Apr 2024 • Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong

Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate.

Transfer Learning

Paper
Code

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

2 code implementations • NeurIPS 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi

Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.

Ranked #5 on Visual Question Answering on BenchLMM

Video Question Answering visual instruction following +1

8,890

Paper
Code

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

Paper
Add Code

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi

To address this issue, we propose \emph{Img2Prompt}, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering Visual Question Answering +1

8,890

Paper
Code

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

2 code implementations • 17 Oct 2022 • Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, Steven C. H. Hoi

Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting.

Ranked #2 on Visual Question Answering (VQA) on VQA v2 val

Image Captioning Network Interpretation +2

8,890

Paper
Code

Improving Tail-Class Representation with Centroid Contrastive Learning

no code implementations • 19 Oct 2021 • Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi

ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes.

Ranked #22 on Long-tail Learning on CIFAR-10-LT (ρ=10)

Contrastive Learning Image Classification +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.