Search Results for author: Brandon McKinzie

Found 4 papers, 1 paper with code

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.

Ranked #21 on Visual Question Answering on MM-Vet

In-Context Learning • Visual Question Answering

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

no code implementations • 27 Nov 2023 • Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev

Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling.

Language Modelling • Text-to-Image Generation

On Robustness in Multimodal Learning

no code implementations • 10 Apr 2023 • Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text.

Representation Learning

Perceptual Grouping in Contrastive Vision-Language Models

1 code implementation • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

In this work, we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.

Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on MS COCO

Object Localization • Representation Learning • +1