no code implementations • 8 Apr 2024 • Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan
For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks.
no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang
Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.
Ranked #21 on Visual Question Answering on MM-Vet
no code implementations • 1 Nov 2023 • Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44. 1kHz stereo audio with sampling-time guidance.
no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du
We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.
no code implementations • CVPR 2023 • Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter
Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks.
1 code implementation • 16 Feb 2022 • Frank Stapel, Floris Weers, Doina Bucur
We measure the color shifts present in colorized images from the ADE20K dataset, when colorized by the automatic GAN-based DeOldify model.