Search Results for author: Floris Weers

Found 6 papers, 1 paper with code

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

no code implementations • 8 Apr 2024 • Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks.

Instruction Following

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.

Ranked #21 on Visual Question Answering on MM-Vet

In-Context Learning, Visual Question Answering

Controllable Music Production with Diffusion Models and Guidance Gradients

no code implementations • 1 Nov 2023 • Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1 kHz stereo audio with sampling-time guidance.

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.

Masked Autoencoding Does Not Help Natural Language Supervision at Scale

no code implementations • CVPR 2023 • Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

Self-supervision and natural language supervision have emerged as two exciting ways to train general-purpose image encoders that excel at a variety of downstream tasks.

Bias in Automated Image Colorization: Metrics and Error Types

1 code implementation • 16 Feb 2022 • Frank Stapel, Floris Weers, Doina Bucur

We measure the color shifts present in images from the ADE20K dataset when they are colorized by the automatic GAN-based DeOldify model.

Colorization, Image Colorization