no code implementations • 8 Apr 2024 • Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings
We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.
no code implementations • 10 Oct 2023 • Lisa Alazraki, Lluis Castrejon, Mostafa Dehghani, Fantine Huot, Jasper Uijlings, Thomas Mensink
So it is a trivial exercise to create an ensemble with substantial real gains.
1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari
Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.
no code implementations • 1st International Workshop on Practical Deep Learning in the Wild, Association for the Advancement of Artificial Intelligence (AAAI) 2022 • Lluis Castrejon, Nicolas Ballas, Aaron Courville
Videos can be created by first outlining a global view of the scene and then adding local details.
no code implementations • 29 Sep 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville
Each object representation defines a localized neural radiance field that is used to generate 2D views of the scene through a differentiable rendering process.
Ranked #6 on Video Object Tracking on CATER
no code implementations • 4 Jun 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville
Inspired by this, we propose a hierarchical model for video generation which follows a coarse-to-fine approach.
no code implementations • 1 Jan 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville
Current state-of-the-art generative models for videos have high computational requirements that impede high resolution generations beyond a few frames.
2 code implementations • 18 Jun 2020 • Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat
We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.
1 code implementation • ICCV 2019 • Lluis Castrejon, Nicolas Ballas, Aaron Courville
To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models.
Ranked #2 on Video Prediction on Cityscapes 128x128
no code implementations • CVPR 2018 • Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler
Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.
2 code implementations • CVPR 2017 • Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler
We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators.
no code implementations • 27 Oct 2016 • Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.
no code implementations • CVPR 2016 • Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.