Search Results for author: Lluis Castrejon

Found 13 papers, 4 papers with code

HAMMR: HierArchical MultiModal React agents for generic VQA

no code implementations • 8 Apr 2024 • Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Optical Character Recognition (OCR), Question Answering, +1

How (not) to ensemble LVLMs for VQA

no code implementations • 10 Oct 2023 • Lisa Alazraki, Lluis Castrejon, Mostafa Dehghani, Fantine Huot, Jasper Uijlings, Thomas Mensink

So it is a trivial exercise to create an ensemble with substantial real gains.

Retrieval, Visual Question Answering (VQA)

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.

Question Answering, Retrieval, +1

33,150

Cascaded Video Generation for Videos In-the-Wild

no code implementations • 1st International Workshop on Practical Deep Learning in the Wild, Association for the Advancement of Artificial Intelligence (AAAI) 2022 • Lluis Castrejon, Nicolas Ballas, Aaron Courville

Videos can be created by first outlining a global view of the scene and then adding local details.

Video Generation

INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

no code implementations • 29 Sep 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville

Each object representation defines a localized neural radiance field that is used to generate 2D views of the scene through a differentiable rendering process.

Ranked #6 on Video Object Tracking on CATER

Object, Video Object Tracking, +1

Hierarchical Video Generation for Complex Data

no code implementations • 4 Jun 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville

Inspired by this we propose a hierarchical model for video generation which follows a coarse to fine approach.

Video Generation

SSW-GAN: Scalable Stage-wise Training of Video GANs

no code implementations • 1 Jan 2021 • Lluis Castrejon, Nicolas Ballas, Aaron Courville

Current state-of-the-art generative models for videos have high computational requirements that impede high resolution generations beyond a few frames.

Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

2 code implementations • 18 Jun 2020 • Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.

Contrastive Learning

486

Improved Conditional VRNNs for Video Prediction

1 code implementation • ICCV 2019 • Lluis Castrejon, Nicolas Ballas, Aaron Courville

To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models.

Ranked #2 on Video Prediction on Cityscapes 128x128

Video Generation, Video Prediction

MovieGraphs: Towards Understanding Human-Centric Situations from Videos

no code implementations • CVPR 2018 • Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed, graph-based annotations of social situations depicted in movie clips.

Common Sense Reasoning

Annotating Object Instances with a Polygon-RNN

2 code implementations • CVPR 2017 • Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, Sanja Fidler

We show that our approach speeds up the annotation process by a factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement in IoU with original ground-truth, matching the typical agreement between human annotators.

Object Segmentation +1

702

Cross-Modal Scene Networks

no code implementations • 27 Oct 2016 • Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

no code implementations • CVPR 2016 • Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval
