no code implementations • 8 Apr 2024 • Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings
We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.
no code implementations • 5 Mar 2024 • Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig
Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.
1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari
Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13. 0% accuracy on our dataset.
no code implementations • 22 Mar 2023 • Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, André Araujo
Finding localized correspondences across different images of the same object is crucial to understand its geometry.
1 code implementation • CVPR 2021 • Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.
no code implementations • 8 Mar 2020 • Yang Feng, Futang Peng, Xu Zhang, Wei Zhu, Shanfeng Zhang, Howard Zhou, Zhen Li, Tom Duerig, Shih-Fu Chang, Jiebo Luo
Therefore, we propose to distill the knowledge in multiple specialists into a universal embedding to solve this problem.
no code implementations • CVPR 2016 • Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig
Most deep architectures for image classification--even those that are trained to classify a large number of diverse categories--learn shared image representations with a single model.
1 code implementation • 20 Nov 2015 • Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei
Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.
Ranked #5 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)