Search Results for author: Howard Zhou

Found 8 papers, 3 papers with code

HAMMR: HierArchical MultiModal React agents for generic VQA

no code implementations • 8 Apr 2024 • Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Optical Character Recognition (OCR), Question Answering, +1 more

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

no code implementations • 5 Mar 2024 • Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.

Image Classification, Question Answering, +2 more

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation • ICCV 2023 • Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.

Question Answering, Retrieval, +1 more

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

no code implementations • 22 Mar 2023 • Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, André Araujo

Finding localized correspondences across different images of the same object is crucial to understand its geometry.

Object

IBRNet: Learning Multi-View Image-Based Rendering

1 code implementation • CVPR 2021 • Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser

Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.

Neural Rendering, Novel View Synthesis

Unifying Specialist Image Embedding into Universal Image Embedding

no code implementations • 8 Mar 2020 • Yang Feng, Futang Peng, Xu Zhang, Wei Zhu, Shanfeng Zhang, Howard Zhou, Zhen Li, Tom Duerig, Shih-Fu Chang, Jiebo Luo

Therefore, we propose to distill the knowledge in multiple specialists into a universal embedding to solve this problem.

Face Verification, Image Retrieval, +3 more

Blockout: Dynamic Model Selection for Hierarchical Deep Networks

no code implementations • CVPR 2016 • Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

Most deep architectures for image classification--even those that are trained to classify a large number of diverse categories--learn shared image representations with a single model.

Clustering, General Classification, +2 more

The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

1 code implementation • 20 Nov 2015 • Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.

Ranked #5 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Active Learning, Fine-Grained Image Classification
