Search Results for author: Bohan Zhai

Found 9 papers, 5 papers with code

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

no code implementations • 3 Mar 2024 • Haogeng Liu, Quanzeng You, Xiaotian Han, Yiqi Wang, Bohan Zhai, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

Multimodal Large Language Models (MLLMs) have experienced significant advancements recently.

Ranked #41 on Visual Question Answering on MM-Vet

Visual Question Answering

COCO is "ALL" You Need for Visual Instruction Fine-tuning

no code implementations • 17 Jan 2024 • Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang

We argue that datasets with diverse and high-quality detailed instruction following annotations are essential and adequate for MLLMs IFT.

Ranked #47 on Visual Question Answering on MM-Vet

Image Captioning, Instruction Following, +1

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

no code implementations • 10 Jan 2024 • Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions.

Multimodal Reasoning

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

HallE-Control: Controlling Object Hallucination in Large Multimodal Models

2 code implementations • 3 Oct 2023 • Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

Current Large Multimodal Models (LMMs) achieve remarkable progress, yet there remains significant uncertainty regarding their ability to accurately apprehend visual details, that is, in performing detailed captioning.

Attribute Decoder +3

Multitask Vision-Language Prompt Tuning

1 code implementation • 21 Nov 2022 • Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning.

Visual Prompt Tuning

Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models

1 code implementation • 8 Jun 2021 • Chenfeng Xu, Shijia Yang, Tomer Galanti, Bichen Wu, Xiangyu Yue, Bohan Zhai, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

We discover that we can indeed use the same architecture and pretrained weights of a neural net model to understand both images and point-clouds.

3D Point Cloud Classification, Point Cloud Classification, +1

Integer-only Zero-shot Quantization for Efficient Speech Recognition

1 code implementation • 31 Mar 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

1 code implementation • 16 Jan 2020 • Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech.

Sound, Audio and Speech Processing
