no code implementations • 2 May 2024 • Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen
We evaluate the trained Mantis on five multi-image benchmarks and eight single-image benchmarks.
2 code implementations • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma
Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.
no code implementations • 21 Mar 2024 • Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen
In the second stage, AnyV2V can plug in any existing image-to-video model to perform DDIM inversion and intermediate feature injection, maintaining appearance and motion consistency with the source video.
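The DDIM inversion mentioned above exploits the fact that DDIM updates are deterministic: running the same update rule with increasing timesteps maps a clean sample to a latent noise code, and running it back recovers the sample. A minimal one-dimensional sketch, with a hypothetical cosine noise schedule and a stand-in noise predictor in place of a trained diffusion network:

```python
import math

T = 50  # hypothetical number of diffusion steps

def alpha_bar(t):
    # cosine-style cumulative noise schedule (illustrative choice)
    return math.cos(0.5 * math.pi * t / (T + 1)) ** 2

def eps_pred(x, t):
    # stand-in for a trained noise-prediction network; a constant
    # (x-independent) predictor makes the round trip exactly invertible
    return 0.3

def ddim_step(x, t_from, t_to):
    # deterministic DDIM update between two timesteps; the same rule
    # run with increasing t performs inversion, with decreasing t sampling
    a_f, a_t = alpha_bar(t_from), alpha_bar(t_to)
    eps = eps_pred(x, t_from)
    x0 = (x - math.sqrt(1 - a_f) * eps) / math.sqrt(a_f)
    return math.sqrt(a_t) * x0 + math.sqrt(1 - a_t) * eps

x = 0.8  # source "frame" (a scalar here, a latent tensor in practice)
z = x
for t in range(T):          # inversion: clean sample -> latent noise
    z = ddim_step(z, t, t + 1)
r = z
for t in range(T, 0, -1):   # sampling: latent noise -> reconstruction
    r = ddim_step(r, t, t - 1)
```

With a real, input-dependent noise predictor the reconstruction is only approximate, which is why methods like AnyV2V pair inversion with feature injection to keep the edited video anchored to the source.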
1 code implementation • 6 Feb 2024 • Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen
To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation.
no code implementations • 22 Dec 2023 • Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen
We evaluate VIESCORE on seven prominent conditional image generation tasks and find: (1) VIESCORE (GPT4-v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45.
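The Spearman correlation reported above is the Pearson correlation computed on ranks rather than raw scores, which makes it robust to monotonic rescaling of a judge's scores. A small self-contained implementation (tie handling via average ranks), with illustrative inputs rather than the paper's data:

```python
def spearman(a, b):
    # Spearman rank correlation: Pearson correlation of the ranks.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            # group tied values and assign them their average rank
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)
```

For example, `spearman([1, 2, 3], [10, 20, 30])` is 1.0 (perfectly monotonic agreement) and `spearman([1, 2, 3], [3, 2, 1])` is -1.0, so a score of 0.3 against a 0.45 human-to-human ceiling indicates moderate but imperfect agreement.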
no code implementations • 28 Nov 2023 • Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen
Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image.
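One common way to support such heterogeneous queries, and a reasonable mental model for this line of work, is to embed every query and candidate, regardless of modality, into one shared vector space and reduce retrieval to nearest-neighbour search. A minimal sketch with hand-picked toy vectors standing in for learned multimodal embeddings:

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, corpus):
    # corpus: list of (doc_id, vector) pairs; modality-agnostic,
    # since text, images, and mixed documents share one space
    return max(corpus, key=lambda item: cosine(query_vec, item[1]))[0]

# toy corpus mixing modalities (vectors are illustrative, not learned)
corpus = [
    ("news_article", [0.9, 0.1, 0.0]),
    ("photo",        [0.1, 0.8, 0.1]),
    ("caption",      [0.0, 0.2, 0.9]),
]
```

Here a text query whose embedding lies near the photo's region of the space, e.g. `retrieve([0.2, 0.9, 0.0], corpus)`, returns `"photo"`: the retriever never needs to know which modality produced either vector.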
2 code implementations • 27 Nov 2023 • Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
no code implementations • 22 Jun 2023 • Tianle Li, Max Ku, Cong Wei, Wenhu Chen
In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Addition.
1 code implementation • CVPR 2023 • Cong Wei, Brendan Duke, Ruowei Jiang, Parham Aarabi, Graham W. Taylor, Florian Shkurti
Equipped with the learned unstructured attention pattern, sparse attention ViT (Sparsifiner) produces a superior Pareto-optimal trade-off between FLOPs and top-1 accuracy on ImageNet compared to token-sparsity approaches.
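The core idea above is that each query attends only to a sparse, per-query set of key indices, so compute scales with the number of kept query-key pairs rather than the full quadratic attention matrix. A minimal sketch with a hand-picked connectivity mask (in Sparsifiner the mask is predicted from the input, not fixed):

```python
import math

def sparse_attention(Q, K, V, mask):
    # Q, K, V: lists of d-dimensional token vectors
    # mask[i]: key indices that query i is allowed to attend to
    out = []
    for i, q in enumerate(Q):
        idx = mask[i]
        # scaled dot-product scores, computed only over the kept pairs
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(len(q))
                  for j in idx]
        # numerically stable softmax over the sparse score set
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[t] / z * V[idx[t]][d] for t in range(len(idx)))
                    for d in range(len(V[0]))])
    return out
```

With `mask[i] = [i]` each token attends only to itself and the output equals `V`; a denser mask interpolates toward full attention, which is exactly the FLOPs-versus-accuracy dial the abstract describes.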