Search Results for author: Yuechen Zhang

Found 9 papers, 8 papers with code

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Ranked #9 on Visual Question Answering on MM-Vet

Image Comprehension Visual Dialog +1

3,027


Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation • 7 Dec 2023 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 70.7 in the MMBench test and 1552.5 in MME-perception.

Text Generation

104


Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation Video Generation


Real-World Image Variation by Aligning Diffusion Inversion Chain

2 code implementations • NeurIPS 2023 • Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.

Image-Variation Semantic Similarity +2

137


Video-P2P: Video Editing with Cross-attention Control

1 code implementation • 8 Mar 2023 • Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation Video Editing +1

339


CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

1 code implementation • CVPR 2023 • Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.

Ranked #4 on 3D Face Animation on BEAT2

3D Face Animation regression

477


Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

1 code implementation • CVPR 2023 • Yuechen Zhang, Zexin He, Jinbo Xing, Xufeng Yao, Jiaya Jia

We propose a ray registration process based on the stylized reference view to obtain pseudo-ray supervision in novel views.

Semantic correspondence

119


PCL: Proxy-Based Contrastive Learning for Domain Generalization

1 code implementation • CVPR 2022 • Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu

Domain generalization refers to the problem of training a model from a collection of different source domains that can directly generalize to the unseen target domains.

Ranked #17 on Domain Generalization on PACS

Contrastive Learning Domain Generalization


High Quality Segmentation for Ultra High-resolution Images

1 code implementation • CVPR 2022 • Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie, Jianlong Wu, Zhe Lin, Jiaya Jia

Segmenting 4K or 6K ultra high-resolution images requires extra computational consideration.

4k Image Segmentation +3

674

