Search Results for author: Jianmin Bao

Found 53 papers, 37 papers with code

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

1 code implementation • 18 Dec 2023 • Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net.

3D Generation Object +1

Paper
Code

Towards More Unified In-context Visual Understanding

no code implementations • 5 Dec 2023 • Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.

Decoder Image Captioning +2

Paper
Add Code

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.

Text-to-Image Generation Text-to-Video Generation +1

Paper
Add Code

ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation Video Generation

Paper
Add Code

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

no code implementations • 8 Nov 2023 • Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).

Person Re-Identification

Paper
Add Code

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations • 28 Sep 2023 • Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

Paper
Add Code

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

1 code implementation • 7 Sep 2023 • Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.

Keypoint Detection

346

Paper
Code

AltFreezing for More General Video Face Forgery Detection

1 code implementation • CVPR 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li

In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection.

Data Augmentation

Paper
Code

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation • 8 Jun 2023 • Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50, 000 high-quality images with rich texture details and semantic diversity.

Denoising Image Restoration +2

Paper
Code

Designing a Better Asymmetric VQGAN for StableDiffusion

2 code implementations • 7 Jun 2023 • Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.

Decoder Image Inpainting

988

Paper
Code

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation • NeurIPS 2023 • Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.

519

Paper
Code

CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

1 code implementation • CVPR 2023 • Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Wenqiang Zhang

Our framework, termed as domain-aware sign language retrieval via Cross-lingual Contrastive learning or CiCo for short, outperforms the pioneering method by large margins on various datasets, e. g., +22. 4 T2V and +28. 0 V2T R@1 improvements on How2Sign dataset, and +13. 7 T2V and +17. 1 V2T R@1 improvements on PHOENIX-2014T dataset.

Ranked #1 on Sign Language Retrieval on CSL-Daily

Contrastive Learning Retrieval +5

215

Paper
Code

DIRE for Diffusion-Generated Image Detection

1 code implementation • ICCV 2023 • Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li

We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data.

209

Paper
Code

Efficient Diffusion Training via Min-SNR Weighting Strategy

2 code implementations • ICCV 2023 • Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo

Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence.

Ranked #1 on Image Generation on ImageNet 256x256

Denoising Image Generation +2

174

Paper
Code

Improving CLIP Fine-tuning Performance

1 code implementation • ICCV 2023 • Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.

object-detection Object Detection +1

221

Paper
Code

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

no code implementations • CVPR 2023 • Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields.

Computational Efficiency

Paper
Add Code

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

1 code implementation • 12 Dec 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory.

190

Paper
Code

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

1 code implementation • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.

Ranked #8 on Instance Segmentation on LVIS v1.0 val

Data Augmentation Instance Segmentation +5

Paper
Code

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

no code implementations • 28 Nov 2022 • YiXuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network.

Attribute Image Generation +1

Paper
Add Code

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation • 22 Nov 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.

Ranked #1 on Image Generation on Places50

Denoising Image Generation +1

273

Paper
Code

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

no code implementations • CVPR 2023 • Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Second, masked self-distillation is also consistent with vision-language contrastive from the perspective of training objective as both utilize the visual encoder for feature aligning, and thus is able to learn local semantics getting indirect supervision from the language.

Representation Learning

Paper
Add Code

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

1 code implementation • 14 Jul 2022 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance.

Ranked #19 on Self-Supervised Image Classification on ImageNet (finetuned)

Decoder Object Detection +2

Paper
Code

Semantic Image Synthesis via Diffusion Models

3 code implementations • 30 Jun 2022 • Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Decoder Denoising +1

203

Paper
Code

I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose Estimation

1 code implementation • 22 Jun 2022 • Yiwei Ding, Wenjin Deng, Yinglin Zheng, PengFei Liu, Meihong Wang, Xuan Cheng, Jianmin Bao, Dong Chen, Ming Zeng

In this paper, we present the Intra- and Inter-Human Relation Networks (I^2R-Net) for Multi-Person Pose Estimation.

Ranked #2 on Multi-Person Pose Estimation on OCHuman

Multi-Person Pose Estimation Relation +1

Paper
Code

Improved Vector Quantized Diffusion Models

1 code implementation • 31 May 2022 • Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

When trained on ImageNet, we dramatically improve the FID score from 11. 89 to 4. 83, demonstrating the superiority of our proposed techniques.

Denoising Image Generation

844

Paper
Code

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

1 code implementation • 27 May 2022 • Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools.

Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)

Contrastive Learning Image Classification +5

221

Paper
Code

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations • CVPR 2022 • Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since theses ID labels automatically derived from tracklets inevitably contain noises, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Ranked #7 on Person Re-Identification on CUHK03

Contrastive Learning Multi-Object Tracking +3

221

Paper
Code

Semi-Supervised Image-to-Image Translation using Latent Space Mapping

no code implementations • 29 Mar 2022 • Pan Zhang, Jianmin Bao, Ting Zhang, Dong Chen, Fang Wen

Thanks to the low dimensional feature space, it is easier to find the desired mapping function, resulting in improved quality of translation results as well as the stability of the translation model.

Image-to-Image Translation Translation

Paper
Add Code

Protecting Celebrities from DeepFake with Identity Consistency Transformer

1 code implementation • CVPR 2022 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detecting a suspect face by finding identity inconsistency in inner and outer face regions.

Face Swapping

Paper
Code

StyleSwin: Transformer-based GAN for High-resolution Image Generation

1 code implementation • CVPR 2022 • BoWen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo

To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.

Ranked #1 on Image Generation on CelebA 256x256 (FID metric)

Blocking Computational Efficiency +3

481

Paper
Code

General Facial Representation Learning in a Visual-Linguistic Manner

2 code implementations • CVPR 2022 • Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.

Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)

Face Alignment Face Parsing +1

337

Paper
Code

Vector Quantized Diffusion Model for Text-to-Image Synthesis

2 code implementations • CVPR 2022 • Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.

Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Denoising Text-to-Image Generation

844

Paper
Code

PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

1 code implementation • 24 Nov 2021 • Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

This paper explores a better prediction target for BERT pre-training of vision transformers.

Ranked #4 on Self-Supervised Image Classification on ImageNet (finetuned)

object-detection Object Detection +2

152

Paper
Code

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations • CVPR 2022 • Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), that by $40\times$ less data than that in previous practice, we achieve the state-of-the-art on four representative vision benchmarks.

Ranked #10 on Self-Supervised Image Classification on ImageNet (finetuned)

Representation Learning Self-Supervised Image Classification +1

876

Paper
Code

Exploring Temporal Coherence for More General Video Face Forgery Detection

1 code implementation • ICCV 2021 • Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, Fang Wen

The first stage is a fully temporal convolution network (FTCN).

Ranked #4 on DeepFake Detection on FakeAVCeleb

DeepFake Detection

Paper
Code

Dual Path Learning for Domain Adaptation of Semantic Segmentation

1 code implementation • ICCV 2021 • Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang

In this paper, based on the observation that domain adaptation frameworks performed in the source and target domain are almost complementary in terms of image translation and SSL, we propose a novel dual path learning (DPL) framework to alleviate visual inconsistency.

Ranked #32 on Synthetic-to-Real Translation on GTAV-to-Cityscapes Labels

Domain Adaptation Segmentation +4

Paper
Code

Instance-wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation

no code implementations • ICCV 2021 • Weilun Wang, Wengang Zhou, Jianmin Bao, Dong Chen, Houqiang Li

In this paper, we uncover that the negative examples play a critical role in the performance of contrastive learning for image translation.

Contrastive Learning Image-to-Image Translation +1

Paper
Add Code

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

6 code implementations • CVPR 2022 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo

By further pretraining on the larger dataset ImageNet-21K, we achieve 87. 5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55. 7 mIoU.

Ranked #25 on Semantic Segmentation on ADE20K val

Image Classification Semantic Segmentation

5,280

Paper
Code

Uformer: A General U-Shaped Transformer for Image Restoration

4 code implementations • CVPR 2022 • Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li

Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.

Ranked #2 on Deblurring on RealBlur-R (trained on GoPro)

Deblurring Decoder +6

739

Paper
Code

High-Fidelity and Arbitrary Face Editing

no code implementations • CVPR 2021 • Yue Gao, Fangyun Wei, Jianmin Bao, Shuyang Gu, Dong Chen, Fang Wen, Zhouhui Lian

However, we observe that the generator tends to find a tricky way to hide information from the original image to satisfy the constraint of cycle consistency, making it impossible to maintain the rich details (e. g., wrinkles and moles) of non-editing areas.

Attribute Vocal Bursts Intensity Prediction

Paper
Add Code

Unsupervised Pre-training for Person Re-identification

1 code implementation • CVPR 2021 • Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

221

Paper
Code

Identity-Driven DeepFake Detection

no code implementations • 7 Dec 2020 • Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Our approach takes as input the suspect image/video as well as the target identity information (a reference image or video).

DeepFake Detection Face Swapping

Paper
Add Code

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation

1 code implementation • CVPR 2021 • Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen

We present the full-resolution correspondence learning for cross-domain images, which aids image translation.

Image-to-Image Translation Semantic correspondence +1

335

Paper
Code

Learnable Sampling 3D Convolution for Video Enhancement and Action Recognition

no code implementations • 22 Nov 2020 • Shuyang Gu, Jianmin Bao, Dong Chen

A key challenge in video enhancement and action recognition is to fuse useful information from neighboring frames.

Action Recognition Denoising +3

Paper
Add Code

GreedyFool: Distortion-Aware Sparse Adversarial Attack

1 code implementation • NeurIPS 2020 • Xiaoyi Dong, Dongdong Chen, Jianmin Bao, Chuan Qin, Lu Yuan, Weiming Zhang, Nenghai Yu, Dong Chen

Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels.

Adversarial Attack

Paper
Code

Improving Person Re-identification with Iterative Impression Aggregation

no code implementations • 21 Sep 2020 • Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li

Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods.

Person Re-Identification Re-Ranking

Paper
Add Code

PriorGAN: Real Data Prior for Generative Adversarial Nets

1 code implementation • 30 Jun 2020 • Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

To address these two issues, we propose a novel prior that captures the whole real data distribution for GANs, which are called PriorGANs.

213

Paper
Code

GIQA: Generated Image Quality Assessment

1 code implementation • ECCV 2020 • Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Generative adversarial networks (GANs) have achieved impressive results today, but not all generated images are perfect.

Image Quality Assessment

213

Paper
Code

FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping

10 code implementations • 31 Dec 2019 • Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen

We propose a novel attributes encoder for extracting multi-level target face attributes, and a new generator with carefully designed Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis.

Face Generation Face Swapping +1