Search Results for author: Kecheng Zheng

Found 37 papers, 16 papers with code

CoReS: Orchestrating the Dance of Reasoning and Segmentation

no code implementations • 8 Apr 2024 • Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang

We believe that the act of reasoning segmentation should mirror the cognitive stages of human visual search, where each step is a progressive refinement of thought toward the final object.

Segmentation

DreamLIP: Language-Image Pre-training with Long Captions

1 code implementation • 25 Mar 2024 • Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner.

Contrastive Learning Language Modelling +4
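
The sub-caption sampling step described above can be illustrated with a minimal sketch. It assumes that sub-captions are sentences split on end-of-sentence punctuation and that sampling is uniform without replacement; the helper names `split_into_subcaptions` and `sample_positive_pairs` are hypothetical and not taken from the paper. The grouping loss is not shown.

```python
import random
import re


def split_into_subcaptions(long_caption):
    """Split a long caption into sentence-level sub-captions (assumed granularity)."""
    parts = re.split(r"(?<=[.!?])\s+", long_caption.strip())
    return [p for p in parts if p]


def sample_positive_pairs(image_id, long_caption, k, rng):
    """Pair one image with up to k randomly sampled sub-captions.

    Each (image_id, sub_caption) tuple is treated as a positive pair.
    """
    subcaptions = split_into_subcaptions(long_caption)
    chosen = rng.sample(subcaptions, min(k, len(subcaptions)))
    return [(image_id, text) for text in chosen]


caption = "A dog runs on the beach. Waves break behind it. The sky is overcast."
pairs = sample_positive_pairs("img_001", caption, k=2, rng=random.Random(0))
```

Resampling at every training step would give each image a different set of positives over time, which is one way to read "dynamically sample".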

Contextual AD Narration with Interleaved Multimodal Sequence

no code implementations • 19 Mar 2024 • Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, LiMin Wang

With video features, text, a character bank, and context information as inputs, the generated ADs are able to refer to the characters by name and provide reasonable, contextual descriptions that help the audience understand the storyline of the movie.

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

1 code implementation • 21 Dec 2023 • Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data.

Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on COCO-Stuff-171

Attribute Open Vocabulary Semantic Segmentation +3

Learning Naturally Aggregated Appearance for Efficient 3D Editing

1 code implementation • 11 Dec 2023 • Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.

Novel View Synthesis

GenDeF: Learning Generative Deformation Field for Video Generation

no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen

We offer a new perspective on approaching the task of video generation.

Disentanglement Video Editing +3

Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection

1 code implementation • 4 Dec 2023 • Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao

Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously.

Out-of-Distribution Detection

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

no code implementations • 19 Nov 2023 • Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content.

Image Generation Story Visualization

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

1 code implementation • 7 Sep 2023 • Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen

Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis.

Image Generation Philosophy +1

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

1 code implementation • 15 Aug 2023 • Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96.github.io/CoDeF/.

Image-to-Image Translation Keypoint Detection +1

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models

no code implementations • ICCV 2023 • Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen

To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen.
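
As a rough illustration of optimizing masks while the parameters stay frozen, the sketch below tunes one mask per weight of a toy linear model. Several things here are assumptions rather than the paper's method: the parameter-selection step is skipped (every weight gets a mask), the binary mask is relaxed to a sigmoid during optimization and thresholded afterwards, and the function name `tune_masks` and the toy data are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tune_masks(weights, data, steps=200, lr=1.0, init_logit=2.0):
    """Learn one mask per frozen weight; only the mask logits are updated."""
    logits = [init_logit] * len(weights)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        soft = [sigmoid(s) for s in logits]  # relaxed masks in (0, 1)
        for x, y in data:
            pred = sum(m * w * xi for m, w, xi in zip(soft, weights, x))
            err = pred - y
            for i, (m, w, xi) in enumerate(zip(soft, weights, x)):
                # Gradient of the mean squared error w.r.t. logit i,
                # taken through the sigmoid relaxation (an assumption).
                grads[i] += 2.0 * err * w * xi * m * (1.0 - m) / len(data)
        logits = [s - lr * g for s, g in zip(logits, grads)]
    # Binarize the relaxed masks once optimization is done.
    return [1 if s > 0 else 0 for s in logits]


frozen_weights = [1.0, 2.0, -3.0]  # pre-trained weights, never modified
# Toy downstream task whose targets ignore the second weight.
downstream = [((1, 0, 0), 1.0), ((0, 1, 0), 0.0), ((0, 0, 1), -3.0)]
mask = tune_masks(frozen_weights, downstream)
```

In this toy setting the learned mask switches off the weight that hurts the downstream targets and keeps the other two, while `frozen_weights` itself is left unchanged.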

Cones 2: Customizable Image Synthesis with Multiple Subjects

1 code implementation • 30 May 2023 • Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Synthesizing images with user-specified subjects has received growing attention due to its practical applications.

Image Generation

Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

1 code implementation • CVPR 2023 • Fan Lu, Kai Zhu, Wei Zhai, Kecheng Zheng, Yang Cao

Semantically coherent out-of-distribution (SCOOD) detection aims to discern outliers from the intended data distribution with access to an unlabeled extra set.

Out-of-Distribution Detection

Cones: Concept Neurons in Diffusion Models for Customized Generation

1 code implementation • 9 Mar 2023 • Zhiheng Liu, Ruili Feng, Kai Zhu, Yifei Zhang, Kecheng Zheng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image.

Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning

no code implementations • ICCV 2023 • Kai Zhu, Kecheng Zheng, Ruili Feng, Deli Zhao, Yang Cao, Zheng-Jun Zha

Non-exemplar class-incremental learning aims to recognize both the old and new classes without access to old class samples.

Class Incremental Learning Incremental Learning

Neural Dependencies Emerging from Learning Massive Categories

no code implementations • CVPR 2023 • Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha

Through investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, which condition is easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates to only a few others.

Image Classification

Rank Diminishing in Deep Neural Networks

no code implementations • 13 Jun 2022 • Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha

By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet.

Principled Knowledge Extrapolation with GANs

no code implementations • 21 May 2022 • Ruili Feng, Jie Xiao, Kecheng Zheng, Deli Zhao, Jingren Zhou, Qibin Sun, Zheng-Jun Zha

Humans can extrapolate well, generalize daily knowledge to unseen scenarios, and raise and answer counterfactual questions.

counterfactual

FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization

no code implementations • 24 Mar 2022 • Kecheng Zheng, Yang Cao, Kai Zhu, Ruijing Zhao, Zheng-Jun Zha

However, its generalization performance to heterogeneous tasks is inferior to other architectures (e.g., CNNs and transformers) due to the extensive retention of domain information.

Domain Generalization

Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-Identification

no code implementations • 3 Mar 2022 • Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha

RGB-infrared person re-identification is an emerging cross-modality re-identification task, which is very challenging due to significant modality discrepancy between RGB and infrared images.

Person Re-Identification

Debiased Batch Normalization via Gaussian Process for Generalizable Person Re-Identification

no code implementations • 3 Mar 2022 • Jiawei Liu, Zhipeng Huang, Liang Li, Kecheng Zheng, Zheng-Jun Zha

In this paper, we propose a novel Debiased Batch Normalization via Gaussian Process approach (GDNorm) for generalizable person re-identification, which models the feature statistic estimation from BN layers as a dynamically self-refining Gaussian process to alleviate bias on unseen domains and improve generalization.

Generalizable Person Re-identification Representation Learning

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

no code implementations • CVPR 2022 • Wei Wu, Jiawei Liu, Kecheng Zheng, Qibin Sun, Zheng-Jun Zha

Image-to-video person re-identification aims to retrieve the same pedestrian as the image-based query from a video-based gallery set.

Image-To-Video Person Re-Identification reinforcement-learning +4

Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification

no code implementations • CVPR 2022 • Zizheng Yang, Xin Jin, Kecheng Zheng, Feng Zhao

During the pre-training, we attempt to address two critical issues for learning fine-grained ReID features: (1) the augmentations in the CL pipeline may distort the discriminative clues in person images.

Contrastive Learning Person Re-Identification +2

Unleashing the Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification

1 code implementation • 1 Dec 2021 • Zizheng Yang, Xin Jin, Kecheng Zheng, Feng Zhao

During the pre-training, we attempt to address two critical issues for learning fine-grained ReID features: (1) the augmentations in the CL pipeline may distort the discriminative clues in person images.

Contrastive Learning Person Re-Identification +2

Calibrated Feature Decomposition for Generalizable Person Re-Identification

1 code implementation • 27 Nov 2021 • Kecheng Zheng, Jiawei Liu, Wei Wu, Liang Li, Zheng-Jun Zha

The calibrated person representation is subtly decomposed into the identity-relevant feature, domain feature, and the remaining entangled one.

Domain Generalization Generalizable Person Re-identification

Semi-Supervised Domain Generalizable Person Re-Identification

3 code implementations • 11 Aug 2021 • Lingxiao He, Wu Liu, Jian Liang, Kecheng Zheng, Xingyu Liao, Peng Cheng, Tao Mei

Instead, we aim to explore multiple labeled datasets to learn generalized domain-invariant representations for person re-id, which are expected to be universally effective for each newly encountered re-id scenario.

Ranked #16 on Person Re-Identification on Market-1501 (using extra training data)

Generalizable Person Re-identification Knowledge Distillation +1

Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification

no code implementations • 31 Jul 2021 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha

Occluded person re-identification (ReID) aims to match person images with occlusion.

Knowledge Distillation Person Re-Identification

Adaptive Domain-Specific Normalization for Generalizable Person Re-Identification

no code implementations • 7 May 2021 • Jiawei Liu, Zhipeng Huang, Kecheng Zheng, Dong Liu, Xiaoyan Sun, Zheng-Jun Zha

It describes an unseen target domain as a combination of the known source domains, and explicitly learns a domain-specific representation with the target distribution to improve the model's generalization through a meta-learning pipeline.

Generalizable Person Re-identification Meta-Learning

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

no code implementations • CVPR 2021 • Jiawei Liu, Zheng-Jun Zha, Wei Wu, Kecheng Zheng, Qibin Sun

The key factor for video person re-identification is to effectively exploit both spatial and temporal clues from video sequences.

Ranked #10 on Video Deinterlacing on MSU Deinterlacer Benchmark

Video-Based Person Re-Identification Video Deinterlacing

Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization

1 code implementation • CVPR 2022 • Xin Jin, Tianyu He, Kecheng Zheng, Zhiheng Yin, Xu Shen, Zhen Huang, Ruoyu Feng, Jianqiang Huang, Xian-Sheng Hua, Zhibo Chen

Specifically, we introduce gait recognition as an auxiliary task to drive the image ReID model to learn cloth-agnostic representations by leveraging person-specific, cloth-independent gait information; we name this framework GI-ReID.

Ranked #5 on Person Re-Identification on PRCC

Cloth-Changing Person Re-Identification Computational Efficiency +1

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval

no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo

The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.

Retrieval Text Retrieval +1
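
A memory that records instance embeddings for negative mining can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity as the score, "hardest" meaning the most similar non-matching entries, and a hypothetical `EmbeddingMemory` class with made-up two-dimensional embeddings. The paper's actual module may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingMemory:
    """Keeps the latest embedding for every instance seen so far."""

    def __init__(self):
        self.bank = {}

    def update(self, instance_id, embedding):
        self.bank[instance_id] = list(embedding)

    def hardest_negatives(self, query, positive_id, k):
        """Return the k stored instances most similar to the query,
        excluding the instance that is the true match."""
        scored = [
            (cosine(query, emb), iid)
            for iid, emb in self.bank.items()
            if iid != positive_id
        ]
        scored.sort(reverse=True)
        return [iid for _, iid in scored[:k]]


text_memory = EmbeddingMemory()
text_memory.update("t1", [1.0, 0.0])   # caption matching the query video
text_memory.update("t2", [0.9, 0.1])   # similar but wrong caption
text_memory.update("t3", [0.0, 1.0])   # unrelated caption
negatives = text_memory.hardest_negatives([1.0, 0.0], positive_id="t1", k=1)
```

Because the memory spans every stored instance rather than one mini-batch, the mined negatives are global in the sense the abstract describes.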

Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.

Disentanglement Domain Adaptive Person Re-Identification +2

Group-aware Label Transfer for Domain Adaptive Person Re-identification

1 code implementation • CVPR 2021 • Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha

In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.

Attribute Clustering +5

Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification

1 code implementation • 16 Dec 2020 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha

Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss.

Clustering Domain Adaptive Person Re-Identification +3
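
The re-weighting idea can be sketched for the ID classification loss alone. The sketch assumes each sample already has an uncertainty score in [0, 1] and uses the weight `1 - uncertainty`, normalized over the batch; both the weighting rule and the function name `reweighted_id_loss` are assumptions for illustration. How the uncertainty is derived from consistency levels is not shown.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the (pseudo-)label under predicted probabilities."""
    return -math.log(probs[label])


def reweighted_id_loss(batch_probs, pseudo_labels, uncertainties):
    """Average per-sample cross-entropy, down-weighting uncertain pseudo-labels."""
    weights = [1.0 - u for u in uncertainties]  # assumed weighting rule
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, pseudo_labels)]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 0]                      # the second pseudo-label is likely wrong
weighted = reweighted_id_loss(probs, labels, uncertainties=[0.1, 0.9])
unweighted = reweighted_id_loss(probs, labels, uncertainties=[0.0, 0.0])
```

With these inputs the sample carrying the doubtful pseudo-label contributes far less, so `weighted` comes out smaller than `unweighted`.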