no code implementations • 8 Apr 2024 • Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang
We believe that the act of reasoning segmentation should mirror the cognitive stages of human visual search, where each step is a progressive refinement of thought toward the final object.
1 code implementation • 25 Mar 2024 • Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner.
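The sub-caption sampling step in this excerpt can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes the text label is a multi-sentence caption split on periods, and the helper name `sample_sub_captions`, the example caption, and the image identifier are all hypothetical. The grouping loss is not shown.

```python
import random

def sample_sub_captions(caption, num_samples=3, seed=None):
    """Draw random sentence subsets of a caption (assumed sampling scheme)."""
    rng = random.Random(seed)
    # Assumption: sentences are separated by periods.
    sentences = [s.strip() for s in caption.split(".") if s.strip()]
    if not sentences:
        return []
    sub_captions = []
    for _ in range(num_samples):
        k = rng.randint(1, len(sentences))                   # how many sentences to keep
        idx = sorted(rng.sample(range(len(sentences)), k))   # keep original order
        sub_captions.append(". ".join(sentences[i] for i in idx) + ".")
    return sub_captions

# Hypothetical example: one image paired with several sampled sub-captions,
# giving multiple positive (image, text) pairs for the same image.
caption = "A dog runs on the beach. The sky is cloudy. A child throws a ball."
image_id = "img_0001"
pairs = [(image_id, sub) for sub in sample_sub_captions(caption, 3, seed=0)]
```

Each sampled sub-caption describes only part of the image, which is why the excerpt pairs it with local image patches rather than the whole image.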
no code implementations • 19 Mar 2024 • Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, LiMin Wang
With video features, text, a character bank, and context information as inputs, the generated ADs are able to identify the characters by name and provide reasonable, contextual descriptions that help the audience understand the storyline of the movie.
1 code implementation • 21 Dec 2023 • Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen
The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data.
1 code implementation • 11 Dec 2023 • Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen
Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to their implicit nature.
no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen
We offer a new perspective on approaching the task of video generation.
1 code implementation • 4 Dec 2023 • Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao
Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously.
no code implementations • 19 Nov 2023 • Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen
We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content.
1 code implementation • 7 Sep 2023 • Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen
Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis.
1 code implementation • 15 Aug 2023 • Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i.e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96.github.io/CoDeF/.
no code implementations • ICCV 2023 • Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen
To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen.
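The three steps in this excerpt (select important parameters, attach a mask to each, optimize only the masks) can be illustrated on a toy linear model. The sketch below is not the paper's implementation. The importance criterion (largest magnitude), the sigmoid relaxation of the binary mask, the squared-error loss, the synthetic data, and every name such as `optimize_masks` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def optimize_masks(w, X, y, top_k=4, lr=0.5, steps=200):
    """Learn masks over selected weights; the weights w are never updated."""
    w = np.asarray(w, dtype=float)
    # Step 1 (assumed criterion): treat the top_k largest-magnitude weights as important.
    selected = np.argsort(-np.abs(w))[:top_k]
    # Step 2: one mask logit per selected weight, initialised so the mask is near 1.
    logits = np.full(top_k, 3.0)
    losses = []
    for _ in range(steps):
        m = np.ones_like(w)
        m[selected] = sigmoid(logits)          # relaxed (soft) mask in (0, 1)
        residual = X @ (w * m) - y
        losses.append(float(np.mean(residual ** 2)))
        # Step 3: gradient of the loss w.r.t. the mask, then w.r.t. the logits.
        grad_m = 2.0 * (X.T @ residual) * w / len(y)
        s = sigmoid(logits)
        logits -= lr * grad_m[selected] * s * (1.0 - s)
    # Threshold the relaxed mask to obtain a binary one.
    binary = np.ones_like(w)
    binary[selected] = (sigmoid(logits) > 0.5).astype(float)
    return binary, losses

# Synthetic downstream task in which the second pretrained weight is harmful.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 6))
w_pretrained = np.array([2.0, -1.5, 1.0, 0.8, 0.1, 0.05])
w_frozen_copy = w_pretrained.copy()
w_target = np.array([2.0, 0.0, 1.0, 0.8, 0.1, 0.05])
y = X @ w_target
mask, losses = optimize_masks(w_pretrained, X, y)
```

Because only the mask logits receive gradient updates, the pretrained weights are left untouched and the learned mask alone decides which of them contribute to the downstream prediction.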
1 code implementation • 30 May 2023 • Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao
Synthesizing images with user-specified subjects has received growing attention due to its practical applications.
1 code implementation • CVPR 2023 • Fan Lu, Kai Zhu, Wei Zhai, Kecheng Zheng, Yang Cao
Semantically coherent out-of-distribution (SCOOD) detection aims to discern outliers from the intended data distribution with access to an extra unlabeled set.
1 code implementation • 9 Mar 2023 • Zhiheng Liu, Ruili Feng, Kai Zhu, Yifei Zhang, Kecheng Zheng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao
Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image.
no code implementations • ICCV 2023 • Kai Zhu, Kecheng Zheng, Ruili Feng, Deli Zhao, Yang Cao, Zheng-Jun Zha
Non-exemplar class-incremental learning aims to recognize both the old and new classes without access to old class samples.
no code implementations • CVPR 2023 • Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha
By investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, a condition that is easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates with only a few others.
no code implementations • 13 Jun 2022 • Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha
By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet.
no code implementations • 21 May 2022 • Ruili Feng, Jie Xiao, Kecheng Zheng, Deli Zhao, Jingren Zhou, Qibin Sun, Zheng-Jun Zha
Humans can extrapolate well, generalize daily knowledge to unseen scenarios, and raise and answer counterfactual questions.
no code implementations • 24 Mar 2022 • Kecheng Zheng, Yang Cao, Kai Zhu, Ruijing Zhao, Zheng-Jun Zha
However, its generalization performance to heterogeneous tasks is inferior to other architectures (e.g., CNNs and transformers) due to the extensive retention of domain information.
no code implementations • 3 Mar 2022 • Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha
RGB-infrared person re-identification is an emerging cross-modality re-identification task, which is very challenging due to the significant modality discrepancy between RGB and infrared images.
no code implementations • 3 Mar 2022 • Jiawei Liu, Zhipeng Huang, Liang Li, Kecheng Zheng, Zheng-Jun Zha
In this paper, we propose a novel Debiased Batch Normalization via Gaussian Process approach (GDNorm) for generalizable person re-identification, which models the feature statistic estimation from BN layers as a dynamically self-refining Gaussian process to alleviate the bias toward unseen domains and improve generalization.
Generalizable Person Re-identification Representation Learning
no code implementations • CVPR 2022 • Wei Wu, Jiawei Liu, Kecheng Zheng, Qibin Sun, Zheng-Jun Zha
Image-to-video person re-identification aims to retrieve the same pedestrian as the image-based query from a video-based gallery set.
Image-To-Video Person Re-Identification reinforcement-learning +4
no code implementations • CVPR 2022 • Zizheng Yang, Xin Jin, Kecheng Zheng, Feng Zhao
During pre-training, we attempt to address two critical issues for learning fine-grained ReID features: (1) the augmentations in the CL pipeline may distort the discriminative clues in person images.
1 code implementation • 1 Dec 2021 • Zizheng Yang, Xin Jin, Kecheng Zheng, Feng Zhao
During pre-training, we attempt to address two critical issues for learning fine-grained ReID features: (1) the augmentations in the CL pipeline may distort the discriminative clues in person images.
1 code implementation • 27 Nov 2021 • Kecheng Zheng, Jiawei Liu, Wei Wu, Liang Li, Zheng-Jun Zha
The calibrated person representation is subtly decomposed into the identity-relevant feature, domain feature, and the remaining entangled one.
Domain Generalization Generalizable Person Re-identification
3 code implementations • 11 Aug 2021 • Lingxiao He, Wu Liu, Jian Liang, Kecheng Zheng, Xingyu Liao, Peng Cheng, Tao Mei
Instead, we aim to explore multiple labeled datasets to learn generalized domain-invariant representations for person re-id, which are expected to be universally effective for each newly encountered re-id scenario.
Ranked #16 on Person Re-Identification on Market-1501 (using extra training data)
Generalizable Person Re-identification Knowledge Distillation +1
no code implementations • 31 Jul 2021 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha
Occluded person re-identification (ReID) aims to match person images with occlusion.
no code implementations • 7 May 2021 • Jiawei Liu, Zhipeng Huang, Kecheng Zheng, Dong Liu, Xiaoyan Sun, Zheng-Jun Zha
It describes an unseen target domain as a combination of the known source domains, and explicitly learns a domain-specific representation with the target distribution to improve the model's generalization through a meta-learning pipeline.
no code implementations • CVPR 2021 • Jiawei Liu, Zheng-Jun Zha, Wei Wu, Kecheng Zheng, Qibin Sun
The key factor for video person re-identification is to effectively exploit both spatial and temporal clues from video sequences.
Ranked #10 on Video Deinterlacing on MSU Deinterlacer Benchmark
1 code implementation • CVPR 2022 • Xin Jin, Tianyu He, Kecheng Zheng, Zhiheng Yin, Xu Shen, Zhen Huang, Ruoyu Feng, Jianqiang Huang, Xian-Sheng Hua, Zhibo Chen
Specifically, we introduce gait recognition as an auxiliary task to drive the image ReID model to learn cloth-agnostic representations by leveraging person-specific, cloth-independent gait information. We name this framework GI-ReID.
Ranked #5 on Person Re-Identification on PRCC
Cloth-Changing Person Re-Identification Computational Efficiency +1
no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo
The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.
no code implementations • 25 Mar 2021 • Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen
Each recomposed feature, obtained based on the domain-invariant feature (which enables a reliable inheritance of identity) and an enhancement from a domain-specific feature (which enables the approximation of real distributions), is thus an "ideal" augmentation.
1 code implementation • CVPR 2021 • Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha
In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
1 code implementation • 16 Dec 2020 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha
Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss.
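The re-weighting idea in this excerpt can be sketched for the per-sample identity classification loss. This is a minimal illustration under stated assumptions, not the paper's formulation: the per-sample uncertainty scores are taken as given inputs, the mapping `exp(-u / temperature)` from uncertainty to weight is an assumed choice, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def reliability_weights(uncertainty, temperature=1.0):
    """Map non-negative uncertainty to weights in (0, 1] (assumed mapping)."""
    u = np.asarray(uncertainty, dtype=float)
    return np.exp(-u / temperature)

def reweighted_id_loss(probs, pseudo_labels, uncertainty):
    """Cross-entropy on pseudo-labels, weighted by per-sample reliability."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(pseudo_labels)
    # Per-sample negative log-likelihood of the pseudo-label.
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    w = reliability_weights(uncertainty)
    # Normalise by the weight sum so the loss scale stays comparable.
    return float(np.sum(w * per_sample) / np.sum(w))

# Toy batch: the third sample has an uncertain (likely wrong) pseudo-label.
probs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
pseudo_labels = [0, 1, 0]
uncertainty = [0.1, 0.2, 2.0]
weighted = reweighted_id_loss(probs, pseudo_labels, uncertainty)
unweighted = reweighted_id_loss(probs, pseudo_labels, [0.0, 0.0, 0.0])
```

With zero uncertainty for every sample the function reduces to the ordinary mean cross-entropy. The same weights could be applied to triplet or contrastive terms in the same way, which is not shown here.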
no code implementations • 10 Oct 2020 • Kecheng Zheng, Wu Liu, Jiawei Liu, Zheng-Jun Zha, Tao Mei
This hard selection strategy is able to fuse the strongly relevant multi-modality features to alleviate the problem of matching redundancy.
Ranked #16 on Text based Person Retrieval on CUHK-PEDES
no code implementations • 10 Apr 2020 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha
The dominant existing approaches to the cross-modal video-text retrieval task learn a joint embedding space to measure the cross-modal similarity.
1 code implementation • NeurIPS 2019 • Kecheng Zheng, Zheng-Jun Zha, Wei Wei
Abstraction reasoning is a long-standing challenge in artificial intelligence.