no code implementations • 11 Mar 2024 • Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric Xing
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis.
1 code implementation • 5 Dec 2023 • Jiachen Lu, Ze Huang, Zeyu Yang, Jiahui Zhang, Li Zhang
Generating multi-camera street-view videos is critical for augmenting autonomous driving datasets, addressing the urgent demand for extensive and varied data.
no code implementations • 16 Oct 2023 • Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim
Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set.
no code implementations • ICCV 2023 • Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Kunhao Liu, Rongliang Wu, Xiaoqin Zhang, Ling Shao, Shijian Lu
However, as the pose estimator is trained with only rendered images, the pose estimation is usually biased or inaccurate for real images due to the domain gap between real images and rendered images, leading to poor robustness for the pose estimation of real images and further local minima in joint optimization.
no code implementations • ICCV 2023 • Muyu Xu, Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Xiaoqin Zhang, Christian Theobalt, Ling Shao, Shijian Lu
Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation.
no code implementations • 20 Jun 2023 • Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim
Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks.
1 code implementation • NeurIPS 2023 • Kunhao Liu, Fangneng Zhan, Jiahui Zhang, Muyu Xu, Yingchen Yu, Abdulmotaleb El Saddik, Christian Theobalt, Eric Xing, Shijian Lu
Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research.
no code implementations • 18 Apr 2023 • Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Shengcai Liao, Shijian Lu
POCE achieves the more accessible and realistic pose-controllable expression editing by mapping face images into UV space, where facial expressions and head poses can be disentangled and edited separately.
no code implementations • 18 Apr 2023 • Rongliang Wu, Yingchen Yu, Fangneng Zhan, Jiahui Zhang, Xiaoqin Zhang, Shijian Lu
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network that can model the variational facial animation distribution conditioned upon the input audio and autoregressively convert the audio signals into a facial animation sequence.
1 code implementation • CVPR 2023 • Kunhao Liu, Fangneng Zhan, YiWen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing
In addition, it transforms the grid features according to the reference style which directly leads to high-quality zero-shot style transfer.
no code implementations • CVPR 2023 • Jiahui Zhang, Fangneng Zhan, Christian Theobalt, Shijian Lu
The first is a prior distribution regularization which measures the discrepancy between a prior token distribution and the predicted token distribution to avoid codebook collapse and low codebook utilization.
no code implementations • 4 Aug 2022 • Jiahui Zhang, Fangneng Zhan, Yingchen Yu, Rongliang Wu, Xiaoqin Zhang, Shijian Lu
In addition, stochastic noises fed to the generator are employed for unconditional detail generation, which tends to produce unfaithful details that compromise the fidelity of the generated SR image.
no code implementations • 26 Jul 2022 • Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan
Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.
no code implementations • 21 Jul 2022 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Changgong Zhang, Shijian Lu
Extensive experiments over multiple conditional image generation tasks show that our method achieves superior diverse image generation performance qualitatively and quantitatively as compared with the state-of-the-art.
no code implementations • 6 Jul 2022 • Jiahui Zhang, Fangneng Zhan, Rongliang Wu, Yingchen Yu, Wenqing Zhang, Bai Song, Xiaoqin Zhang, Shijian Lu
With the feature transport plan as the guidance, a novel pose calibration technique is designed which rectifies the initially randomized camera poses by predicting relative pose transformations between the pair of rendered and real images.
1 code implementation • 6 Jul 2022 • Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao
In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing.
no code implementations • CVPR 2022 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Changgong Zhang
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
1 code implementation • CVPR 2022 • Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Rongliang Wu, Shijian Lu
Perceiving the similarity between images has been a long-standing and fundamental problem underlying various visual generation tasks.
1 code implementation • ICLR 2022 • Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan
Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency.
no code implementations • 31 Dec 2021 • Dawei Wang, Lingping Gao, Ziquan Lan, Wei Li, Jiaping Ren, Jiahui Zhang, Peng Zhang, Pei Zhou, Shengao Wang, Jia Pan, Dinesh Manocha, Ruigang Yang
Recently, there have been many advances in autonomous driving society, attracting a lot of attention from academia and industry.
2 code implementations • 27 Dec 2021 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing
With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years.
1 code implementation • Findings (ACL) 2021 • Qian Liu, Dejian Yang, Jiahui Zhang, Jiaqi Guo, Bin Zhou, Jian-Guang Lou
Recent years pretrained language models (PLMs) hit a success on several downstream tasks, showing their power on modeling language.
1 code implementation • ICCV 2021 • Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan
2) Seeded Graph Neural Network, which utilizes seed matches to pass messages within/across images and predicts assignment costs.
2 code implementations • 7 Jul 2021 • Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Aoran Xiao, Shijian Lu, Chunyan Miao
This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence.
no code implementations • 1 Jul 2021 • Jiahui Zhang, Shijian Lu, Fangneng Zhan, Yingchen Yu
Extensive experiments on synthetic datasets and real images show that the proposed CRL-SR can handle multi-modal and spatially variant degradation effectively under blind settings and it also outperforms state-of-the-art SR methods qualitatively and quantitatively.
1 code implementation • CVPR 2020 • Lei Zhou, Zixin Luo, Tianwei Shen, Jiahui Zhang, Mingmin Zhen, Yao Yao, Tian Fang, Long Quan
Temporal camera relocalization estimates the pose with respect to each video frame in sequence, as opposed to one-shot relocalization which focuses on a still image.
4 code implementations • CVPR 2020 • Zixin Luo, Lei Zhou, Xuyang Bai, Hongkai Chen, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan
This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors.
1 code implementation • 19 Sep 2019 • Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan
The self-supervised learning of depth and pose from monocular sequences provides an attractive solution by using the photometric consistency of nearby frames as it depends much less on the ground-truth data.
1 code implementation • ICCV 2019 • Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao
First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix.
1 code implementation • ECCV 2018 • Jiahui Zhang, Hao Zhao, Anbang Yao, Yurong Chen, Li Zhang, Hongen Liao
We introduce Spatial Group Convolution (SGC) for accelerating the computation of 3D dense prediction tasks.
Ranked #9 on 3D Semantic Scene Completion on SemanticKITTI
1 code implementation • CVPR 2019 • Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan
Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints, whereas neglecting the spatial relations established from their keypoint locations.
no code implementations • 5 Sep 2018 • Ke Wang, Han Song, Jiahui Zhang, Xinran Zhang, Hongen Liao
In this paper, we proposed a method which can fuse different modalities 3D data to get a large-scale and dense point cloud.