no code implementations • 28 May 2023 • Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani
Image ad understanding is a crucial task with wide real-world applications.
1 code implementation • 3 Apr 2023 • Zhiwei Jia, Fangchen Liu, Vineet Thumuluri, Linghao Chen, Zhiao Huang, Hao Su
We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulations).
no code implementations • CVPR 2023 • Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani
Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor.
1 code implementation • 26 Jun 2022 • Zhiwei Jia, Xuanlin Li, Zhan Ling, Shuang Liu, Yiran Wu, Hao Su
Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations.
1 code implementation • 24 Jan 2022 • Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.
no code implementations • 16 Nov 2021 • Yue Tao, Zhiwei Jia, Runze Ma, Shugong Xu
We propose a 1-D split to address the challenges of complexity and replace the CNN with the transformer encoder to reduce the need for a context modeling module.
1 code implementation • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme
However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.
no code implementations • 13 Aug 2021 • Zhiwei Jia, Shugong Xu, Shiyi Mu, Yue Tao, Shan Cao, Zhiyong Chen
In this paper, we propose an Iterative Fusion based Recognizer (IFR) for low-quality scene text recognition, taking advantage of refined text image input and robust feature representation.
3 code implementations • 30 Jul 2021 • Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su
Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator.
no code implementations • 29 Mar 2021 • Jiajun Zhu, Xiufeng Jiang, Zhiwei Jia, Shugong Xu, Shan Cao
Moreover, a paired low-quality scene text video dataset named Text-RBL is proposed, consisting of raw videos, blurry videos, and low-resolution videos, labeled by the proposed convenient semi-automatic labeling strategy.
1 code implementation • ICCV 2021 • Zhiwei Jia, Bodi Yuan, Kangkang Wang, Hong Wu, David Clifford, Zhiqiang Yuan, Hao Su
Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translations.
no code implementations • NeurIPS 2020 • Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su
We study how to learn a policy with compositional generalizability.
no code implementations • ECCV 2020 • Shanjiaoyang Huang, Weiqi Peng, Zhiwei Jia, Zhuowen Tu
One-pixel signature is a general representation that can be used to characterize CNN models beyond backdoor detection.
1 code implementation • ICML 2020 • Zhiwei Jia, Hao Su
Recent advances in deep learning theory have evoked the study of generalizability across different local minima of deep neural networks (DNNs).
no code implementations • 6 Dec 2017 • Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.