no code implementations • 15 May 2024 • Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K.
no code implementations • 14 Mar 2024 • Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li
In particular, we develop and integrate a 3D Spring-Mass model into 3D Gaussian kernels, enabling the reconstruction of the visual appearance, shape, and physical dynamics of the object.
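A spring-mass system of the kind described above can be sketched in a few lines. This is a toy stand-in, not the paper's method: it integrates Hooke's-law springs with semi-implicit Euler, and the coupling to 3D Gaussian kernels is omitted; all parameters (`k`, `mass`, `dt`) are illustrative assumptions.

```python
import numpy as np

def spring_mass_step(x, v, rest_len, edges, k=50.0, mass=1.0, dt=1e-3,
                     g=np.array([0.0, 0.0, -9.8])):
    """One semi-implicit Euler step of a 3D spring-mass system.

    x: (N, 3) particle positions; v: (N, 3) velocities;
    edges: list of (i, j) spring connections; rest_len: rest length per edge.
    Assumes no spring has zero current length.
    """
    f = np.tile(mass * g, (len(x), 1))           # gravity on every particle
    for (i, j), l0 in zip(edges, rest_len):
        d = x[j] - x[i]
        length = np.linalg.norm(d)
        fs = k * (length - l0) * d / length      # Hooke's law along the spring
        f[i] += fs
        f[j] -= fs
    v_new = v + dt * f / mass
    x_new = x + dt * v_new                       # position uses updated velocity
    return x_new, v_new
```

A stretched spring (current length above rest length) pulls its endpoints together, which is the restoring behavior the reconstruction relies on.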
no code implementations • 14 Mar 2024 • Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, Ivan Villa-Renteria, Jerry Huayang Tang, Claire Tang, Fei Xia, Yunzhu Li, Silvio Savarese, Hyowon Gweon, C. Karen Liu, Jiajun Wu, Li Fei-Fei
We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics.
1 code implementation • 23 Feb 2024 • Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li
Robots need to explore their surroundings to adapt to and tackle tasks in unknown environments.
2 code implementations • 1 Feb 2024 • Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji
LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools).
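The contrast between a JSON-format action and a code-format action can be illustrated with a toy example. This is not the paper's implementation: the restricted `exec` namespace is a hypothetical, illustration-only sandbox (an empty `__builtins__` plus a small allowlist), not a real security boundary.

```python
import json

# A JSON-style action is confined to a pre-defined tool schema:
json_action = json.dumps({"tool": "search", "args": {"query": "weather in Paris"}})

# A code-style action lets the agent compose operations freely; here we
# execute a model-emitted snippet in a restricted namespace (toy sandbox only).
ALLOWED = {"sum": sum, "len": len, "range": range}

def run_code_action(code: str) -> dict:
    """Execute a Python snippet with a tiny builtin allowlist; return its namespace."""
    ns: dict = {}
    exec(code, {"__builtins__": ALLOWED}, ns)
    return ns

result = run_code_action("nums = [1, 2, 3]\ntotal = sum(nums)")
```

The code action composes list construction and aggregation in one step, something a fixed JSON tool schema would need two separate tool calls (and a predefined aggregation tool) to express.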
no code implementations • NeurIPS 2023 • Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li
In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms.
no code implementations • 28 Sep 2023 • YiXuan Wang, Zhuoran Li, Mingtong Zhang, Katherine Driggs-Campbell, Jiajun Wu, Li Fei-Fei, Yunzhu Li
These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks.
1 code implementation • 12 Jul 2023 • Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations.
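Planning over a composed value map can be sketched with a greedy descent on a grid. This is a deliberately simplified stand-in for the paper's model-based planner: a 2D cost grid where lower values are better, and a 4-neighbor greedy walk; the grid, neighborhood, and stopping rule are all illustrative assumptions.

```python
import numpy as np

def greedy_plan(value_map, start, max_steps=100):
    """Greedily descend a value map (lower = better) on a 2D grid.

    At each step, move to the 4-neighbor with the lowest value;
    stop when no neighbor improves on the current cell.
    """
    h, w = value_map.shape
    pos = start
    path = [pos]
    for _ in range(max_steps):
        r, c = pos
        nbrs = [(r + dr, c + dc) for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                if 0 <= r + dr < h and 0 <= c + dc < w]
        best = min(nbrs, key=lambda p: value_map[p])
        if value_map[best] >= value_map[pos]:
            break                       # local minimum: nothing improves
        pos = best
        path.append(pos)
    return path
```

On a map whose value is the Manhattan distance to a target cell, the greedy walk reaches the target; a closed-loop planner would simply recompute such a path whenever the value map changes under perturbation.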
no code implementations • 29 Jun 2023 • YiXuan Wang, Yunzhu Li, Katherine Driggs-Campbell, Li Fei-Fei, Jiajun Wu
Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks.
no code implementations • CVPR 2023 • Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu
Learned visual dynamics models have proven effective for robotic manipulation tasks.
no code implementations • CVPR 2023 • Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.
no code implementations • 3 Apr 2023 • Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber
At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.
no code implementations • 27 Oct 2022 • Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held
Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.
1 code implementation • 20 Oct 2022 • Lirui Wang, Kaiqing Zhang, Yunzhu Li, Yonglong Tian, Russ Tedrake
Decentralized learning has been advocated and widely deployed to make efficient use of distributed datasets, with an extensive focus on supervised learning (SL) problems.
no code implementations • 3 Jun 2022 • Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint
This paper demonstrates that learning state representations with supervision from Neural Radiance Fields (NeRFs) can improve the performance of RL compared to other learned representations or even low-dimensional, hand-engineered state information.
no code implementations • 5 May 2022 • Haochen Shi, Huazhe Xu, Zhiao Huang, Yunzhu Li, Jiajun Wu
Our learned model-based planning framework is comparable to and sometimes better than human subjects on the tested tasks.
no code implementations • ICLR 2022 • Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.
no code implementations • ICLR 2022 • Xingyu Lin, Zhiao Huang, Yunzhu Li, Joshua B. Tenenbaum, David Held, Chuang Gan
We consider the problem of sequential robotic manipulation of deformable objects using tools.
no code implementations • 24 Feb 2022 • Danny Driess, Zhiao Huang, Yunzhu Li, Russ Tedrake, Marc Toussaint
We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
no code implementations • 9 Sep 2021 • Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.
no code implementations • 8 Jul 2021 • Yunzhu Li, Shuang Li, Vincent Sitzmann, Pulkit Agrawal, Antonio Torralba
Humans have a strong intuitive understanding of the 3D environment around us.
no code implementations • CVPR 2021 • Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomas Palacios, Antonio Torralba, Wojciech Matusik
In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach using the pressure maps recorded by a tactile carpet as input.
1 code implementation • NeurIPS 2020 • Yunzhu Li, Antonio Torralba, Animashree Anandkumar, Dieter Fox, Animesh Garg
We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions.
1 code implementation • NeurIPS 2020 • Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins
To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
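The hierarchical structure described above can be sketched as a small tree-plus-edges data type. The node fields and the table example are hypothetical illustrations of the idea (parts at different scales as nested nodes, physical connections as edges), not the paper's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class PSGNode:
    """A node in a toy Physical Scene Graph: an object part at some scale."""
    name: str
    features: dict = field(default_factory=dict)   # e.g. centroid, color summary
    children: list = field(default_factory=list)   # finer-scale parts
    links: list = field(default_factory=list)      # physical connections (edges)

def leaf_count(node: PSGNode) -> int:
    """Count the finest-scale parts under a node."""
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children)

# A table as a two-level hierarchy: a top surface plus four legs,
# with each leg physically linked to the top.
top = PSGNode("tabletop")
legs = [PSGNode(f"leg_{i}") for i in range(4)]
table = PSGNode("table", children=[top] + legs)
for leg in legs:
    leg.links.append(top.name)   # edge: leg attached to the tabletop
```

Coarse nodes ("table") summarize their children, while edges carry the part-to-part physical relations that a flat segmentation mask cannot express.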
1 code implementation • ICML 2020 • Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba
The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models.
no code implementations • ICLR 2020 • Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba
Finding an embedding space for a linear approximation of a nonlinear dynamical system enables efficient system identification and control synthesis.
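The idea of a linear embedding of nonlinear dynamics has a classic closed-form example, shown below. This particular system and lifting (a Koopman-style embedding) are textbook illustrations chosen because the linearization is exact, not a system from the paper; the constants are arbitrary.

```python
import numpy as np

MU, LAM, C = 0.9, 0.5, 0.3

def step(x):
    """Nonlinear dynamics: x1' = mu*x1, x2' = lam*x2 + c*x1^2."""
    x1, x2 = x
    return np.array([MU * x1, LAM * x2 + C * x1**2])

def lift(x):
    """Embedding g(x) = (x1, x2, x1^2) in which the dynamics become linear."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

# Linear operator K acting on the lifted state: g(step(x)) == K @ g(x),
# because (mu*x1)^2 = mu^2 * x1^2 closes the system in three coordinates.
K = np.array([[MU,  0.0, 0.0],
              [0.0, LAM, C],
              [0.0, 0.0, MU**2]])
```

Once the dynamics are linear in the embedding, standard tools for linear system identification and control synthesis (e.g., LQR on `K`) apply directly, which is the payoff the sentence above describes.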
3 code implementations • ICLR 2020 • Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.
1 code implementation • CVPR 2019 • Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba
To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input.
no code implementations • journal 2019 • Subramanian Sundaram, Petr Kellnhofer, Yunzhu Li, Jun-Yan Zhu, Antonio Torralba, Wojciech Matusik
Using a low-cost (about US$10) scalable tactile glove sensor array, we record a large-scale tactile dataset with 135,000 frames, each covering the full hand, while interacting with 26 different objects.
no code implementations • ICLR 2019 • Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba
In this paper, we propose to learn a particle-based simulator for complex control tasks.
1 code implementation • 28 Sep 2018 • Yunzhu Li, Jiajun Wu, Jun-Yan Zhu, Joshua B. Tenenbaum, Antonio Torralba, Russ Tedrake
There has been an increasing interest in learning dynamics simulators for model-based control.
4 code implementations • NeurIPS 2017 • Yunzhu Li, Jiaming Song, Stefano Ermon
The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal.
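The discriminator-derived surrogate reward at the heart of this adversarial imitation setup can be written in one short function. This is a generic sketch of the reward shaping (the `1e-8` clamp is an assumed numerical guard), not the full training loop.

```python
import numpy as np

def gail_reward(d_logit):
    """Surrogate reward from a discriminator logit for a (state, action) pair.

    With D = sigmoid(logit) the probability that the pair looks expert-like,
    the policy is rewarded r = -log(1 - D): fooling the discriminator pays,
    so no explicit environment reward is ever needed.
    """
    d = 1.0 / (1.0 + np.exp(-d_logit))
    return -np.log(1.0 - d + 1e-8)
```

The reward is monotonically increasing in the logit: the more expert-like the discriminator finds a transition, the larger the learning signal for the policy.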
no code implementations • 4 Dec 2016 • Yunzhu Li, Andre Esteva, Brett Kuprel, Rob Novoa, Justin Ko, Sebastian Thrun
Dense object detection and temporal tracking are needed across application domains ranging from people-tracking to analysis of satellite imagery over time.
5 code implementations • 2 Jun 2016 • Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang
The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model.
Ranked #7 on Face Detection on Annotated Faces in the Wild