no code implementations • 10 May 2024 • Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal
Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem.
no code implementations • 2 May 2024 • Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal
We propose a method for training RL policies for direct force control without requiring access to force sensing.
1 code implementation • 4 Apr 2024 • Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal
While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation.
no code implementations • 6 Mar 2024 • Marcel Torne, Anthony Simeonov, Zechu Li, April Chan, Tao Chen, Abhishek Gupta, Pulkit Agrawal
To learn performant, robust policies without the burden of unsafe real-world data collection or extensive human supervision, we propose RialTo, a system for robustifying real-world imitation learning policies via reinforcement learning in "digital twin" simulation environments constructed on the fly from small amounts of real-world data.
1 code implementation • 29 Feb 2024 • Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal
To probe when an LLM generates unwanted content, the current paradigm is to recruit a "red team" of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs.
no code implementations • 26 Feb 2024 • Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal
The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication.
no code implementations • 2 Nov 2023 • Gabriel B. Margolis, Xiang Fu, Yandong Ji, Pulkit Agrawal
We show that the visual system trained with a small amount of real-world traversal data accurately predicts physical parameters.
no code implementations • 31 Oct 2023 • Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta
We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world.
1 code implementation • 26 Oct 2023 • Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete
Deep reinforcement learning methods exhibit impressive performance on a range of tasks but still struggle on hard exploration tasks in large environments with sparse rewards.
1 code implementation • NeurIPS 2023 • Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
We argue this is due to an assumption made by current offline RL algorithms: that the policy should stay close to the trajectories in the dataset.
no code implementations • 25 Sep 2023 • Meenal Parakh, Alisha Fong, Anthony Simeonov, Tao Chen, Abhishek Gupta, Pulkit Agrawal
Large Language Models (LLMs) have been shown to act like planners that can decompose high-level instructions into a sequence of executable instructions.
no code implementations • 24 Jul 2023 • Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning.
1 code implementation • 20 Jul 2023 • Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta
This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses.
no code implementations • 12 Jul 2023 • Andi Peng, Aviv Netanyahu, Mark Ho, Tianmin Shu, Andreea Bobu, Julie Shah, Pulkit Agrawal
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments.
1 code implementation • 11 Jul 2023 • Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete
Agents build and use a local map to predict their observations; high surprisal leads to a "fragmentation event" that truncates the local map.
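The fragmentation mechanism can be sketched in a few lines. This is a minimal illustration with hypothetical names (`FragmentingMapper`, a scalar surprisal threshold), not the paper's implementation:

```python
import math

class FragmentingMapper:
    def __init__(self, threshold):
        self.threshold = threshold  # surprisal cutoff that triggers fragmentation
        self.maps = [[]]            # list of local maps (fragments)

    def observe(self, obs, predicted_prob):
        """Record obs in the current local map; if the observation was poorly
        predicted (high surprisal), truncate and start a new fragment."""
        surprisal = -math.log(predicted_prob)
        if surprisal > self.threshold:
            self.maps.append([])    # fragmentation event: open a new local map
        self.maps[-1].append(obs)
        return surprisal
```

A well-predicted observation extends the current fragment; a surprising one starts a new local map, truncating the old one.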
no code implementations • 10 Jul 2023 • Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf.
no code implementations • 6 Jul 2023 • Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
1 code implementation • 22 Jun 2023 • Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche
This re-weighted sampling strategy may be combined with any offline RL algorithm.
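As a rough illustration of how such a re-weighted sampling strategy plugs into an offline RL training loop, the sketch below draws transitions with probability proportional to a per-transition weight. Weighting by return is an assumed example here; the paper's actual weighting may differ:

```python
import random

def reweighted_sample(dataset, weights, batch_size, rng=None):
    """Sample a training batch with probability proportional to `weights`."""
    rng = rng or random.Random()
    return rng.choices(dataset, weights=weights, k=batch_size)

# Toy dataset of (state, action, return) transitions, weighted by return.
transitions = [("s0", "a0", 0.1), ("s1", "a1", 5.0), ("s2", "a2", 0.2)]
returns = [r for (_, _, r) in transitions]
batch = reweighted_sample(transitions, returns, batch_size=4,
                          rng=random.Random(0))
```

Because the change touches only how batches are drawn, any offline RL algorithm's update step can consume `batch` unmodified.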
no code implementations • 15 May 2023 • Minyoung Huh, Brian Cheung, Pulkit Agrawal, Phillip Isola
We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments.
1 code implementation • 27 Apr 2023 • Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, Kaiqing Zhang, Pulkit Agrawal
Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data.
no code implementations • 3 Apr 2023 • Yandong Ji, Gabriel B. Margolis, Pulkit Agrawal
DribbleBot (Dexterous Ball Manipulation with a Legged Robot) is a legged robotic system that can dribble a soccer ball under the same real-world conditions as humans (i.e., in the wild).
no code implementations • 2 Apr 2023 • Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava
Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned.
no code implementations • 23 Mar 2023 • Sameer Pai, Tao Chen, Megha Tippur, Edward Adelson, Abhishek Gupta, Pulkit Agrawal
We study the problem of object retrieval in scenarios where visual sensing is absent, object shapes are unknown beforehand, and objects can move freely, such as grabbing objects out of a drawer.
no code implementations • 27 Feb 2023 • Max Simchowitz, Anurag Ajay, Pulkit Agrawal, Akshay Krishnamurthy
We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$.
no code implementations • 3 Feb 2023 • Andreea Bobu, Andi Peng, Pulkit Agrawal, Julie Shah, Anca D. Dragan
To act in the world, robots rely on a representation of salient task aspects: for example, to carry a coffee mug, a robot may consider movement efficiency or mug orientation in its behavior.
no code implementations • 6 Dec 2022 • Gabriel B Margolis, Pulkit Agrawal
Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment.
no code implementations • 28 Nov 2022 • Anurag Ajay, Yilun Du, Abhi Gupta, Joshua Tenenbaum, Tommi Jaakkola, Pulkit Agrawal
We further demonstrate the advantages of modeling policies as conditional diffusion models by considering two other conditioning variables: constraints and skills.
no code implementations • 24 Nov 2022 • Aviv Netanyahu, Tianmin Shu, Joshua Tenenbaum, Pulkit Agrawal
To address this, we propose a reward learning approach, Graph-based Equivalence Mappings (GEM), that can discover spatial goal representations that are aligned with the intended goal specification, enabling successful generalization in unseen environments.
1 code implementation • 21 Nov 2022 • Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, Pulkit Agrawal
The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation.
1 code implementation • 17 Nov 2022 • Anthony Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, Leslie Pack Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal
This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment.
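The final alignment step admits a toy illustration: given the task frame detected on a new object and the desired frame pose, both as 4x4 homogeneous transforms, the executed action is the rigid transform mapping one onto the other. Names and poses below are illustrative, not from the paper:

```python
import numpy as np

def alignment_transform(T_obj, T_goal):
    """Return the transform A such that A @ T_obj == T_goal (4x4 homogeneous)."""
    return T_goal @ np.linalg.inv(T_obj)

T_obj = np.eye(4)
T_obj[:3, 3] = [1.0, 0.0, 0.0]    # task frame located on the observed object
T_goal = np.eye(4)
T_goal[:3, 3] = [0.0, 2.0, 0.0]   # desired placement of that frame
A = alignment_transform(T_obj, T_goal)
```

Applying `A` to the object (and hence to its attached frame) brings the two frames into the desired alignment.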
1 code implementation • 14 Nov 2022 • Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal
However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available.
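The distraction problem stems from the standard practice of adding a fixed-scale intrinsic bonus to the task reward, which a minimal sketch makes explicit (the weight `beta` is an assumed hyperparameter, not a value from the paper):

```python
def mixed_reward(extrinsic, intrinsic, beta=0.1):
    """Fixed weighting keeps paying the exploration bonus even once
    sufficient extrinsic (task) reward is available -- the distraction
    described above, since beta does not decay with task progress."""
    return extrinsic + beta * intrinsic
```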
no code implementations • 6 Oct 2022 • Anurag Ajay, Abhishek Gupta, Dibya Ghosh, Sergey Levine, Pulkit Agrawal
In this work, we develop a framework for meta-RL algorithms that are able to behave appropriately under test-time distribution shifts in the space of tasks.
no code implementations • 18 Aug 2022 • Richard Li, Carlos Esteves, Ameesh Makadia, Pulkit Agrawal
We present a system for accurately predicting stable orientations for diverse rigid objects.
no code implementations • 5 Jul 2022 • Dibya Ghosh, Anurag Ajay, Pulkit Agrawal, Sergey Levine
Offline RL algorithms must account for the fact that the dataset they are provided may leave many facets of the environment unknown.
1 code implementation • 30 Jun 2022 • Yanwei Wang, Ching-Yun Ko, Pulkit Agrawal
One powerful paradigm in visual navigation is to predict actions from observations directly.
no code implementations • ICLR 2022 • Ge Yang, Anurag Ajay, Pulkit Agrawal
Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm.
no code implementations • 5 May 2022 • Gabriel B Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots.
1 code implementation • 28 Apr 2022 • Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
The dominant framework for off-policy multi-goal reinforcement learning involves estimating goal conditioned Q-value function.
1 code implementation • ICLR 2022 • Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer.
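That update pattern can be sketched with a tabular stand-in for the deep case; all names here are illustrative:

```python
import random
from collections import defaultdict

def replay_q_update(Q, buffer, batch_size, alpha=0.5, gamma=0.99, rng=random):
    """One deep-Q-style step: sample transition tuples from the replay
    buffer and move each Q(s, a) toward its bootstrapped target."""
    batch = rng.sample(buffer, min(batch_size, len(buffer)))
    for s, a, r, s_next, done in batch:
        best_next = max(Q[s_next].values(), default=0.0)
        target = r if done else r + gamma * best_next
        Q[s][a] += alpha * (target - Q[s][a])
    return Q

Q = defaultdict(lambda: defaultdict(float))
buffer = [("s0", "a0", 1.0, "s1", True), ("s1", "a1", 0.0, "s0", False)]
replay_q_update(Q, buffer, batch_size=2)
```

How the tuples are sampled from the buffer (uniformly here) is exactly the design choice such methods revisit.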
no code implementations • 14 Mar 2022 • Haokuan Luo, Albert Yue, Zhang-Wei Hong, Pulkit Agrawal
We present a strong baseline that surpasses the performance of previously published methods on the Habitat Challenge task of navigating to a target object in indoor environments.
1 code implementation • 9 Dec 2021 • Anthony Simeonov, Yilun Du, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal, Vincent Sitzmann
Our performance generalizes across both object instances and 6-DoF object poses, and significantly outperforms a recent baseline that relies on 2D descriptors.
1 code implementation • 4 Nov 2021 • Tao Chen, Jie Xu, Pulkit Agrawal
The videos of the learned policies are available at: https://taochenshh.github.io/projects/in-hand-reorientation.
2 code implementations • 28 Oct 2021 • Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić
In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge.
no code implementations • 29 Sep 2021 • Anurag Ajay, Ge Yang, Ofir Nachum, Pulkit Agrawal
Deep Reinforcement Learning (RL) agents have achieved superhuman performance on several video game suites.
no code implementations • ICLR 2022 • Ge Yang, Zhang-Wei Hong, Pulkit Agrawal
We simultaneously learn both components.
no code implementations • ICLR 2022 • Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljacic
In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge.
1 code implementation • 15 Jul 2021 • Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, Pulkit Agrawal
Existing methods for co-optimization are limited and fail to explore a rich space of designs.
no code implementations • 8 Jul 2021 • Yunzhu Li, Shuang Li, Vincent Sitzmann, Pulkit Agrawal, Antonio Torralba
Humans have a strong intuitive understanding of the 3D environment around us.
1 code implementation • 29 Jun 2021 • Xiang Fu, Ge Yang, Pulkit Agrawal, Tommi Jaakkola
Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features.
no code implementations • 1 Apr 2021 • Joshua Gruenstein, Tao Chen, Neel Doshi, Pulkit Agrawal
RML provides a general framework for learning from extremely small amounts of interaction data, and our experiments with HAMR clearly demonstrate that RML substantially outperforms existing techniques.
1 code implementation • 18 Mar 2021 • Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola
We show empirically that our claim holds for finite-width linear and non-linear models across practical learning paradigms, and that on natural data these are often the solutions that generalize well.
no code implementations • 1 Jan 2021 • Tao Chen, Pulkit Agrawal
Learning from past mistakes is a quintessential aspect of intelligence.
no code implementations • 16 Nov 2020 • Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois R. Hogan, Joshua Tenenbaum, Pulkit Agrawal, Alberto Rodriguez
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects that operates directly from a point-cloud observation, i.e., without prior object models.
no code implementations • ICLR 2021 • Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited.
1 code implementation • ICML 2020 • Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality.
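For context, the common linear scaling heuristic that work in this area builds on can be written in one line. This is a sketch of the baseline rule, not the paper's method:

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: grow the learning rate in proportion to
    the batch-size increase (a baseline rule that can degrade model
    quality at large scales, motivating adaptive alternatives)."""
    return base_lr * (new_batch / base_batch)
```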
1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick
Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.
1 code implementation • 23 Dec 2019 • Richard Li, Allan Jabri, Trevor Darrell, Pulkit Agrawal
Learning robotic manipulation tasks using reinforcement learning with sparse rewards is currently impractical due to the outrageous data requirements.
no code implementations • 25 Sep 2019 • Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
When using distributed training to speed up stochastic gradient descent, learning rates must adapt to new scales in order to maintain training effectiveness.
no code implementations • ICLR 2019 • Mayur Mudigonda, Blake Tickell, Pulkit Agrawal
Combining information from different sensory modalities to execute goal directed actions is a key aspect of human intelligence.
1 code implementation • NeurIPS 2019 • Brian Cheung, Alex Terekhov, Yubei Chen, Pulkit Agrawal, Bruno Olshausen
We present a method for storing multiple models within a single set of parameters.
1 code implementation • 21 Jun 2018 • Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik
The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels.
1 code implementation • ICLR 2018 • Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell
In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference.
1 code implementation • ICML 2018 • Rachit Dubey, Pulkit Agrawal, Deepak Pathak, Thomas L. Griffiths, Alexei A. Efros
What makes humans so good at solving seemingly complex video games?
no code implementations • ICCV 2017 • Panna Felsen, Pulkit Agrawal, Jitendra Malik
Many popular team sports involve one team trying to score a goal against the other.
no code implementations • 22 Jun 2017 • Jeffrey Zhang, Sravani Gajjala, Pulkit Agrawal, Geoffrey H. Tison, Laura A. Hallock, Lauren Beussink-Nelson, Eugene Fan, Mandar A. Aras, ChaRandle Jordan, Kirsten E. Fleischmann, Michelle Melisko, Atif Qasim, Alexei Efros, Sanjiv J. Shah, Ruzena Bajcsy, Rahul C. Deo
Automated cardiac image interpretation has the potential to transform clinical practice in multiple ways including enabling low-cost serial assessment of cardiac function in the primary care and rural setting.
13 code implementations • ICML 2017 • Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
In many real-world scenarios, rewards extrinsic to the agent are extremely sparse, or absent altogether.
no code implementations • 6 Mar 2017 • Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine
Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.
no code implementations • 6 Nov 2016 • Misha Denil, Pulkit Agrawal, Tejas D. Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas
When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way.
1 code implementation • 30 Aug 2016 • Minyoung Huh, Pulkit Agrawal, Alexei A. Efros
Which is better: more classes or more examples per class?
1 code implementation • NeurIPS 2016 • Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine
We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics.
no code implementations • 23 Nov 2015 • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.
1 code implementation • CVPR 2016 • Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.
no code implementations • ICCV 2015 • Pulkit Agrawal, Joao Carreira, Jitendra Malik
We show that given the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on visual tasks of scene recognition, object recognition, visual odometry and keypoint matching.
no code implementations • 18 Jul 2014 • Pulkit Agrawal, Dustin Stansbury, Jitendra Malik, Jack L. Gallant
We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images.
no code implementations • 7 Jul 2014 • Pulkit Agrawal, Ross Girshick, Jitendra Malik
In the last two years, convolutional neural networks (CNNs) have achieved an impressive suite of results on standard recognition datasets and tasks.