AI planning in the imagination: High-level planning on learned abstract search spaces

Search and planning algorithms have been a cornerstone of artificial intelligence since the field's inception. Giving reinforcement learning agents the ability to plan at execution time has resulted in significant performance improvements in various domains. However, in real-world environments, the model with respect to which the agent plans has been constrained to be grounded in the real environment itself, as opposed to a more abstract model that allows planning over compound actions and behaviors. We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training, and which is completely decoupled from the real environment. Unlike prior approaches, this enables the agent to perform high-level planning at arbitrary timescales and reason in terms of compound or temporally extended actions, which can be useful in environments where large numbers of base-level micro-actions are needed to perform relevant macro-actions. In addition, our method is more general than comparable prior methods because it seamlessly handles settings with continuous action spaces, combinatorial action spaces, and partial observability. We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman. Experimentally, it outperforms comparable prior methods without assuming access to an environment simulator at execution time.
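To make the core idea concrete, below is a minimal, hypothetical sketch of planning in a learned abstract model that is decoupled from the real environment. Everything here is an illustrative assumption, not PiZero's actual architecture: the encoder, latent dynamics, reward and value heads are stand-in random linear maps (imagined as frozen after training), the planner is a simple depth-limited search rather than the paper's algorithm, and the small discrete set of macro-actions is invented for the example.

```python
# Hypothetical sketch: high-level planning over a *learned* abstract model.
# All names and components are illustrative assumptions, not PiZero's API.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_MACROS = 16, 8, 4

# Stand-ins for learned networks: random linear maps, "frozen" after training.
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM))               # observation -> latent state
W_dyn = rng.normal(size=(N_MACROS, LATENT_DIM, LATENT_DIM))  # latent transition per macro-action
w_rew = rng.normal(size=(N_MACROS, LATENT_DIM))              # predicted reward per macro-action
w_val = rng.normal(size=LATENT_DIM)                          # latent value head

def encode(obs):
    # Map a raw observation into the abstract search space.
    return np.tanh(W_enc @ obs)

def dynamics(z, a):
    # One abstract step; it may stand in for many base-level micro-actions.
    return np.tanh(W_dyn[a] @ z), float(w_rew[a] @ z)

def value(z):
    return float(w_val @ z)

def plan(z, depth=3, gamma=0.99):
    """Exhaustive depth-limited search entirely inside the learned abstract
    space; returns (best first macro-action, its estimated return)."""
    if depth == 0:
        return None, value(z)
    best_a, best_ret = None, -np.inf
    for a in range(N_MACROS):
        z_next, r = dynamics(z, a)               # no environment simulator used
        _, future = plan(z_next, depth - 1, gamma)
        ret = r + gamma * future
        if ret > best_ret:
            best_a, best_ret = a, ret
    return best_a, best_ret

obs = rng.normal(size=OBS_DIM)                   # a raw environment observation
a, est = plan(encode(obs))
print(f"chosen macro-action {a}, estimated abstract return {est:.3f}")
```

The point of the sketch is the decoupling: once the observation is encoded, the planning loop only ever calls the learned `dynamics` and `value` functions, never the real environment, which is what allows search at timescales coarser than the base action space.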
