Continuous Control

413 papers with code • 73 benchmarks • 9 datasets

Continuous control, in the context of game playing and simulation in artificial intelligence (AI) and machine learning (ML), refers to controlling a game or simulation through a series of smooth, real-valued adjustments. This contrasts with discrete control, where each action is chosen from a finite set of distinct options. Continuous control matters in environments where the precision, timing, and magnitude of actions affect the outcome, such as steering a car in a racing game, controlling a character in a physics simulation, or flying an aircraft in a flight simulator.
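
As a minimal illustration of the distinction (the action names, bounds, and helper functions below are hypothetical and not tied to any benchmark on this page), a discrete policy picks one option from a fixed menu, while a continuous policy outputs a real-valued magnitude per actuator, here squashed into range with tanh:

```python
import math

# Hypothetical discrete action set for a driving game.
DISCRETE_ACTIONS = ["left", "straight", "right"]


def discrete_action(scores):
    """Pick the single highest-scoring choice from a fixed menu."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return DISCRETE_ACTIONS[best]


def continuous_action(raw, low=-1.0, high=1.0):
    """Squash raw policy outputs into [low, high], one value per actuator."""
    mid = (high + low) / 2.0
    half = (high - low) / 2.0
    return [mid + half * math.tanh(x) for x in raw]


print(discrete_action([0.1, 0.7, 0.2]))   # one of three named choices
print(continuous_action([0.0, 2.0]))      # e.g. steering and throttle magnitudes
```

The continuous version can express "steer slightly left" or "brake gently", which a three-way menu cannot without adding ever more discrete options.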

Benchmarks


These leaderboards are used to track progress in Continuous Control.

Dataset	Best Model
PyBullet HalfCheetah	SAC
PyBullet Walker2D	SAC gSDE
PyBullet Ant	SAC gSDE
PyBullet Hopper	SAC gSDE
Lunar Lander (OpenAI Gym)	MAC
DeepMind Cheetah Run (Images)	DreamerV1
DeepMind Cup Catch (Images)	DrQ
DeepMind Walker Walk (Images)	DrQ
cartpole.swingup	SMuZero
cheetah.run	SMuZero
finger.turn_hard	SMuZero
walker.stand	SMuZero
walker.walk	SMuZero
Cart-Pole Balancing	TRPO
Inverted Pendulum	TRPO
Mountain Car	TRPO
Acrobot	TRPO
Double Inverted Pendulum	TRPO
Swimmer	TRPO
Hopper	TRPO
2D Walker	TRPO
Half-Cheetah	TRPO
Ant	TRPO
Simple Humanoid	TRPO
Full Humanoid	TRPO
Cart-Pole Balancing (limited sensors)	TRPO
Inverted Pendulum (limited sensors)	TRPO
Mountain Car (limited sensors)	TRPO
Acrobot (limited sensors)	TRPO
Cart-Pole Balancing (noisy observations)	TRPO
Inverted Pendulum (noisy observations)	TRPO
Mountain Car (noisy observations)	TRPO
Acrobot (noisy observations)	TRPO
Cart-Pole Balancing (system identifications)	TRPO
Inverted Pendulum (system identifications)	TRPO
Mountain Car (system identifications)	TRPO
Acrobot (system identifications)	TRPO
Swimmer + Gathering	TRPO
Ant + Gathering	TRPO
Swimmer + Maze	TRPO
Ant + Maze	TRPO
Cart Pole (OpenAI Gym)	MAC
Finger, spin (DMControl500k)	CURL
Cartpole, swingup (DMControl500k)	CURL
Reacher, easy (DMControl500k)	CURL
Cheetah, run (DMControl500k)	CURL
Walker, walk (DMControl500k)	CURL
Ball in cup, catch (DMControl500k)	CURL
Finger, spin (DMControl100k)	CURL
Cartpole, swingup (DMControl100k)	CURL
Reacher, easy (DMControl100k)	CURL
Cheetah, run (DMControl100k)	CURL
Walker, walk (DMControl100k)	CURL
Ball in cup, catch (DMControl100k)	CURL
acrobot.swingup	SMuZero
cartpole.balance	SMuZero
cartpole.balance_sparse	SMuZero
cartpole.swingup_sparse	SMuZero
ball_in_cup.catch	SMuZero
finger.spin	SMuZero
finger.turn_easy	SMuZero
hopper.hop	SMuZero
hopper.stand	SMuZero
pendulum.swingup	SMuZero
quadruped.run	SMuZero
quadruped.walk	SMuZero
reacher.easy	SMuZero
reacher.hard	SMuZero
walker.run	SMuZero
fish.swim	MuZero Unplugged
manipulator.insert_ball	MuZero Unplugged
manipulator.insert_peg	MuZero Unplugged
humanoid.run	MuZero Unplugged


Libraries

Use these libraries to find Continuous Control models and implementations

Library	Papers	Stars
DLR-RM/stable-baselines3	8	7,938
hill-a/stable-baselines	7	4,043
opendilab/DI-engine	7	2,551
Kaixhin/imitation-learning	6	386

See all 33 libraries.


Most implemented papers


Proximal Policy Optimization Algorithms

labmlai/annotated_deep_learning_paper_implementations • 20 Jul 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.

171
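
The abstract does not spell out the "surrogate" objective. As a rough sketch, assuming the clipped variant described in the paper and an illustrative clip range of 0.2 (the function name and argument layout are hypothetical), it can be computed from per-sample log-probabilities and advantage estimates like this:

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. eps=0.2 is an assumed default.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                        # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # keep ratio near 1
        # Taking the minimum makes the objective a pessimistic bound, so a
        # large policy change cannot be rewarded beyond the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(ppo_clip_objective([0.0, 0.1], [0.0, 0.0], [1.0, -0.5]))
```

In practice this quantity is computed on tensors and differentiated with an autodiff library; the plain-Python loop is only meant to show the arithmetic.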


Continuous control with deep reinforcement learning

ray-project/ray • 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

157


Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

haarnoja/sac • ICML 2018

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks.


Addressing Function Approximation Error in Actor-Critic Methods

sfujim/TD3 • ICML 2018

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies.
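
The overestimation effect can be reproduced in a few lines: taking a maximum over noisy value estimates is biased upward even when every true value is zero. The sketch below shows that bias on a toy setup and then one remedy used in the paper, bootstrapping from the smaller of two critic estimates. Names, defaults, and the toy setup are assumptions made for illustration.

```python
import random


def max_of_noisy_estimates(true_values, noise_std, trials, seed=0):
    """Average of max_a(Q(a) + noise); exceeds max_a Q(a) when noise_std > 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(v + rng.gauss(0.0, noise_std) for v in true_values)
    return total / trials


def clipped_double_q_target(reward, next_q1, next_q2, done, gamma=0.99):
    """Bootstrap from the smaller of two critics to counter overestimation."""
    next_q = min(next_q1, next_q2)
    return reward + gamma * (0.0 if done else next_q)


# All four true values are 0, yet the noisy maximum averages well above 0.
print(max_of_noisy_estimates([0.0] * 4, noise_std=1.0, trials=5000))
print(clipped_double_q_target(1.0, next_q1=5.0, next_q2=3.0, done=False))
```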


Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

DartEnv/dart-env • 26 Feb 2018

The purpose of this technical report is two-fold.


Simple random search provides a competitive approach to reinforcement learning

modestyachts/ARS • 19 Mar 2018

A common belief in model-free reinforcement learning is that methods based on random search in the parameter space of policies exhibit significantly worse sample complexity than those that explore the space of actions.
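
To show what "random search in the parameter space of policies" means, here is a simplified sketch of basic random search with antithetic perturbations. It omits the augmentations the paper adds on top, and it replaces environment rollouts with a hypothetical toy reward; the step size, noise scale, and number of directions are arbitrary illustrative values.

```python
import random


def random_search_step(params, reward_fn, rng, n_dirs=8, step=0.5, noise=0.1):
    """One update: probe +/- perturbations of the parameters and move along
    each direction in proportion to the reward difference it produced."""
    update = [0.0] * len(params)
    for _ in range(n_dirs):
        delta = [rng.gauss(0.0, 1.0) for _ in params]
        plus = [p + noise * d for p, d in zip(params, delta)]
        minus = [p - noise * d for p, d in zip(params, delta)]
        diff = reward_fn(plus) - reward_fn(minus)
        for i, d in enumerate(delta):
            update[i] += diff * d
    return [p + (step / n_dirs) * u for p, u in zip(params, update)]


def toy_reward(params):
    """Stand-in for an episode return: peaks at a hypothetical optimum."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


rng = random.Random(0)
params = [0.0, 0.0]
for _ in range(200):
    params = random_search_step(params, toy_reward, rng)
print(params, toy_reward(params))
```

No gradients of the policy or value estimates are needed; the only signal is the total reward of each perturbed parameter vector.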


Dream to Control: Learning Behaviors by Latent Imagination

danijar/dreamer • ICLR 2020

Learned world models summarize an agent's experience to facilitate learning complex behaviors.


High-Dimensional Continuous Control Using Generalized Advantage Estimation

labmlai/annotated_deep_learning_paper_implementations • 8 Jun 2015

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.
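
The estimator named in the title can be sketched compactly: generalized advantage estimation forms an exponentially weighted sum of one-step TD residuals, with a parameter lambda trading bias against variance. The version below assumes a single trajectory without early termination; the function signature and default values are illustrative choices.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    values[t] is the value estimate V(s_t); last_value is V of the state
    reached after the final reward (0.0 if the episode ended there).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Discounted sum of residuals with weight (gamma * lam) per step
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0))
```

Setting lam to 0 reduces the estimate to the one-step TD residual, while lam equal to 1 gives the discounted return minus the value baseline.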


Conservative Q-Learning for Offline Reinforcement Learning

aviralkumar2907/CQL • NeurIPS 2020

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
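
The lower bound comes from a regularizer added to the usual Bellman error. As a hedged sketch, assuming a simplified discrete-action form of that regularizer (continuous-control implementations approximate the log-sum-exp with sampled actions), the penalty pushes down Q-values overall while pushing up the Q-value of the action actually present in the dataset. The function name and arguments are hypothetical.

```python
import math


def cql_penalty(q_values, data_action, alpha=1.0):
    """alpha * (logsumexp_a Q(s, a) - Q(s, a_data)) for a single state.

    q_values: list of Q(s, a) over a discrete action set (a simplification).
    data_action: index of the action recorded in the offline dataset.
    """
    m = max(q_values)  # subtract the max for numerical stability
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return alpha * (logsumexp - q_values[data_action])


# The penalty is small when the dataset action already has the highest Q,
# and large when an unseen action is valued far above it.
print(cql_penalty([5.0, 0.0, 0.0], data_action=0))
print(cql_penalty([5.0, 0.0, 0.0], data_action=1))
```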


Benchmarking Deep Reinforcement Learning for Continuous Control

rllab/rllab • 22 Apr 2016

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning.

