Domain Knowledge in Exploration Noise in AlphaZero

1 Jan 2021 · Eric Weiner, George D Montañez, Aaron Trujillo, Abtin Molavi

The AlphaZero algorithm has achieved remarkable success in a variety of sequential, perfect-information games, including Go, shogi, and chess. In the original paper, the only hyperparameter that changes from game to game is the $\alpha$ parameter governing a search prior. In this paper, we investigate the properties of this hyperparameter. First, we build a formal intuition for its behavior on a toy example designed to isolate the influence of $\alpha$. Then, by comparing the performance of AlphaZero agents with different $\alpha$ values on Connect 4, we show that the performance of AlphaZero improves considerably with a good choice of $\alpha$. Together, these results highlight the importance of $\alpha$ as an interpretable hyperparameter that allows for cross-game tuning in a way that more opaque hyperparameters, such as model architecture, may not.
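The abstract does not spell out how $\alpha$ enters the search, so the following is only a minimal sketch under a stated assumption: that the noise in question is the root-node exploration noise of the original AlphaZero paper, where the network priors $p$ are mixed with a sample $\eta \sim \mathrm{Dir}(\alpha)$ as $(1 - \epsilon)\,p + \epsilon\,\eta$ with $\epsilon = 0.25$. The function names `sample_dirichlet` and `add_exploration_noise` are hypothetical and are not taken from the paper.

```python
import random


def sample_dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n outcomes.

    Uses the standard construction: n independent Gamma(alpha, 1) draws,
    normalized to sum to 1.
    """
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
        total = sum(draws)
        if total > 0.0:  # with tiny alpha every draw can underflow to 0.0; redraw
            return [d / total for d in draws]


def add_exploration_noise(priors, alpha, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into root priors: (1 - epsilon) * p + epsilon * eta.

    Assumption: this mirrors the root-noise rule of the original AlphaZero
    paper; epsilon = 0.25 is the value used there.
    """
    if rng is None:
        rng = random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * eta for p, eta in zip(priors, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [1.0 / 7] * 7  # e.g. one prior per Connect 4 column
    for alpha in (0.03, 0.3, 1.0, 10.0):
        noisy = add_exploration_noise(uniform, alpha, rng=rng)
        print(alpha, [round(p, 3) for p in noisy])
```

Running the script for several $\alpha$ values gives a feel for the parameter: small $\alpha$ tends to concentrate the noise on one or two moves, while large $\alpha$ spreads it almost evenly, so the perturbed prior stays close to the original.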

Methods

AlphaZero
