Domain Knowledge in Exploration Noise in AlphaZero

1 Jan 2021  ·  Eric Weiner, George D Montañez, Aaron Trujillo, Abtin Molavi ·

The AlphaZero algorithm has achieved remarkable success in a variety of sequential, perfect information games including Go, Shogi and Chess. In the original paper the only hyperparameter that is changed from game to game is the $\alpha$ parameter governing a search prior. In this paper we investigate the properties of this hyperparameter. First, we build a formal intuition for its behavior on a toy example meant to isolate the influence of $\alpha$. Then, by comparing performance of AlphaZero agents with different $\alpha$ values on Connect 4, we show that the performance of AlphaZero improves considerably with a good choice of $\alpha$. This all highlights the importance of $\alpha$ as an interpretable hyperparameter which allows for cross-game tuning that more opaque hyperparameters like model architecture may not.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods