Domain Knowledge in Exploration Noise in AlphaZero
The AlphaZero algorithm has achieved remarkable success in a variety of sequential, perfect information games including Go, Shogi and Chess. In the original paper the only hyperparameter that is changed from game to game is the $\alpha$ parameter governing a search prior. In this paper we investigate the properties of this hyperparameter. First, we build a formal intuition for its behavior on a toy example meant to isolate the influence of $\alpha$. Then, by comparing performance of AlphaZero agents with different $\alpha$ values on Connect 4, we show that the performance of AlphaZero improves considerably with a good choice of $\alpha$. This all highlights the importance of $\alpha$ as an interpretable hyperparameter which allows for cross-game tuning that more opaque hyperparameters like model architecture may not.
PDF Abstract