Explainable Reinforcement Learning for Longitudinal Control

6 Feb 2021  ·  Roman Liessner, Jan Dohmen, Marco Wiering ·

Deep Reinforcement Learning (DRL) has the potential to surpass the existing state of the art in various practical applications. However, as long as learned strategies and the decisions they produce are difficult to interpret, DRL will not find its way into safety-critical fields of application. SHAP values are one approach to overcoming this problem: adding them to DRL is expected to yield an improved understanding of the learned action-selection policy. In this paper, the application of a SHAP method to DRL is demonstrated using the OpenAI Gym LongiControl Environment, in which the agent drives an autonomous vehicle along a single-lane route while respecting speed limits. The controls learned with a DDPG algorithm are interpreted by a novel approach that combines learned actions with SHAP values. The proposed RL-SHAP representation makes it possible to observe, at every time step, which features have a positive or negative effect on the selected action and which influences are negligible. The results show that RL-SHAP values are a suitable approach for interpreting the agent's decisions.
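To illustrate the underlying idea, the following is a minimal sketch of per-time-step Shapley attribution for a policy's action. It computes exact Shapley values by enumerating feature coalitions, using a toy linear "policy" and made-up feature names (speed error, distance to next speed limit, current acceleration); the paper's actual setup (a DDPG actor network explained with a SHAP method) is not reproduced here, and all names and weights below are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(policy, state, baseline):
    """Exact Shapley values for a single decision: feature i's
    contribution to policy(state) relative to policy(baseline).
    Enumerates all coalitions, so it is only practical for few features."""
    n = len(state)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # evaluate the policy with only the coalition's features
                # set to their actual values, then additionally feature i
                z = list(baseline)
                for j in coalition:
                    z[j] = state[j]
                without_i = policy(z)
                z[i] = state[i]
                with_i = policy(z)
                phi[i] += weight * (with_i - without_i)
    return phi

# Toy linear longitudinal-control "policy" mapping three illustrative
# features to an acceleration command (weights are arbitrary):
w = [-0.8, 0.3, -0.1]
policy = lambda s: sum(wi * si for wi, si in zip(w, s))

state = [2.0, 50.0, 0.5]      # [speed error, dist. to speed limit, accel]
baseline = [0.0, 0.0, 0.0]    # reference state the attribution is relative to
phi = shapley_values(policy, state, baseline)
```

The efficiency property of Shapley values guarantees that the per-feature contributions sum to `policy(state) - policy(baseline)`, which is exactly the decomposition an RL-SHAP-style display exploits: at each time step, positive entries of `phi` push the action up, negative entries push it down, and near-zero entries are negligible influences. Practical SHAP tooling approximates these values rather than enumerating coalitions.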
