no code implementations • 12 Apr 2024 • Sarath Sreedharan, Malek Mechergui
Detecting and handling misspecified objectives, such as reward functions, is widely recognized as one of the central challenges in Artificial Intelligence (AI) safety research.
no code implementations • 2 Feb 2023 • Malek Mechergui, Sarath Sreedharan
To address this lacuna, we propose a novel formulation of the value alignment problem, named goal alignment, that focuses on a few central challenges related to value alignment.