1 code implementation • 23 Aug 2023 • Jian Hu, Li Tao, June Yang, Chandler Zhou
Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values.
Reinforcement Learning (RL)