1 code implementation • 2 May 2024 • Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Various approaches to aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO.
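The direct-optimization family mentioned here fits in a few lines. Below is a minimal sketch of the DPO objective in PyTorch, assuming the caller supplies summed token log-probabilities for each preference pair; the argument names and the `beta` default are illustrative, not the paper's code.

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Inputs are tensors of summed token log-probabilities of the chosen /
    rejected completions under the trainable policy and a frozen reference.
    """
    # Implicit reward of each completion: how much more the policy
    # prefers it than the reference model does.
    chosen = policy_logp_chosen - ref_logp_chosen
    rejected = policy_logp_rejected - ref_logp_rejected
    # Maximize the log-sigmoid of the scaled preference margin.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```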
no code implementations • 16 Apr 2024 • Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes or with producing racist text.
1 code implementation • 20 Mar 2024 • Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
In this paper, we present RewardBench, a benchmark dataset and code-base for evaluation, to enhance scientific understanding of reward models.
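The core metric behind a reward-model benchmark of this kind is pairwise accuracy: how often the model scores the preferred response above the rejected one. A minimal sketch, assuming `reward_model(prompt, completion)` returns a scalar score (the callable and data layout are assumptions, not RewardBench's API):

```python
def pairwise_accuracy(reward_model, pairs):
    """Fraction of (prompt, chosen, rejected) triples where the
    reward model scores the chosen response higher."""
    wins = sum(
        reward_model(prompt, chosen) > reward_model(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return wins / len(pairs)
```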
1 code implementation • 26 Feb 2024 • Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training.
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Language models have become a critical technology for tackling a wide range of natural language processing tasks, yet many details about how the best-performing language models were developed are not reported.
2 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.
no code implementations • 31 Oct 2023 • Nathan Lambert, Roberto Calandra
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) more capable in complex settings.
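The optimization target in RLHF is commonly written as the reward-model score minus a KL penalty that keeps the policy near a reference model. A minimal per-sample sketch (the coefficient and the single-sample KL estimate are illustrative assumptions):

```python
def rlhf_objective(rm_score, policy_logp, ref_logp, kl_coef=0.05):
    """Per-sample RLHF target: reward minus a KL penalty.

    `policy_logp` and `ref_logp` are log-probabilities of the sampled
    completion under the policy and the frozen reference model.
    """
    kl_estimate = policy_logp - ref_logp  # single-sample KL estimate
    return rm_score - kl_coef * kl_estimate
```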
1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
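A teacher-ranked dataset of this kind must first be flattened into (chosen, rejected) pairs for DPO-style training. One simple convention is to take every ordered pair from the ranking, sketched below; the actual pairing recipe in the paper may differ.

```python
from itertools import combinations

def ranking_to_pairs(prompt, responses_best_first):
    """Expand one teacher-ranked response list into preference pairs."""
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        # combinations preserves list order, so `better` always outranks `worse`
        for better, worse in combinations(responses_best_first, 2)
    ]
```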
1 code implementation • 10 Oct 2023 • Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment.
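The "explicit model of the environment" is typically a one-step dynamics network trained on observed transitions. A minimal PyTorch sketch (the architecture and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step model: predicts s_{t+1} from (s_t, a_t)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predicting the state delta rather than the raw next state
        # is a common stabilization trick in MBRL.
        return state + self.net(torch.cat([state, action], dim=-1))
```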
no code implementations • 20 Jul 2023 • Alexander N. Alvara, Lydia Lee, Emmanuel Sin, Nathan Lambert, Andrew J. Westphal, Kristofer S. J. Pister
Leveraging advancements in micro-scale technology, we propose a fleet of autonomous, low-cost, small solar sails for interplanetary exploration.
no code implementations • 9 Dec 2022 • Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela
We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets.
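In the spirit of measuring data, a composition report can be as simple as per-label document counts and token shares. A toy sketch, where `label_fn` (a stand-in mapping each document to a source or domain label) is an assumption for illustration:

```python
from collections import Counter

def measure_composition(documents, label_fn):
    """Per-label document counts and token-count shares for a corpus."""
    doc_counts, token_counts = Counter(), Counter()
    for doc in documents:
        label = label_fn(doc)
        doc_counts[label] += 1
        token_counts[label] += len(doc.split())  # crude whitespace tokens
    total = sum(token_counts.values())
    return {
        label: {"docs": doc_counts[label],
                "token_share": token_counts[label] / total}
        for label in doc_counts
    }
```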
1 code implementation • 22 Apr 2022 • Thomas Krendl Gilbert, Nathan Lambert, Sarah Dean, Tom Zick, Aaron Snoswell
Building systems that are good for society in the face of complex societal effects requires a dynamic approach.
no code implementations • 17 Mar 2022 • Nathan Lambert, Kristofer Pister, Roberto Calandra
In this paper, we explore the effects of subcomponents of a control problem on long-term prediction error, including choosing a system, collecting data, and training a model.
1 code implementation • 11 Feb 2022 • Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert
In the long term, reinforcement learning (RL) is considered by many AI theorists to be the most promising path to artificial general intelligence.
no code implementations • 27 Jan 2022 • Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
1 code implementation • 26 Feb 2021 • Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra
We demonstrate that this problem can be tackled effectively with automated HPO, which yields significantly improved performance compared to human experts.
Tasks: Hyperparameter Optimization, Model-based Reinforcement Learning
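As a baseline illustration of automated HPO for MBRL, random search over a small configuration space already captures the workflow; the paper's methods are more sophisticated, and `train_and_eval` is an assumed stand-in for a full MBRL training run returning mean episode return.

```python
import random

def random_search(train_and_eval, search_space, n_trials=50, seed=0):
    """Return the best configuration found by uniform random sampling."""
    rng = random.Random(seed)
    best_config, best_return = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_eval(config)
        if score > best_return:
            best_config, best_return = config, score
    return best_config, best_return

# Example search space for an MBRL agent (values are illustrative):
space = {"model_lr": [1e-4, 3e-4, 1e-3], "horizon": [5, 15, 30], "ensemble_size": [3, 5, 7]}
```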
no code implementations • 4 Feb 2021 • McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick
Despite interest in communicating ethical problems and social contexts within the undergraduate curriculum to advance Public Interest Technology (PIT) goals, interventions at the graduate level remain largely unexplored.
no code implementations • 2 Sep 2020 • Nathan Lambert, Craig Schindler, Daniel Drew, Kristofer Pister
As a comparison to the significant engineering effort required for an analytic control law, we implement a data-driven model-based reinforcement learning yaw controller in a simulated flight task.
Tasks: Robotics
2 code implementations • ICLR 2020 • Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
In our experiments, we study this objective mismatch issue and demonstrate that the likelihood of one-step ahead predictions is not always correlated with control performance.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning
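The finding can be checked numerically: across a set of model checkpoints, correlate held-out one-step log-likelihood with episode return, as in the sketch below; a weak or negative correlation is the mismatch, and the input names are illustrative.

```python
import numpy as np

def mismatch_correlation(val_log_likelihoods, episode_returns):
    """Pearson correlation between one-step predictive log-likelihood
    and control performance across model checkpoints."""
    ll = np.asarray(val_log_likelihoods, dtype=float)
    ret = np.asarray(episode_returns, dtype=float)
    return np.corrcoef(ll, ret)[0, 1]
```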