1 code implementation • 2 May 2024 • Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Various approaches to aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO.
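The direct-optimization family mentioned here fits in a few lines. Below is a minimal sketch of the DPO objective in PyTorch, assuming the caller supplies summed token log-probabilities for each preference pair; the argument names and the `beta` default are illustrative, not the paper's code.

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Inputs are tensors of summed token log-probabilities of the chosen /
    rejected completions under the trainable policy and a frozen reference.
    """
    # Implicit reward of each completion: how much more the policy
    # prefers it than the reference model does.
    chosen = policy_logp_chosen - ref_logp_chosen
    rejected = policy_logp_rejected - ref_logp_rejected
    # Maximize the log-sigmoid of the scaled preference margin.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```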
no code implementations • 16 Apr 2024 • Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes or with producing racist text.
1 code implementation • 20 Mar 2024 • Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
In this paper, we present RewardBench, a benchmark dataset and code-base for evaluation, to enhance scientific understanding of reward models.
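The core metric behind a reward-model benchmark of this kind is pairwise accuracy: how often the model scores the preferred response above the rejected one. A minimal sketch, assuming `reward_model(prompt, completion)` returns a scalar score (the callable and data layout are assumptions, not RewardBench's API):

```python
def pairwise_accuracy(reward_model, pairs):
    """Fraction of (prompt, chosen, rejected) triples where the
    reward model scores the chosen response higher."""
    wins = sum(
        reward_model(prompt, chosen) > reward_model(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return wins / len(pairs)
```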
1 code implementation • 26 Feb 2024 • Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang
A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training.
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Language models have become a critical technology for tackling a wide range of natural language processing tasks, yet many details about how the best-performing language models were developed are not reported.
2 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.
no code implementations • 31 Oct 2023 • Nathan Lambert, Roberto Calandra
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) more capable in complex settings.
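The optimization target in RLHF is commonly written as the reward-model score minus a KL penalty that keeps the policy near a reference model. A minimal per-sample sketch (the coefficient and the single-sample KL estimate are illustrative assumptions):

```python
def rlhf_objective(rm_score, policy_logp, ref_logp, kl_coef=0.05):
    """Per-sample RLHF target: reward minus a KL penalty.

    `policy_logp` and `ref_logp` are log-probabilities of the sampled
    completion under the policy and the frozen reference model.
    """
    kl_estimate = policy_logp - ref_logp  # single-sample KL estimate
    return rm_score - kl_coef * kl_estimate
```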
1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
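A teacher-ranked dataset of this kind must first be flattened into (chosen, rejected) pairs for DPO-style training. One simple convention is to take every ordered pair from the ranking, sketched below; the actual pairing recipe in the paper may differ.

```python
from itertools import combinations

def ranking_to_pairs(prompt, responses_best_first):
    """Expand one teacher-ranked response list into preference pairs."""
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        # combinations preserves list order, so `better` always outranks `worse`
        for better, worse in combinations(responses_best_first, 2)
    ]
```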
1 code implementation • 10 Oct 2023 • Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment.
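The "explicit model of the environment" is typically a one-step dynamics network trained on observed transitions. A minimal PyTorch sketch (the architecture and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One-step model: predicts s_{t+1} from (s_t, a_t)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Predicting the state delta rather than the raw next state
        # is a common stabilization trick in MBRL.
        return state + self.net(torch.cat([state, action], dim=-1))
```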
no code implementations • 20 Jul 2023 • Alexander N. Alvara, Lydia Lee, Emmanuel Sin, Nathan Lambert, Andrew J. Westphal, Kristofer S. J. Pister
Leveraging advancements in micro-scale technology, we propose a fleet of autonomous, low-cost, small solar sails for interplanetary exploration.
no code implementations • 9 Dec 2022 • Margaret Mitchell, Alexandra Sasha Luccioni, Nathan Lambert, Marissa Gerchick, Angelina McMillan-Major, Ezinwanne Ozoani, Nazneen Rajani, Tristan Thrush, Yacine Jernite, Douwe Kiela
We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets.
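In the spirit of measuring data, a composition report can be as simple as per-label document counts and token shares. A toy sketch, where `label_fn` (a stand-in mapping each document to a source or domain label) is an assumption for illustration:

```python
from collections import Counter

def measure_composition(documents, label_fn):
    """Per-label document counts and token-count shares for a corpus."""
    doc_counts, token_counts = Counter(), Counter()
    for doc in documents:
        label = label_fn(doc)
        doc_counts[label] += 1
        token_counts[label] += len(doc.split())  # crude whitespace tokens
    total = sum(token_counts.values())
    return {
        label: {"docs": doc_counts[label],
                "token_share": token_counts[label] / total}
        for label in doc_counts
    }
```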
1 code implementation • 22 Apr 2022 • Thomas Krendl Gilbert, Nathan Lambert, Sarah Dean, Tom Zick, Aaron Snoswell
Building systems that are good for society in the face of complex societal effects requires a dynamic approach.
no code implementations • 17 Mar 2022 • Nathan Lambert, Kristofer Pister, Roberto Calandra
In this paper, we explore the effects of subcomponents of a control problem on long-term prediction error, including choosing a system, collecting data, and training a model.
1 code implementation • 11 Feb 2022 • Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert
In the long term, reinforcement learning (RL) is considered by many AI theorists to be the most promising path to artificial general intelligence.
no code implementations • 27 Jan 2022 • Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
1 code implementation • 26 Feb 2021 • Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra
We demonstrate that this problem can be tackled effectively with automated HPO, which yields significantly improved performance compared to human experts.
Tasks: Hyperparameter Optimization, Model-based Reinforcement Learning
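As a baseline illustration of automated HPO for MBRL, random search over a small configuration space already captures the workflow; the paper's methods are more sophisticated, and `train_and_eval` is an assumed stand-in for a full MBRL training run returning mean episode return.

```python
import random

def random_search(train_and_eval, search_space, n_trials=50, seed=0):
    """Return the best configuration found by uniform random sampling."""
    rng = random.Random(seed)
    best_config, best_return = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_eval(config)
        if score > best_return:
            best_config, best_return = config, score
    return best_config, best_return

# Example search space for an MBRL agent (values are illustrative):
space = {"model_lr": [1e-4, 3e-4, 1e-3], "horizon": [5, 15, 30], "ensemble_size": [3, 5, 7]}
```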
no code implementations • 4 Feb 2021 • McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, Tom Zick
Despite interest in communicating ethical problems and social contexts within the undergraduate curriculum to advance Public Interest Technology (PIT) goals, interventions at the graduate level remain largely unexplored.
no code implementations • 2 Sep 2020 • Nathan Lambert, Craig Schindler, Daniel Drew, Kristofer Pister
As a comparison to the significant engineering effort required for an analytic control law, we implement a data-driven model-based reinforcement learning yaw controller in a simulated flight task.
Tasks: Robotics
2 code implementations • ICLR 2020 • Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
In our experiments, we study this objective mismatch issue and demonstrate that the likelihood of one-step ahead predictions is not always correlated with control performance.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning
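The finding can be checked numerically: across a set of model checkpoints, correlate held-out one-step log-likelihood with episode return, as in the sketch below; a weak or negative correlation is the mismatch, and the input names are illustrative.

```python
import numpy as np

def mismatch_correlation(val_log_likelihoods, episode_returns):
    """Pearson correlation between one-step predictive log-likelihood
    and control performance across model checkpoints."""
    ll = np.asarray(val_log_likelihoods, dtype=float)
    ret = np.asarray(episode_returns, dtype=float)
    return np.corrcoef(ll, ret)[0, 1]
```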