Search Results for author: Zhichen Dong

Found 3 papers, 3 papers with code

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

1 code implementation • 29 May 2024 • Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao

In this work, we introduce $\textit{weak-to-strong search}$, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model.

Instruction Following, Language Modelling, +1


Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

1 code implementation • 19 Feb 2024 • Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Large language models (LLMs) undergo safety alignment to ensure safe conversations with humans.

Language Modelling


Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

1 code implementation • 14 Feb 2024 • Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao

Large Language Models (LLMs) are now commonplace in conversation applications.

