Search Results for author: Zhanhui Zhou

Found 6 papers, 6 papers with code

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

1 code implementation • 29 May 2024 • Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao

In this work, we introduce $\textit{weak-to-strong search}$, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model.

Instruction Following • Language Modelling • +1
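A minimal illustrative sketch of the idea described in the abstract above, not the authors' implementation: candidate continuations are sampled from a frozen large model, then re-ranked greedily by the log-likelihood difference between a small tuned model and its untuned counterpart. The model names and chunk-level generation loop below are placeholder assumptions.

```python
# Sketch of weak-to-strong search (illustrative, not the authors' code).
# Assumption: the large model and both small models share a tokenizer;
# "gpt2" stands in for a fine-tuned small model in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
large = AutoModelForCausalLM.from_pretrained("gpt2-xl").to(device).eval()      # frozen large model
small_tuned = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()   # placeholder: should be a tuned small model
small_untuned = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval() # untuned small model

@torch.no_grad()
def seq_logprob(model, input_ids, prompt_len):
    """Sum of token log-probabilities for the continuation part of input_ids."""
    logits = model(input_ids).logits[:, :-1]
    logps = torch.log_softmax(logits, dim=-1)
    targets = input_ids[:, 1:]
    token_logps = logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logps[:, prompt_len - 1:].sum(dim=-1)

@torch.no_grad()
def weak_to_strong_step(prompt, num_candidates=4, chunk_len=16):
    """One greedy step: sample candidate chunks from the frozen large model,
    keep the one maximizing log p_tuned(y|x) - log p_untuned(y|x)."""
    enc = tok(prompt, return_tensors="pt").to(device)
    prompt_len = enc.input_ids.shape[1]
    outs = large.generate(
        **enc, do_sample=True, max_new_tokens=chunk_len,
        num_return_sequences=num_candidates, pad_token_id=tok.eos_token_id,
    )
    scores = seq_logprob(small_tuned, outs, prompt_len) - seq_logprob(small_untuned, outs, prompt_len)
    best = outs[scores.argmax()]
    return tok.decode(best, skip_special_tokens=True)

print(weak_to_strong_step("Explain photosynthesis in one sentence."))
```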

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

1 code implementation • 22 Feb 2024 • Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

By conducting a detailed analysis of real multi-turn dialogue data, we construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

1 code implementation • 22 Feb 2024 • Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs).

Math • Mathematical Reasoning

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

1 code implementation • 14 Feb 2024 • Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao

Large Language Models (LLMs) are now commonplace in conversation applications.

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

1 code implementation • 5 Oct 2023 • Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences.

Language Modelling • Long Form Question Answering
