AlignBench is a comprehensive benchmark designed specifically for evaluating the alignment of Chinese large language models (LLMs). It focuses on assessing how well model responses match human intent across multiple dimensions. Let me provide you with more details:
Most alignment benchmarks target English, leaving Chinese-language alignment hard to measure; AlignBench addresses this gap by providing a comprehensive, multi-dimensional evaluation benchmark built specifically for Chinese LLMs.
Data and Construction:
It covers eight categories: fundamental language ability, advanced Chinese understanding, open-ended questions, writing ability, logical reasoning, mathematics, task-oriented role play, and professional knowledge. Each question is paired with a high-quality, human-verified reference answer.
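To make the data layout concrete, here is a minimal sketch of what a single benchmark item might look like. The field names (`category`, `question`, `reference`) are illustrative assumptions, not the official AlignBench schema:

```python
# Hypothetical shape of one AlignBench-style evaluation item.
# Field names are assumptions for illustration, not the official schema.
sample_item = {
    "category": "逻辑推理",   # one of the eight task categories (here: logical reasoning)
    "question": "若所有的A都是B，且有些B是C，能否断定有些A是C？请说明理由。",
    "reference": "不能断定。有些B是C并不保证这些C恰好落在A所对应的B之中。",
}

# Grading a model's response then amounts to comparing it against
# sample_item["reference"] within its category.
print(sample_item["category"])
```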
Evaluation Methodology:
The evaluation uses a rule-calibrated, multi-dimensional LLM-as-Judge approach: a strong judge model (such as GPT-4) compares each model response against the high-quality reference answer, explains its judgment, and produces scores along several dimensions plus an overall rating.
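The judging step above can be sketched as follows. This is a minimal illustration of point-wise LLM-as-Judge scoring, not AlignBench's actual prompts: the judge call itself is stubbed out, and the prompt wording, the 1–10 scale, and the `Rating: [[x]]` output convention are assumptions for this sketch:

```python
import re

def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    """Assemble a grading prompt that shows the judge the reference answer.

    The wording and 1-10 scale are illustrative; AlignBench's real prompts
    are rule-calibrated and dimension-specific.
    """
    return (
        "You are a strict grader. Rate the model answer on a 1-10 scale.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        "First write a short critique, then end with a line: Rating: [[x]]"
    )

def parse_rating(judge_reply: str):
    """Extract the numeric rating from a judge reply such as 'Rating: [[7]]'."""
    match = re.search(r"Rating:\s*\[\[(\d+(?:\.\d+)?)\]\]", judge_reply)
    return float(match.group(1)) if match else None
```

In a full pipeline, `build_judge_prompt` would be sent to the judge model and `parse_rating` applied to its reply; per-category averages of these ratings then form the multi-dimensional scorecard.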
CritiqueLLM:
To reduce the cost of relying on GPT-4 as the judge, the authors also developed CritiqueLLM, a dedicated companion evaluator model trained to generate critiques and scores, recovering most of GPT-4's evaluation ability at a fraction of the cost.
Sources:
(1) THUDM/AlignBench: a multi-dimensional Chinese alignment benchmark - GitHub. https://github.com/THUDM/AlignBench.
(2) Zhipu AI releases AlignBench, a Chinese LLM alignment benchmark - AccessPath. https://accesspath.com/ai/5890084/.
(3) AlignBench: tailor-made alignment evaluation for Chinese LLMs - CSDN blog. https://blog.csdn.net/cenyk1230/article/details/135228409.
(4) AlignBench: an alignment benchmark built for Chinese LLMs - Zhihu. https://zhuanlan.zhihu.com/p/671884106.
(5) AlignBench: Benchmarking Chinese Alignment of Large Language Models. https://arxiv.org/abs/2311.18743.