Search Results for author: WeiPeng Chen

Found 8 papers, 3 papers with code

Exploring Context Window of Large Language Models via Decomposed Positional Vectors

no code implementations • 28 May 2024 • Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, WeiPeng Chen, Ji-Rong Wen

Based on our findings, we design two training-free context window extension methods: positional vector replacement and attention window extension.

Base of RoPE Bounds Context Length

no code implementations • 23 May 2024 • Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, WeiPeng Chen

We revisit the role of RoPE in LLMs and propose a novel long-term decay property, from which we derive that the base of RoPE bounds context length: there is an absolute lower bound on the base value required to obtain a given context length capability.

Position
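The bound above concerns the base hyperparameter in the standard RoPE parameterization, where each pair of dimensions rotates at frequency base^(-2i/d); a larger base slows the long-term decay of attention scores. A minimal sketch of that standard parameterization (not code from the paper; the function names are illustrative):

```python
import math

def rope_frequencies(dim, base=10000.0):
    # Each pair of dimensions gets rotation frequency theta_i = base^(-2i/dim).
    # Raising the base lowers the frequencies, which is the knob the paper
    # relates to the achievable context length.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def rotate_query(q, position, base=10000.0):
    # Apply the rotary embedding to a (even-dimensional) query vector
    # at the given token position.
    freqs = rope_frequencies(len(q), base)
    out = []
    for i, theta in enumerate(freqs):
        angle = position * theta
        x, y = q[2 * i], q[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out
```

The rotation is norm-preserving and position-dependent, so relative positions enter attention purely through the angle difference between query and key rotations.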

Checkpoint Merging via Bayesian Optimization in LLM Pretraining

no code implementations • 28 Mar 2024 • Deyuan Liu, Zecheng Wang, Bingning Wang, WeiPeng Chen, Chunshan Li, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui

The rapid proliferation of large language models (LLMs) such as GPT-4 and Gemini underscores the intense demand for resources during their training processes, posing significant challenges due to substantial computational and environmental costs.

Bayesian Optimization
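The core operation in checkpoint merging is a weighted average of the parameters of several pretraining checkpoints; the paper's contribution is searching for good weights with Bayesian optimization. A minimal sketch of the averaging step alone, with the weights supplied directly (parameter dicts and names here are hypothetical, not the paper's code):

```python
def merge_checkpoints(checkpoints, weights):
    # Weighted average of checkpoints, each given as a dict mapping
    # parameter name -> scalar value (real models would hold tensors).
    # The weights are assumed to come from an outer search loop, e.g.
    # Bayesian optimization over validation loss.
    assert abs(sum(weights) - 1.0) < 1e-8, "merge weights should sum to 1"
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
    return merged
```

In practice the outer loop would propose weight vectors, merge, evaluate the merged model on held-out data, and feed the score back to the optimizer.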

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

no code implementations • 6 Mar 2024 • Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, WeiPeng Chen

As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters.

Quantization
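ShortGPT scores layer redundancy with a Block Influence metric based on how much a layer transforms its hidden states: if a layer's output is nearly parallel to its input, the layer is a pruning candidate. A simplified per-vector sketch of that idea (the paper's metric aggregates over tokens; this reduction is an assumption for illustration):

```python
import math

def block_influence(hidden_in, hidden_out):
    # Redundancy score: 1 - cosine similarity between a layer's input
    # and output hidden states. Near 0 means the layer barely changes
    # its input, i.e. it is highly redundant.
    dot = sum(a * b for a, b in zip(hidden_in, hidden_out))
    norm_in = math.sqrt(sum(a * a for a in hidden_in))
    norm_out = math.sqrt(sum(b * b for b in hidden_out))
    return 1.0 - dot / (norm_in * norm_out)
```

Layers would then be ranked by this score and the lowest-scoring ones removed.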

ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks

1 code implementation • 16 Jan 2021 • Bingning Wang, Ting Yao, WeiPeng Chen, Jingfang Xu, Xiaochuan Wang

In compositional question answering, a system must assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA.

Answer Selection • Machine Reading Comprehension • +2
