Search Results for author: Xiaoqi Jiao

Found 5 papers, 1 paper with code

Direct Alignment of Language Models via Quality-Aware Self-Refinement

no code implementations • 31 May 2024 • Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok

To alleviate this problem, we investigate the use of intrinsic knowledge within the on-the-fly fine-tuning LLM to obtain relative qualities and help to refine the loss function.

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation

no code implementations • 11 Mar 2021 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.

Natural Language Understanding XLM-R

Improving Task-Agnostic BERT Distillation with Layer Mapping Search

no code implementations • 11 Dec 2020 • Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Comprehensive experiments on the evaluation benchmarks demonstrate that 1) layer mapping strategy has a significant effect on task-agnostic BERT distillation and different layer mappings can result in quite different performances; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.

Knowledge Distillation

TinyBERT: Distilling BERT for Natural Language Understanding

7 code implementations • Findings of the Association for Computational Linguistics 2020 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.

Ranked #1 on Natural Language Inference on MultiNLI Dev

Knowledge Distillation Language Modelling +6

Convolutional Neural Network for Universal Sentence Embeddings

no code implementations • COLING 2018 • Xiaoqi Jiao, Fang Wang, Dan Feng

This paper proposes a simple CNN model for creating general-purpose sentence embeddings that can transfer easily across domains and can also act as effective initialization for downstream tasks.

Semantic Textual Similarity Sentence +3

