Search Results for author: Xuezhe Ma

Found 53 papers, 31 papers with code

AESOP: Paraphrase Generation with Adaptive Syntactic Control

1 code implementation • EMNLP 2021 • Jiao Sun, Xuezhe Ma, Nanyun Peng

We propose to control paraphrase generation through carefully chosen target syntactic structures to generate more appropriate and higher-quality paraphrases.

Data Augmentation Language Modelling +2

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

371

Evaluating Large Language Models on Controlled Generation Tasks

1 code implementation • 23 Oct 2023 • Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

While recent studies have looked into the abilities of large language models on various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, there have been few studies looking into the controllability of large language models on generation tasks.

Question Generation +2

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.

155

MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

no code implementations • 4 Oct 2023 • Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information.

RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation

1 code implementation • 12 Jun 2023 • Shuai Liu, Hyundong J. Cho, Marjorie Freedman, Xuezhe Ma, Jonathan May

Endowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge.

Decoder Response Generation +1

Challenges in Context-Aware Neural Machine Translation

1 code implementation • 23 May 2023 • Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma

Context-aware neural machine translation involves leveraging information beyond sentence-level context to resolve inter-sentential discourse dependencies and improve document-level translation quality, and has given rise to a number of recent techniques.

Machine Translation Sentence +1

Look-back Decoding for Open-Ended Text Generation

1 code implementation • 22 May 2023 • Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma

Given a prefix (context), open-ended generation aims to decode texts that are coherent, meaning they do not abruptly drift from previous topics, and informative, meaning they do not suffer from undesired repetitions.

Story Generation

LIMA: Less Is More for Alignment

5 code implementations • NeurIPS 2023 • Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.

Language Modelling reinforcement-learning

2,537

Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification

1 code implementation • ICCV 2023 • Ming-Chang Chiu, Pin-Yu Chen, Xuezhe Ma

In this paper, we provide 20,000 non-trivial human annotations on popular datasets as a first step toward bridging the gap in studying how natural semantic spurious features affect image classification, as prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets.

Data Augmentation Image Classification

On Human Visual Contrast Sensitivity and Machine Vision Robustness: A Comparative Study

no code implementations • 16 Dec 2022 • Ming-Chang Chiu, Yingfei Wang, Derrick Eui Gyu Kim, Pin-Yu Chen, Xuezhe Ma

It is well established in neuroscience that color vision plays an essential part in the human visual perception system.

Data Augmentation

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping

1 code implementation • 19 Oct 2022 • Chenghao Yang, Xuezhe Ma

Despite its superior performance, fine-tuning pretrained language models can be unstable, resulting in significant variance in performance and potential risks for practical applications.

Mega: Moving Average Equipped Gated Attention

5 code implementations • 21 Sep 2022 • Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.

Ranked #1 on Long-range modeling on LRA

Image Classification Inductive Bias +3

125,862

Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning

no code implementations • 25 May 2022 • Mozhdeh Gheini, Xuezhe Ma, Jonathan May

A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen.

Cross-Lingual NER Language Modelling +3

Investigating the Benefits of Free-Form Rationales

no code implementations • 25 May 2022 • Jiao Sun, Swabha Swayamdipta, Jonathan May, Xuezhe Ma

After controlling for instances where rationales leak the correct answer while not providing additional background knowledge, we find that incorporating only 5% of rationales during training can boost model performance by 47.22% for CoS-E and 57.14% for ECQA during inference.

Prompt Consistency for Zero-Shot Task Generalization

1 code implementation • 29 Apr 2022 • Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting.

Learning Representations Robust to Group Shifts and Adversarial Examples

no code implementations • 18 Feb 2022 • Ming-Chang Chiu, Xuezhe Ma

Despite the high performance achieved by deep neural networks on various tasks, extensive studies have demonstrated that small perturbations to the input can cause model predictions to fail.

Representation Learning

Towards a Unified View of Parameter-Efficient Transfer Learning

1 code implementation • ICLR 2022 • Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.

Machine Translation text-classification +3

489

Examining and Combating Spurious Features under Distribution Shift

1 code implementation • 14 Jun 2021 • Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig

Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups.
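
The worst-case objective mentioned above can be illustrated with a minimal sketch. It assumes per-example losses and group labels are already available; the function name `worst_group_loss` and the toy data are hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict


def worst_group_loss(losses, groups):
    """Return (group, mean_loss) for the group with the highest mean loss.

    Group DRO minimizes this worst-case quantity instead of the plain
    average over all examples. This helper only evaluates the objective;
    it does not perform any optimization.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    # Mean loss per pre-defined group.
    means = {g: totals[g] / counts[g] for g in totals}
    worst = max(means, key=means.get)
    return worst, means[worst]


# Toy example: group "b" is the harder group.
example_losses = [0.2, 0.4, 1.0, 3.0]
example_groups = ["a", "a", "b", "b"]
print(worst_group_loss(example_losses, example_groups))
```

The plain average would hide the gap between the two groups, whereas the worst-group value reports the harder group directly.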

Luna: Linear Unified Nested Attention

2 code implementations • NeurIPS 2021 • Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length.
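
The packing step can be sketched as plain scaled dot-product attention in which a fixed number of query vectors attend over the input, so the output length equals the number of queries regardless of input length. This is an assumption-laden illustration: the function names `softmax` and `pack` are hypothetical, and learned projections, multiple heads, and the second attention function are all omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def pack(queries, inputs):
    """Attend from len(queries) fixed queries over a variable-length input.

    queries: list of l vectors (each of dimension d)
    inputs:  list of n vectors (each of dimension d)
    returns: list of l vectors, independent of n
    """
    d = len(inputs[0])
    packed = []
    for q in queries:
        # Scaled dot-product score between this query and every input vector.
        scores = [sum(qi * xi for qi, xi in zip(q, x)) / math.sqrt(d)
                  for x in inputs]
        weights = softmax(scores)
        # Weighted average of the input vectors.
        packed.append([sum(w * x[k] for w, x in zip(weights, inputs))
                       for k in range(d)])
    return packed


# Five input vectors are packed into two output vectors.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
qs = [[1.0, 0.0], [0.0, 1.0]]
print(len(pack(qs, xs)))
```

The cost of this step grows with the product of the input length and the fixed number of queries, rather than with the square of the input length.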

Language Modelling Machine Translation +2

104

COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

1 code implementation • Findings (ACL) 2021 • Shikhar Singh, Nuan Wen, Yu Hou, Pegah Alipoormolabashi, Te-Lin Wu, Xuezhe Ma, Nanyun Peng

To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs.

4k Sentence

Personalized Response Generation via Generative Split Memory Network

1 code implementation • NAACL 2021 • Yuwei Wu, Xuezhe Ma, Diyi Yang

Despite the impressive successes of generation and dialogue systems, how to endow a text generation system with particular personality traits to deliver more personalized responses remains under-investigated.

Response Generation Text Generation

DiSCoL: Toward Engaging Dialogue Systems through Conversational Line Guided Response Generation

no code implementations • NAACL 2021 • Sarik Ghazarian, Zixi Liu, Tuhin Chakrabarty, Xuezhe Ma, Aram Galstyan, Nanyun Peng

Having engaging and informative conversations with users is the ultimate goal for open-domain conversational systems.

Response Generation

Apollo: An Adaptive Parameter-wised Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

no code implementations • 1 Jan 2021 • Xuezhe Ma

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

Stochastic Optimization

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

3 code implementations • 28 Sep 2020 • Xuezhe Ma

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

Stochastic Optimization

175

A Two-Step Approach for Implicit Event Argument Detection

no code implementations • ACL 2020 • Zhisong Zhang, Xiang Kong, Zhengzhong Liu, Xuezhe Ma, Eduard Hovy

It remains a challenge to detect implicit arguments, calling for more future work on document-level modeling for this task.

Sentence

Decoupling Global and Local Representations via Invertible Generative Flows

1 code implementation • ICLR 2021 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

In this work, we propose a new generative model that is capable of automatically decoupling global and local representations of images in an entirely unsupervised setting, by embedding a generative flow in the VAE framework to model the decoder.

Decoder Density Estimation +3

Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

1 code implementation • CONLL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng

We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.

Cross-Lingual Transfer Dependency Parsing +2

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

2 code implementations • IJCNLP 2019 • Xuezhe Ma, Chunting Zhou, Xi-An Li, Graham Neubig, Eduard Hovy

Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens.

Ranked #3 on Machine Translation on WMT2016 English-Romanian

Machine Translation NMT +1

244

Handling Syntactic Divergence in Low-resource Machine Translation

1 code implementation • IJCNLP 2019 • Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig

Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.

Data Augmentation Machine Translation +2

An Empirical Investigation of Structured Output Modeling for Graph-based Neural Dependency Parsing

1 code implementation • ACL 2019 • Zhisong Zhang, Xuezhe Ma, Eduard Hovy

In this paper, we investigate the aspect of structured output modeling for the state-of-the-art graph-based neural dependency parser (Dozat and Manning, 2017).

Dependency Parsing Sentence

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

Density Matching for Bilingual Word Embedding

1 code implementation • NAACL 2019 • Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.

Bilingual Lexicon Induction Word Embeddings +1

MaCow: Masked Convolutional Generative Flow

2 code implementations • NeurIPS 2019 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

Flow-based generative models, conceptually attractive due to tractability of both the exact log-likelihood computation and latent-variable inference, and efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Ranked #3 on Image Generation on CelebA 256x256

Computational Efficiency Density Estimation +1

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

no code implementations • ICLR 2019 • Xuezhe Ma, Chunting Zhou, Eduard Hovy

Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Density Estimation Image Generation +1

On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

2 code implementations • NAACL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, Nanyun Peng

Different languages might have different word orders.

Cross-Lingual Transfer Dependency Parsing

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation

4 code implementations • ACL 2019 • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wangrong Zhu, Devendra Singh Sachan, Eric P. Xing

The versatile toolkit also fosters technique sharing across different text generation tasks.

Machine Translation Text Generation +1

2,382

Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation

no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing

The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.

Image Captioning Machine Translation +3

Stack-Pointer Networks for Dependency Parsing

3 code implementations • ACL 2018 • Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, Eduard Hovy

Combining pointer networks (Vinyals et al., 2015) with an internal stack, the proposed model first reads and encodes the whole sentence, then builds the dependency tree top-down (from root to leaf) in a depth-first fashion.

Ranked #14 on Dependency Parsing on Penn Treebank

Dependency Parsing Sentence

440

STCP: Simplified-Traditional Chinese Conversion and Proofreading

no code implementations • IJCNLP 2017 • Jiarui Xu, Xuezhe Ma, Chen-Tse Tsai, Eduard Hovy

This paper aims to provide an effective tool for conversion between Simplified Chinese and Traditional Chinese.

Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

no code implementations • ICLR 2018 • Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy

Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes.

Dependency Parsing Image Captioning +6

An Interpretable Knowledge Transfer Model for Knowledge Base Completion

no code implementations • ACL 2017 • Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy

Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness.

Knowledge Base Completion Transfer Learning

Neural Probabilistic Model for Non-projective MST Parsing

no code implementations • IJCNLP 2017 • Xuezhe Ma, Eduard Hovy

In this paper, we propose a probabilistic parsing model, which defines a proper conditional probability distribution over non-projective dependency trees for a given sentence, using neural representations as inputs.

Sentence

Dropout with Expectation-linear Regularization

no code implementations • 26 Sep 2016 • Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yao-Liang Yu, Yuntian Deng, Eduard Hovy

Algorithmically, we show that our proposed measure of the inference gap can be used to regularize the standard dropout training objective, resulting in an explicit control of the gap.

Image Classification

Harnessing Deep Neural Networks with Logic Rules

2 code implementations • ACL 2016 • Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric Xing

Combining deep neural networks with structured logic rules is desirable to harness flexibility and reduce uninterpretability of the neural models.

Ranked #65 on Sentiment Analysis on SST-2 Binary classification

Named Entity Recognition +2

133

Unsupervised Ranking Model for Entity Coreference Resolution

no code implementations • NAACL 2016 • Xuezhe Ma, Zhengzhong Liu, Eduard Hovy

Coreference resolution is one of the first stages in deep language understanding and its importance has been well recognized in the natural language processing community.

coreference-resolution

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

25 code implementations • ACL 2016 • Xuezhe Ma, Eduard Hovy

State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing.

Ranked #7 on Named Entity Recognition (NER) on CoNLL++

Feature Engineering Named Entity Recognition +3

1,941

Efficient Inner-to-outer Greedy Algorithm for Higher-order Labeled Dependency Parsing

no code implementations • EMNLP 2015 • Xuezhe Ma, Eduard Hovy

Coreference Resolution Dependency Parsing +2

Word Sense Disambiguation via PropStore and OntoNotes for Event Mention Detection

no code implementations • WS 2015 • Nicolas R. Fauceglia, Yiu-Chang Lin, Xuezhe Ma, Eduard Hovy

Information Retrieval Machine Translation +1

Probabilistic Models for High-Order Projective Dependency Parsing

no code implementations • 14 Feb 2015 • Xuezhe Ma, Hai Zhao

This paper presents generalized probabilistic models for high-order projective dependency parsing and an algorithmic framework for learning these statistical models involving dependency trees.

Dependency Parsing