1 code implementation • EMNLP 2021 • Jiao Sun, Xuezhe Ma, Nanyun Peng
We propose to control paraphrase generation through carefully chosen target syntactic structures to generate more proper and higher quality paraphrases.
1 code implementation • 12 Apr 2024 • Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.
1 code implementation • 23 Oct 2023 • Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma
While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, multilingual and etc, there have been few studies looking into the controllability of large language models on generation tasks.
1 code implementation • 5 Oct 2023 • Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang
FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.
no code implementations • 4 Oct 2023 • Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang
We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information.
1 code implementation • 12 Jun 2023 • Shuai Liu, Hyundong J. Cho, Marjorie Freedman, Xuezhe Ma, Jonathan May
Endowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge.
1 code implementation • 23 May 2023 • Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma
Context-aware neural machine translation involves leveraging information beyond sentence-level context to resolve inter-sentential discourse dependencies and improve document-level translation quality, and has given rise to a number of recent techniques.
1 code implementation • 22 May 2023 • Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma
Given a prefix (context), open-ended generation aims to decode texts that are coherent, which do not abruptly drift from previous topics, and informative, which do not suffer from undesired repetitions.
5 code implementations • NeurIPS 2023 • Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.
1 code implementation • ICCV 2023 • Ming-Chang Chiu, Pin-Yu Chen, Xuezhe Ma
In this paper, we provide 20, 000 non-trivial human annotations on popular datasets as a first step to bridge gap to studying how natural semantic spurious features affect image classification, as prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets.
no code implementations • 16 Dec 2022 • Ming-Chang Chiu, Yingfei Wang, Derrick Eui Gyu Kim, Pin-Yu Chen, Xuezhe Ma
It is well established in neuroscience that color vision plays an essential part in the human visual perception system.
1 code implementation • 19 Oct 2022 • Chenghao Yang, Xuezhe Ma
Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks for practical applications.
5 code implementations • 21 Sep 2022 • Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer
The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.
Ranked #1 on Long-range modeling on LRA
no code implementations • 25 May 2022 • Mozhdeh Gheini, Xuezhe Ma, Jonathan May
A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen.
no code implementations • 25 May 2022 • Jiao Sun, Swabha Swayamdipta, Jonathan May, Xuezhe Ma
After controlling for instances where rationales leak the correct answer while not providing additional background knowledge, we find that incorporating only 5% of rationales during training can boost model performance by 47. 22% for CoS-E and 57. 14% for ECQA during inference.
1 code implementation • 29 Apr 2022 • Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting.
no code implementations • 18 Feb 2022 • Ming-Chang Chiu, Xuezhe Ma
Despite the high performance achieved by deep neural networks on various tasks, extensive studies have demonstrated that small tweaks in the input could fail the model predictions.
1 code implementation • ICLR 2022 • Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune less parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
1 code implementation • 14 Jun 2021 • Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig
Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups.
2 code implementations • NeurIPS 2021 • Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer
Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length.
1 code implementation • Findings (ACL) 2021 • Shikhar Singh, Nuan Wen, Yu Hou, Pegah Alipoormolabashi, Te-Lin Wu, Xuezhe Ma, Nanyun Peng
To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs.
1 code implementation • NAACL 2021 • Yuwei Wu, Xuezhe Ma, Diyi Yang
Despite the impressive successes of generation and dialogue systems, how to endow a text generation system with particular personality traits to deliver more personalized responses remains under-investigated.
no code implementations • NAACL 2021 • Sarik Ghazarian, Zixi Liu, Tuhin Chakrabarty, Xuezhe Ma, Aram Galstyan, Nanyun Peng
Having engaging and informative conversations with users is the utmost goal for open-domain conversational systems.
no code implementations • 1 Jan 2021 • Xuezhe Ma
In this paper, we introduce Apollo, a quasi-newton method for noncovex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.
3 code implementations • 28 Sep 2020 • Xuezhe Ma
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.
no code implementations • ACL 2020 • Zhisong Zhang, Xiang Kong, Zhengzhong Liu, Xuezhe Ma, Eduard Hovy
It remains a challenge to detect implicit arguments, calling for more future work of document-level modeling for this task.
1 code implementation • ICLR 2021 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy
In this work, we propose a new generative model that is capable of automatically decoupling global and local representations of images in an entirely unsupervised setting, by embedding a generative flow in the VAE framework to model the decoder.
1 code implementation • CONLL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng
We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.
2 code implementations • IJCNLP 2019 • Xuezhe Ma, Chunting Zhou, Xi-An Li, Graham Neubig, Eduard Hovy
Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens.
Ranked #3 on Machine Translation on WMT2016 English-Romanian
1 code implementation • IJCNLP 2019 • Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig
Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.
1 code implementation • ACL 2019 • Zhisong Zhang, Xuezhe Ma, Eduard Hovy
In this paper, we investigate the aspect of structured output modeling for the state-of-the-art graph-based neural dependency parser (Dozat and Manning, 2017).
1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig
Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.
1 code implementation • NAACL 2019 • Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig
Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.
2 code implementations • NeurIPS 2019 • Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy
Flow-based generative models, conceptually attractive due to tractability of both the exact log-likelihood computation and latent-variable inference, and efficiency of both training and sampling, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.
Ranked #3 on Image Generation on CelebA 256x256
no code implementations • ICLR 2019 • Xuezhe Ma, Chunting Zhou, Eduard Hovy
Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.
2 code implementations • NAACL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, Nanyun Peng
Different languages might have different word orders.
4 code implementations • ACL 2019 • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wangrong Zhu, Devendra Singh Sachan, Eric P. Xing
The versatile toolkit also fosters technique sharing across different text generation tasks.
no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing
The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.
3 code implementations • ACL 2018 • Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, Eduard Hovy
Combining pointer networks~\citep{vinyals2015pointer} with an internal stack, the proposed model first reads and encodes the whole sentence, then builds the dependency tree top-down (from root-to-leaf) in a depth-first fashion.
Ranked #14 on Dependency Parsing on Penn Treebank
no code implementations • IJCNLP 2017 • Jiarui Xu, Xuezhe Ma, Chen-Tse Tsai, Eduard Hovy
This paper aims to provide an effective tool for conversion between Simplified Chinese and Traditional Chinese.
no code implementations • ICLR 2018 • Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy
Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes.
no code implementations • ACL 2017 • Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy
Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness.
no code implementations • IJCNLP 2017 • Xuezhe Ma, Eduard Hovy
In this paper, we propose a probabilistic parsing model, which defines a proper conditional probability distribution over non-projective dependency trees for a given sentence, using neural representations as inputs.
no code implementations • 26 Sep 2016 • Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yao-Liang Yu, Yuntian Deng, Eduard Hovy
Algorithmically, we show that our proposed measure of the inference gap can be used to regularize the standard dropout training objective, resulting in an \emph{explicit} control of the gap.
2 code implementations • ACL 2016 • Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric Xing
Combining deep neural networks with structured logic rules is desirable to harness flexibility and reduce uninterpretability of the neural models.
Ranked #65 on Sentiment Analysis on SST-2 Binary classification
no code implementations • NAACL 2016 • Xuezhe Ma, Zhengzhong Liu, Eduard Hovy
Coreference resolution is one of the first stages in deep language understanding and its importance has been well recognized in the natural language processing community.
25 code implementations • ACL 2016 • Xuezhe Ma, Eduard Hovy
State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing.
Ranked #7 on Named Entity Recognition (NER) on CoNLL++
no code implementations • 14 Feb 2015 • Xuezhe Ma, Hai Zhao
This paper presents generalized probabilistic models for high-order projective dependency parsing and an algorithmic framework for learning these statistical models involving dependency trees.