no code implementations • EMNLP (WNUT) 2020 • Yulia Grishina, Thomas Gueudre, Ralf Winkler
True-casing, the task of restoring proper case to (generally) lower case input, is important in downstream tasks and for screen display.
1 code implementation • NAACL (ACL) 2022 • Thomas Gueudre, Kevin Jose
Unsupervised word alignments offer a lightweight and interpretable method to transfer labels from high- to low-resource languages, as long as semantically related words have the same label across languages.
no code implementations • 10 Oct 2022 • Charith Peris, Lizhen Tan, Thomas Gueudre, Turan Gojayev, Pan Wei, Gokmen Oz
Yet, the generic corpora used to pretrain the teacher and the corpora associated with the downstream target domain are often significantly different, which raises a natural question: should the student be distilled over the generic corpora, so as to learn from high-quality teacher predictions, or over the downstream task corpora to align with finetuning?
no code implementations • 15 Jun 2022 • Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan Hueser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere Sridhar, Liz Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gokhan Tur, Prem Natarajan
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9. 3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system.
Cross-Lingual Natural Language Inference intent-classification +5