no code implementations • LREC 2022 • Winston Wu, David Yarowsky
We evaluate two popular neural cognate generation models’ robustness to several types of human-plausible noise (deletion, duplication, swapping, and keyboard errors, as well as a new type of error, phonological errors).
no code implementations • LoResMT (COLING) 2022 • Winston Wu, David Yarowsky
Translating into low-resource languages is challenging due to the scarcity of training data.
no code implementations • RANLP (BUCC) 2021 • Winston Wu, David Yarowsky
We constructed parsers for five non-English editions of Wiktionary, which combined with pronunciations from the English edition, comprises over 5.3 million IPA pronunciations, the largest pronunciation lexicon of its kind.
no code implementations • COLING 2022 • Georgie Botev, Arya D. McCarthy, Winston Wu, David Yarowsky
This paper presents a detailed foundational empirical case study of the nature of out-of-vocabulary words encountered in modern text in a moderate-resource language such as Bulgarian, and a multi-faceted distributional analysis of the underlying word-formation processes that can aid in their compositional translation, tagging, parsing, language modeling, and other NLP tasks.
1 code implementation • 16 Nov 2023 • Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang
News media often strive to minimize explicit moral language in news articles, yet most articles are dense with moral values as expressed through the reported events themselves.
1 code implementation • 28 Oct 2023 • Kaijian Zou, Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang
We benchmark PAC to highlight the challenges of this task.
1 code implementation • 24 May 2023 • Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea
Annotator disagreement is ubiquitous in natural language processing (NLP) tasks.
no code implementations • 23 May 2023 • Naihao Deng, YiKai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea
Our results show that our system can meet the diverse needs of NLP researchers and significantly accelerate the annotation process.
no code implementations • 21 May 2023 • Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea
Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on.
no code implementations • 7 Nov 2022 • Reza Bradrania, Robert Elliott, Winston Wu
We find that commonality in liquidity is higher for large stocks compared to small stocks in the cross-section of stocks, and the spread between the two has increased over the past two decades.
no code implementations • 4 Nov 2022 • Changyuan Qiu, Winston Wu, Xinliang Frederick Zhang, Lu Wang
In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content.
no code implementations • EACL 2021 • Winston Wu, Dustin Arendt, Svitlana Volkova
We evaluate neural model robustness to adversarial attacks using different types of linguistic unit perturbations (character and word), and propose a new method for strategic sentence-level perturbations.
no code implementations • COLING 2020 • Dylan Lewis, Winston Wu, Arya D. McCarthy, David Yarowsky
We present a method for completing multilingual translation dictionaries.
no code implementations • COLING 2020 • Winston Wu, David Yarowsky
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations.
no code implementations • WS 2020 • Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post
This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).
no code implementations • LREC 2020 • Winston Wu, David Yarowsky
We developed an extensible, comprehensive Wiktionary parser that improves over several existing parsers.
no code implementations • LREC 2020 • Winston Wu, Garrett Nicolai, David Yarowsky
We propose a new functional definition and construction method for core vocabulary sets for multiple applications based on the relative coverage of a target concept in thousands of bilingual dictionaries.
no code implementations • LREC 2020 • Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky
We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance; the best number depends on the source language.
no code implementations • LREC 2020 • Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky
Exploiting the broad translation of the Bible into the world's languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology.
no code implementations • 1 May 2020 • Winston Wu, Dustin Arendt, Svitlana Volkova
We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level.
no code implementations • LREC 2020 • Winston Wu, Garrett Nicolai
We describe the JHUBC submission to the EvaLatin Shared Task on lemmatization and part-of-speech tagging for Latin.
no code implementations • LREC 2020 • Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky
The corpus consists of over 4000 unique translations of the Christian Bible and counting.
1 code implementation • IJCNLP 2019 • Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky
There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969).