no code implementations • LT4HALA (LREC) 2022 • Margherita Fantoli, Miryam de Lhoneux
This paper describes the process of syntactically parsing the Latin translation by Jacopo da San Cassiano of the Greek mathematical work The Spirals of Archimedes.
1 code implementation • CoNLL (EMNLP) 2021 • Mareike Hartmann, Miryam de Lhoneux, Daniel Hershcovich, Yova Kementchedjhieva, Lukas Nielsen, Chen Qiu, Anders Søgaard
Negation is one of the most fundamental concepts in human cognition and language, and several natural language inference (NLI) probes have been designed to investigate pretrained language models’ ability to detect and reason with negation.
no code implementations • NAACL (AmericasNLP) 2021 • Marcel Bollmann, Rahul Aralikatte, Héctor Murrieta Bello, Daniel Hershcovich, Miryam de Lhoneux, Anders Søgaard
We evaluated a range of neural machine translation techniques developed specifically for low-resource scenarios.
no code implementations • ACL (WAT) 2021 • Rahul Aralikatte, Héctor Ricardo Murrieta Bello, Miryam de Lhoneux, Daniel Hershcovich, Marcel Bollmann, Anders Søgaard
This work shows that competitive translation results can be obtained in a constrained setting by incorporating the latest advances in memory and compute optimization.
1 code implementation • 6 Feb 2024 • Esther Ploeger, Wessel Poelman, Miryam de Lhoneux, Johannes Bjerva
We recommend that future work include an operationalization of 'typological diversity' that empirically justifies the diversity of language samples.
no code implementations • 5 Feb 2024 • Kushal Tatariya, Heather Lent, Johannes Bjerva, Miryam de Lhoneux
Emotion classification is a challenging task in NLP due to the inherent idiosyncratic and subjective nature of linguistic expression, especially with code-mixed data.
1 code implementation • 30 Oct 2023 • Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply significant potential for transfer learning, this potential is hampered by the lack of annotated data.
no code implementations • 20 Feb 2023 • Anders Søgaard, Daniel Hershcovich, Miryam de Lhoneux
Van Miltenburg et al. (2021) suggest that NLP research should adopt preregistration to prevent fishing expeditions and to promote publication of negative results.
1 code implementation • 14 Jul 2022 • Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts.
Ranked #1 on Named Entity Recognition (NER) on MasakhaNER
no code implementations • LREC 2022 • Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard
We demonstrate, through conversations with Creole experts and surveys of Creole-speaking communities, how the needs for language technology can change dramatically from one language to another, even when the languages are considered very similar to each other, as with Creoles.
no code implementations • ACL 2022 • Daniel Hershcovich, Stella Frank, Heather Lent, Miryam de Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, Constanza Fierro, Katerina Margatina, Phillip Rust, Anders Søgaard
Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages.
1 code implementation • ACL 2022 • Victor Milewski, Miryam de Lhoneux, Marie-Francine Moens
In this work, we investigate the knowledge learned in the embeddings of multimodal-BERT models.
1 code implementation • ACL 2022 • Miryam de Lhoneux, Sheng Zhang, Anders Søgaard
Large multilingual pretrained language models such as mBERT and XLM-RoBERTa have been found to be surprisingly effective for cross-lingual transfer of syntactic parsing models (Wu and Dredze 2019), but only between related languages.
1 code implementation • ACL (TLT, SyntaxFest) 2021 • Rob van der Goot, Miryam de Lhoneux
With the increased availability of datasets, the potential for learning from a variety of data sources has grown.
1 code implementation • CoNLL (EMNLP) 2021 • Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard
Creole languages such as Nigerian Pidgin English and Haitian Creole are under-resourced and largely ignored in the NLP literature.
no code implementations • ACL (WAT) 2021 • Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard
This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations.
Ranked #1 on Machine Translation on Itihasa
2 code implementations • COLING 2020 • Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, Omri Abend
Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other.
no code implementations • WS 2020 • Daniel Hershcovich, Miryam de Lhoneux, Artur Kulmizev, Elham Pejhan, Joakim Nivre
We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020.
1 code implementation • 25 May 2020 • Daniel Hershcovich, Miryam de Lhoneux, Artur Kulmizev, Elham Pejhan, Joakim Nivre
We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020.
no code implementations • IJCNLP 2019 • Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, Joakim Nivre
Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope.
no code implementations • 20 Aug 2019 • Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, Joakim Nivre
Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope.
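The transition-based side of this trade-off can be illustrated with a minimal sketch (not the parsers evaluated in the paper): an arc-standard transition system driven by a static oracle, building a tree through purely local SHIFT/LEFT-ARC/RIGHT-ARC decisions over a stack and buffer. The sentence, gold heads, and helper names here are toy examples.

```python
def parse(words, heads):
    """Static arc-standard oracle: replay gold heads as a sequence of
    local transitions, returning the transitions and predicted arcs.
    `heads[i]` is the gold head index of token i (-1 for the root)."""
    stack, buf, arcs, trans = [], list(range(len(words))), [], []
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if heads[s1] == s0:                  # LEFT-ARC: s0 heads s1
                arcs.append((s1, s0))
                stack.pop(-2)
                trans.append("LA")
                continue
            # RIGHT-ARC only once all of s0's dependents are attached
            if heads[s0] == s1 and all(
                    heads[i] != s0 or (i, s0) in arcs
                    for i in range(len(words))):
                arcs.append((s0, s1))
                stack.pop()
                trans.append("RA")
                continue
        stack.append(buf.pop(0))                 # SHIFT
        trans.append("SH")
    return trans, arcs

# "ate" (index 1) is the root, heading "she" (0) and "fish" (2).
trans, arcs = parse(["she", "ate", "fish"], [1, -1, 1])
```

Because each transition conditions on earlier ones, a single wrong decision can corrupt the rest of the derivation (error propagation); graph-based parsers instead score all candidate head-dependent arcs and solve for the globally optimal tree, avoiding this but restricting the features each arc decision can see.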
1 code implementation • CL (ACL) 2020 • Miryam de Lhoneux, Sara Stymne, Joakim Nivre
We find that the parser learns different information about AVCs and FMVs if only sequential models (BiLSTMs) are used in the architecture but similar information when a recursive layer is used.
1 code implementation • NAACL 2019 • Miryam de Lhoneux, Miguel Ballesteros, Joakim Nivre
When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information.
no code implementations • CoNLL 2018 • Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, Sara Stymne
We present the Uppsala system for the CoNLL 2018 Shared Task on universal dependency parsing.
no code implementations • WS 2018 • Anders Søgaard, Miryam de Lhoneux, Isabelle Augenstein
Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal.
1 code implementation • EMNLP 2018 • Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard
We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies.
no code implementations • EMNLP 2018 • Aaron Smith, Miryam de Lhoneux, Sara Stymne, Joakim Nivre
We provide a comprehensive analysis of the interactions between pre-trained word embeddings, character models and POS tags in a transition-based dependency parser.
1 code implementation • ACL 2018 • Sara Stymne, Miryam de Lhoneux, Aaron Smith, Joakim Nivre
How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question.
1 code implementation • WS 2017 • Miryam de Lhoneux, Sara Stymne, Joakim Nivre
In this paper, we extend the arc-hybrid system for transition-based parsing with a swap transition that enables reordering of the words and construction of non-projective trees.
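A minimal sketch of the transition system (an illustration, not the paper's implementation): in arc-hybrid, LEFT-ARC attaches the stack top to the first buffer item and RIGHT-ARC attaches it to the item below it on the stack; the added SWAP transition moves the second-topmost stack item back to the front of the buffer, reordering the words so that non-projective trees become reachable. The token indices and hand-picked transition sequence below are a toy example with crossing arcs.

```python
def step(state, transition):
    """Apply one arc-hybrid (+SWAP) transition to (stack, buffer, arcs)."""
    stack, buf, arcs = state
    if transition == "SHIFT":        # move first buffer item onto the stack
        stack.append(buf.pop(0))
    elif transition == "LEFT-ARC":   # head of stack top = first buffer item
        arcs.append((stack.pop(), buf[0]))
    elif transition == "RIGHT-ARC":  # head of stack top = item below it
        dep = stack.pop()
        arcs.append((dep, stack[-1]))
    elif transition == "SWAP":       # move second stack item back to buffer
        buf.insert(0, stack.pop(-2))
    return stack, buf, arcs

# Non-projective toy tree: arcs 2->0 and 3->1 cross; 2 is the root.
state = ([], [0, 1, 2, 3], [])
for t in ["SHIFT", "SHIFT", "SWAP", "SHIFT", "LEFT-ARC",
          "SHIFT", "SWAP", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]:
    state = step(state, t)
```

Without SWAP, no sequence of arc-hybrid transitions can produce these crossing arcs, since the stack and buffer always preserve the original word order.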
no code implementations • CoNLL 2017 • Miryam de Lhoneux, Yan Shao, Ali Basirat, Eliyahu Kiperwasser, Sara Stymne, Yoav Goldberg, Joakim Nivre
We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies.
no code implementations • 17 May 2015 • Miryam de Lhoneux
We show that the baseline model performs significantly better on our gold standard when the data are collapsed before parsing than when they are collapsed after parsing, which indicates a parsing effect.