no code implementations • 15 May 2024 • Dylan Phelps, Thomas Pickard, Maggie Mi, Edward Gow-Smith, Aline Villavicencio
In particular, how well do such models perform in comparison to encoder-only models fine-tuned specifically for idiomaticity tasks?
no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.
1 code implementation • 16 Jun 2023 • Edward Gow-Smith, Danae Sánchez Villegas
In this paper, we describe the University of Sheffield's submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages, which comprises translation from Spanish into eleven indigenous languages.
no code implementations • 13 Jun 2023 • Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan Calapodescu
This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track.
no code implementations • CLTW (LREC) 2022 • Edward Gow-Smith, Mark McConville, William Gillies, Jade Scott, Roibeard Ó Maolalaigh
The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography.
no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
In particular, we study the impact of Pattern-Exploiting Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.
1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.
1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.
1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio
Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail to effectively capture the meanings of multiword expressions (MWEs), especially idioms.