1 code implementation • 6 Jun 2022 • Iria de-Dios-Flores, Marcos Garcia
This paper explores the ability of Transformer models to capture subject-verb and noun-adjective agreement dependencies in Galician.
1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.
1 code implementation • ACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the Noun Compound Type and Token Idiomaticity (NCTTI) dataset, with human annotations for 280 noun compounds in English and 180 in Portuguese at both type and token level.
1 code implementation • ACL 2021 • Marcos Garcia
We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy.
1 code implementation • EACL 2021 • Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio
Contextualised word representation models have been successfully used for capturing different word usages and they may be an attractive alternative for representing idiomaticity in language.
no code implementations • 25 Mar 2021 • David Vilares, Marcos Garcia, Carlos Gómez-Rodríguez
The experiments show that our models, especially the 12-layer one, outperform the results of mBERT in most tasks.
no code implementations • WS 2019 • Marcos Garcia, Marcos García Salido
This paper introduces a novel method to track collocational variations in diachronic corpora that can identify several changes undergone by these phraseological combinations and propose alternative solutions found in later periods.
no code implementations • WS 2019 • Pablo Gamallo, Marcos Garcia
This article describes a dependency-based strategy that uses compositional distributional semantics and cross-lingual word embeddings to translate multiword expressions (MWEs).
no code implementations • WS 2019 • Marcos Garcia, Marcos García Salido, Margarita Alonso-Ramos
This paper presents an exploration of different statistical association measures to automatically identify collocations from corpora in English, Portuguese, and Spanish.
no code implementations • ACL 2019 • Marcos Garcia, Marcos García Salido, Susana Sotelo, Estela Mosqueira, Margarita Alonso-Ramos
This paper presents a new multilingual corpus with semantic annotation of collocations in English, Portuguese, and Spanish.
1 code implementation • WS 2017 • David Vilares, Marcos Garcia, Miguel A. Alonso, Carlos Gómez-Rodríguez
Lexicon-based methods using syntactic rules for polarity classification rely on parsers that are dependent on the language and on treebank guidelines.
no code implementations • CONLL 2017 • Marcos Garcia, Pablo Gamallo
We also compare our system with a delexicalized parser for Romance languages, and take advantage of the harmonized annotation of Universal Dependencies to propose a language ranking based on the syntactic distance each variety has from Romance languages.
no code implementations • WS 2017 • Marcos Garcia, Marcos García-Salido, Margarita Alonso-Ramos
This paper presents a new strategy for multilingual collocation extraction which takes advantage of parallel corpora to learn bilingual word-embeddings.
no code implementations • EACL 2017 • Pablo Gamallo, Iván Rodríguez-Torres, Marcos Garcia
This article describes a semantic system which is based on distributional models obtained from a chronologically structured language resource, namely Google Books Syntactic Ngrams. The models were created using dependency-based contexts and a strategy for reducing the vector space, which consists in selecting the more informative and relevant word contexts.
no code implementations • LREC 2016 • Marcos Garcia
This paper explores the incorporation of lexico-semantic heuristics into a deterministic Coreference Resolution (CR) system for classifying named entities at document-level.
no code implementations • LREC 2014 • Marcos Garcia, Pablo Gamallo
This paper presents three corpora with coreferential annotation of person entities for Portuguese, Galician and Spanish.