1 code implementation • BioNLP (ACL) 2022 • Casimiro Pio Carrino, Joan Llop, Marc Pàmies, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Joaquín Silveira-Ocampo, Alfonso Valencia, Aitor Gonzalez-Agirre, Marta Villegas
This work presents the first large-scale biomedical Spanish language models trained from scratch, using large biomedical corpora consisting of a total of 1.1B tokens and an EHR corpus of 95M tokens.
no code implementations • 30 Jun 2022 • Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol, Zoraida Callejas
However, the results in Spanish present important shortcomings, as they are either too small in comparison with other languages, or present a low quality derived from sub-optimal cleaning and deduplication.
1 code implementation • 10 Dec 2021 • Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé
In this work, we present the Large Labelled Logo Dataset (L3D), a multipurpose, hand-labelled, continuously growing dataset.
Ranked #1 on Image Classification on Large Labelled Logo Dataset (L3D) (Eval F1 metric)
1 code implementation • 31 Oct 2021 • Asier Gutiérrez-Fandiño, Miquel Noguer i Alonso, Petter Kolm, Jordi Armengol-Estapé
We introduce a new language representation model in finance called Financial Embedding Analysis of Sentiment (FinEAS).
1 code implementation • 23 Oct 2021 • Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Aitor Gonzalez-Agirre, Marta Villegas
There are many language models for the English language, owing to its worldwide relevance.
no code implementations • 16 Sep 2021 • Casimiro Pio Carrino, Jordi Armengol-Estapé, Ona de Gibert Bonet, Asier Gutiérrez-Fandiño, Aitor Gonzalez-Agirre, Martin Krallinger, Marta Villegas
We introduce CoWeSe (the Corpus Web Salud Español), the largest Spanish biomedical corpus to date, consisting of 4.5GB (about 750M tokens) of clean plain text.
no code implementations • 8 Sep 2021 • Casimiro Pio Carrino, Jordi Armengol-Estapé, Asier Gutiérrez-Fandiño, Joan Llop-Palao, Marc Pàmies, Aitor Gonzalez-Agirre, Marta Villegas
To the best of our knowledge, we provide the first biomedical and clinical transformer-based pretrained language models for Spanish, intending to boost native Spanish NLP applications in biomedicine.
2 code implementations • 15 Jul 2021 • Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, Casimiro Pio Carrino, Aitor Gonzalez-Agirre, Carme Armentano-Oller, Carlos Rodriguez-Penagos, Marta Villegas
This work presents MarIA, a family of Spanish language models and associated resources made available to the industry and the research community.
1 code implementation • NeurIPS 2021 • Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, Marta Villegas
The training of neural networks is usually monitored with a validation (holdout) set to estimate the generalization of the model.
no code implementations • 25 Feb 2021 • Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Casimiro Pio Carrino, Ona de Gibert, Aitor Gonzalez-Agirre, Marta Villegas
We computed both Word and Sub-word Embeddings using FastText.
1 code implementation • NeurIPS 2021 • David Pérez-Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
Characterizing the structural properties of neural networks is crucial yet poorly understood, and there are no well-established similarity measures between networks.
1 code implementation • 21 Dec 2020 • Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
Email can be one of the most fruitful attack vectors against research institutions, as mailboxes also provide access to all accounts and thus to all private information.
Cryptography and Security • Social and Information Networks