no code implementations • NoDaLiDa 2021 • Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann
Evaluating the results on an in-domain test set and a small out-of-domain set, we find that RBMT backtranslation clearly outperforms NMT backtranslation on the out-of-domain test set and also slightly on the in-domain data, even though the NMT backtranslation model itself provided clearly better BLEU scores than the RBMT on the in-domain data.
no code implementations • WS (NoDaLiDa) 2019 • Mikko Aulamo, Jörg Tiedemann
This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services.
no code implementations • 20 Mar 2024 • Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann
We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.
2 code implementations • 24 Nov 2023 • Nikolay Bogoychev, Jelmer Van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindřich Helcl, Mikko Aulamo
Developing high-quality machine translation systems is a labour-intensive, challenging and confusing process for newcomers to the field.
no code implementations • 4 Dec 2022 • Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja
This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.
no code implementations • WS 2020 • Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann
This paper describes the University of Helsinki Language Technology group's participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text.
no code implementations • ACL 2020 • Mikko Aulamo, Sami Virpioja, Jörg Tiedemann
We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.
no code implementations • LREC 2020 • Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula
This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.
no code implementations • LREC 2020 • Mikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann
We show the use of these tools in parallel corpus creation and data diagnostics.
no code implementations • WS 2018 • Eetu Sjöblom, Mathias Creutz, Mikko Aulamo
We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six European languages: German, English, Finnish, French, Russian, and Swedish.