no code implementations • NAACL (AmericasNLP) 2021 • Francis Tyers, Nick Howell
We study the performance of several popular neural part-of-speech taggers from the Universal Dependencies ecosystem on Mayan languages using a small corpus of 1435 annotated K’iche’ sentences consisting of approximately 10,000 tokens, with encouraging results: F_1 scores of 93%+ on lemmatisation, part-of-speech and morphological feature assignment.
no code implementations • 28 May 2024 • Gili Goldin, Nick Howell, Noam Ordan, Ella Rabinovich, Shuly Wintner
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022.
2 code implementations • 14 Oct 2022 • Amir Zeldes, Nick Howell, Noam Ordan, Yifat Ben Moshe
Foundational Hebrew NLP tasks such as segmentation, tagging and parsing have relied to date on various versions of the Hebrew Treebank (HTB, Sima'an et al. 2001).
no code implementations • LREC 2020 • Anastasia Nikiforova, Sergey Pletenev, Daria Sinitsyna, Semen Sorokin, Anastasia Lopukhina, Nick Howell
Currently, to get a measure of the language unit predictability, a neurolinguistic experiment known as a cloze task has to be conducted on a large number of participants.
2 code implementations • LREC 2020 • Amr Keleg, Francis Tyers, Nick Howell, Tommi Pirinen
In this paper, we have developed a method for weighting a morphological analyzer built using finite state transducers in order to disambiguate its results.