no code implementations • LREC 2020 • Maria Eskevich, Franciska de Jong, Alexander König, Darja Fišer, Dieter van Uytvanck, Tero Aalto, Lars Borin, Olga Gerassimenko, Jan Hajic, Henk van den Heuvel, Neeme Kahusk, Krista Liin, Martin Matthiesen, Stelios Piperidis, Kadri Vider
CLARIN is a European Research Infrastructure providing access to digital language resources and tools from across Europe and beyond to researchers in the humanities and social sciences.
no code implementations • LREC 2020 • Franciska de Jong, Bente Maegaard, Darja Fišer, Dieter van Uytvanck, Andreas Witt
CLARIN is a European Research Infrastructure providing access to language resources and technologies for researchers in the humanities and social sciences.
no code implementations • WS 2018 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
Both datasets are published in encrypted form, to enable others to perform experiments on detecting content to be deleted without revealing potentially inappropriate content.
1 code implementation • WS 2018 • Nikola Ljubešić, Darja Fišer, Anita Peti-Stantić
We show that the notions of concreteness and imageability are highly predictable both within and across languages, with a moderate loss of up to 20% in correlation when predicting across languages.
no code implementations • WS 2017 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets.
no code implementations • WS 2017 • Darja Fišer, Tomaž Erjavec, Nikola Ljubešić
In this paper we present the legal framework, dataset and annotation schema of socially unacceptable discourse practices on social networking platforms in Slovenia.
no code implementations • WS 2017 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
We remove more than half of the error of the standard tagger when applied to non-standard texts by training it on a combination of standard and non-standard training data, while enriching the data representation with external resources removes an additional 11 percent of the error.
no code implementations • WS 2016 • Nikola Ljubešić, Darja Fišer
In this paper we present a series of experiments on discriminating between private and corporate accounts on Twitter.
no code implementations • LREC 2016 • Nikola Ljubešić, Tomaž Erjavec, Darja Fišer
In computer-mediated communication, users of Latin-based scripts often omit diacritics when writing.
no code implementations • LREC 2014 • Darja Fišer, Aleš Tavčar, Tomaž Erjavec
The paper presents sloWCrowd, a simple tool developed to facilitate crowdsourcing lexicographic tasks, such as error correction in automatically generated wordnets and semantic annotation of corpora.
1 code implementation • LREC 2014 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages.
no code implementations • LREC 2012 • Benoît Sagot, Darja Fišer
Manual evaluation of the results shows that by applying a threshold similar to the estimated error rate in the respective wordnets, 67% of the proposed outlier candidates are indeed incorrect for French and 64% for Slovene.
no code implementations • LREC 2012 • Darja Fišer, Nikola Ljubešić, Ozren Kubelka
This paper presents an approach to extracting translation equivalents for polysemous nouns from comparable corpora.