no code implementations • 30 Nov 2021 • Georgios Balikas, Massih-Reza Amini, Marianne Clausel
However, this assumption is strong for comparable corpora, whose documents are only thematically similar to some extent; such corpora are, in turn, the most commonly available and the easiest to obtain.
no code implementations • 11 Dec 2020 • Francisco Borges, Georgios Balikas, Marc Brette, Guillaume Kempf, Arvind Srikantan, Matthieu Landos, Darya Brazouskaya, Qianqian Shi
Natural Language Search (NLS) extends the capabilities of search engines that perform keyword search, allowing users to issue queries in a more "natural" language.
no code implementations • 24 Oct 2019 • Georgios Balikas, Ioannis Partalas
Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space.
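As a minimal illustration of this property (the toy vectors and the `cosine` helper below are hypothetical, not taken from the paper), semantically related words end up with a higher cosine similarity than unrelated ones:

```python
import math

def cosine(u, v):
    # Cosine similarity: the angle between two embedding vectors,
    # a standard way to measure semantic relatedness in embedding space.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

# Under these toy vectors, the related pair scores higher.
print(cosine(cat, dog) > cosine(cat, car))  # True
```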
no code implementations • 21 Sep 2018 • Georgios Balikas
Automatically predicting the English proficiency level of non-native speakers given their written essays is an interesting machine learning problem.
1 code implementation • 26 Jul 2018 • Georgios Balikas, Gaël Dias, Rumen Moraliyski, Massih-Reza Amini
Discovering whether words are semantically related and identifying the specific semantic relation that holds between them is of crucial importance for NLP as it is essential for tasks like query expansion in IR.
1 code implementation • 11 May 2018 • Georgios Balikas, Charlotte Laclau, Ievgen Redko, Massih-Reza Amini
Many information retrieval algorithms rely on the notion of a good distance that makes it possible to efficiently compare objects of different natures.
no code implementations • SEMEVAL 2017 • Georgios Balikas
Another instance of Logistic Regression combined with the classify-and-count approach is trained for the quantification task of Subtask E. On the official leaderboard, the system ranked 5/15 in Subtask C and 2/12 in Subtask E.
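Classify-and-count is the simplest quantification baseline: run a trained classifier over each item and report the resulting class proportions rather than the per-item labels. A minimal sketch under that assumption (the stand-in classifier below is illustrative, not the system's Logistic Regression pipeline):

```python
from collections import Counter

def classify_and_count(items, classifier):
    # Quantification via classify-and-count: predict a label per item,
    # then return the relative frequency of each predicted class.
    labels = [classifier(x) for x in items]
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}

# Stand-in classifier: anything mentioning "love" is called positive.
toy_clf = lambda tweet: "positive" if "love" in tweet else "negative"
tweets = ["love this phone", "terrible battery", "love the screen", "meh"]
print(classify_and_count(tweets, toy_clf))
# → {'positive': 0.5, 'negative': 0.5}
```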
no code implementations • 24 Jul 2017 • Cédric Lopez, Ioannis Partalas, Georgios Balikas, Nadia Derbas, Amélie Martin, Coralie Reutenauer, Frédérique Segond, Massih-Reza Amini
We begin by demonstrating why NER for tweets is a challenging problem especially when the number of entities increases.
1 code implementation • 12 Jul 2017 • Georgios Balikas, Simon Moura, Massih-Reza Amini
Traditional sentiment analysis approaches tackle problems like ternary (3-category) and fine-grained (5-category) classification by learning the tasks separately.
1 code implementation • ACL 2017 • Hesam Amoualian, Wei Lu, Eric Gaussier, Georgios Balikas, Massih R. Amini, Marianne Clausel
This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words.
no code implementations • 3 May 2017 • Georgios Balikas, Ioannis Partalas
Word clusters have been empirically shown to offer important performance improvements on various tasks.
1 code implementation • COLING 2016 • Georgios Balikas, Hesam Amoualian, Marianne Clausel, Eric Gaussier, Massih R. Amini
The exchangeability assumption in topic models like Latent Dirichlet Allocation (LDA) often results in inferring inconsistent topics for the words of text spans like noun-phrases, which are usually expected to be topically coherent.
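One simple way to see the issue is to patch topic assignments after the fact, forcing every word in a span to take the span's majority topic. This majority-vote fix is an illustration only, not the paper's joint model, and the toy assignments below are hypothetical:

```python
from collections import Counter

def enforce_span_coherence(topic_assignments, spans):
    # topic_assignments: one topic id per word position in the document.
    # spans: (start, end) index pairs marking e.g. noun-phrases.
    fixed = list(topic_assignments)
    for start, end in spans:
        majority = Counter(fixed[start:end]).most_common(1)[0][0]
        for i in range(start, end):
            fixed[i] = majority  # all words in the span now share one topic
    return fixed

# Suppose the two words of a noun-phrase (positions 2-3) drew
# inconsistent topics 0 and 2 under vanilla LDA.
z = [1, 1, 0, 2, 1]
print(enforce_span_coherence(z, [(2, 4)]))  # → [1, 1, 0, 0, 1]
```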
no code implementations • 21 Jun 2016 • Georgios Balikas, Massih-Reza Amini
We investigate the integration of word embeddings as classification features in the setting of large scale text classification.
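A common way to turn word embeddings into document-level classification features is to average the vectors of the document's words; the sketch below illustrates that general idea under toy embeddings, not the paper's exact setup:

```python
def document_vector(tokens, embeddings, dim=3):
    # Average the embeddings of in-vocabulary words to get one fixed-size
    # feature vector per document; fall back to zeros if nothing is known.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 3-dimensional embeddings; real ones are learned and much larger.
emb = {"good": [1.0, 0.0, 0.0], "movie": [0.5, 0.5, 0.0]}
print(document_vector(["good", "movie", "unknownword"], emb))
# → [0.75, 0.25, 0.0]
```

The resulting fixed-size vector can then be fed to any standard classifier alongside (or instead of) bag-of-words features.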
1 code implementation • SEMEVAL 2016 • Georgios Balikas, Massih-Reza Amini
Specifically, we participated in Task 4, namely "Sentiment Analysis in Twitter" for which we implemented sentiment classification systems for subtasks A, B, C and D. Our approach consists of two steps.
no code implementations • 9 Jun 2016 • Ioannis Partalas, Georgios Balikas
This report describes our participation in the cDiscount 2015 challenge where the goal was to classify product items in a predefined taxonomy of products.
1 code implementation • 1 Jun 2016 • Georgios Balikas, Massih-Reza Amini, Marianne Clausel
Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them.