no code implementations • COLING 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document.
no code implementations • SIGDIAL (ACL) 2021 • Aili Shen, Timothy Baldwin
Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text.
no code implementations • GWC 2018 • James Breen, Timothy Baldwin, Francis Bond
We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter.
1 code implementation • CSRR (ACL) 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
no code implementations • ACL 2022 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Jey Han Lau
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.
1 code implementation • COLING 2022 • Yuxia Wang, Timothy Baldwin, Karin Verspoor
Training with noisy labelled data is known to be detrimental to model performance, especially for high-capacity neural network models in low-resource domains.
no code implementations • EMNLP (NLLP) 2021 • Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin
Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.
no code implementations • EMNLP (ClinicalNLP) 2020 • Yuxia Wang, Karin Verspoor, Timothy Baldwin
Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but requires in-domain labelled data for task fine-tuning.
no code implementations • COLING (CODI, CRAC) 2022 • Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin
We propose a novel unconstrained bottom-up approach for rhetorical discourse parsing based on sequence labelling of adjacent pairs of discourse units (DUs), building on the framework of Koto et al. (2021).
no code implementations • NAACL (ACL) 2022 • Haonan Li, Yameng Huang, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models (PLMs) have dramatically improved performance for many natural language processing (NLP) tasks in domains such as finance and healthcare.
1 code implementation • NAACL 2022 • Haonan Li, Martin Tomko, Maria Vasardani, Timothy Baldwin
Raw questions and contexts are extracted from the Natural Questions dataset.
1 code implementation • Findings (ACL) 2022 • Biaoyan Fang, Timothy Baldwin, Karin Verspoor
Procedural text contains rich anaphoric phenomena, yet has not received much attention in NLP.
no code implementations • EMNLP (NLP-COVID19) 2020 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Simon Šuster
Efficient discovery and exploration of biomedical literature has grown in importance in the context of the COVID-19 pandemic, and topic-based methods such as latent Dirichlet allocation (LDA) are a useful tool for this purpose.
no code implementations • ALTA 2020 • Kotaro Kitayama, Shivashankar Subramanian, Timothy Baldwin
Online petitions offer a mechanism for people to initiate a request for change and gather support from others to demonstrate backing for the cause.
no code implementations • ALTA 2020 • Meladel Mistica, Geordie Z. Zhang, Hui Chia, Kabir Manandhar Shrestha, Rohit Kumar Gupta, Saket Khandelwal, Jeannie Paterson, Timothy Baldwin, Daniel Beck
‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts.
1 code implementation • EMNLP (NLLP) 2021 • Wayan Oger Vihikan, Meladel Mistica, Inbar Levy, Andrew Christie, Timothy Baldwin
We introduce the new task of domain name dispute resolution (DNDR), which predicts the outcome of a process for resolving disputes about legal entitlement to a domain name.
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
no code implementations • Findings (EMNLP) 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.
no code implementations • ECNLP (ACL) 2022 • Fajri Koto, Jey Han Lau, Timothy Baldwin
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
no code implementations • ALTA 2021 • Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, Timothy Baldwin
Hierarchical document categorisation is a special case of multi-label document categorisation, where there is a taxonomic hierarchy among the labels.
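One practical consequence of the taxonomic hierarchy is that a document assigned a leaf label implicitly also belongs to every ancestor label. A minimal sketch of that ancestor expansion, using a made-up taxonomy rather than anything from the paper itself:

```python
# Hypothetical taxonomy, mapping each child label to its parent.
parent = {
    "tennis": "sport",
    "soccer": "sport",
    "sport": "news",
    "politics": "news",
}

def expand_labels(labels):
    # Propagate each assigned label up the taxonomy to include all ancestors.
    expanded = set(labels)
    for label in labels:
        node = label
        while node in parent:
            node = parent[node]
            expanded.add(node)
    return expanded

print(sorted(expand_labels({"tennis"})))  # ['news', 'sport', 'tennis']
```

A document labelled only "tennis" is thus evaluated against "sport" and "news" as well, which is what distinguishes the hierarchical setting from flat multi-label categorisation.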
no code implementations • Findings (EMNLP) 2021 • Anna Rogers, Timothy Baldwin, Kobi Leins
A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear.
no code implementations • 3 Apr 2024 • Thinh Hung Truong, Yulia Otmakhova, Karin Verspoor, Trevor Cohn, Timothy Baldwin
In this work, we measure the impact of affixal negation on modern English large language models (LLMs).
no code implementations • 2 Apr 2024 • Fajri Koto, Rahmad Mahendra, Nurul Aisyah, Timothy Baldwin
Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies on language models have predominantly centered on English cultures, potentially resulting in an Anglocentric bias.
no code implementations • 31 Mar 2024 • Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li
Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safety issues as various vulnerabilities are exposed.
no code implementations • 24 Mar 2024 • Masahiro Kaneko, Timothy Baldwin
In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and detection rate for personal information, copyrighted texts, and benchmark data.
no code implementations • 7 Mar 2024 • Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov
Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.
no code implementations • 22 Feb 2024 • Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
Existing evaluation metrics and methods for addressing these ethical challenges rely on datasets intentionally constructed by instructing humans to write instances that contain ethical problems.
1 code implementation • 20 Feb 2024 • Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal
In this paper, we introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.
1 code implementation • 20 Feb 2024 • Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin
The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models.
no code implementations • 19 Feb 2024 • Tatsuki Kuribayashi, Ryo Ueda, Ryo Yoshida, Yohei Oseki, Ted Briscoe, Timothy Baldwin
This also showcases the advantage of cognitively-motivated LMs, which are typically employed in cognitive modeling, in the computational simulation of language universals.
no code implementations • 19 Feb 2024 • Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin
Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs.
1 code implementation • 18 Feb 2024 • Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin
However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents.
no code implementations • 3 Feb 2024 • Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin
Improving multilingual language models' capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages.
no code implementations • 28 Jan 2024 • Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki, Timothy Baldwin
In this study, we examine the impact of LLMs' step-by-step predictions on gender bias in unscalable tasks.
no code implementations • 16 Jan 2024 • Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
Moreover, the performance degradation due to debiasing is also lower in the ICL case compared to that in the FT case.
1 code implementation • 4 Jan 2024 • Haonan Li, Martin Tomko, Timothy Baldwin
To overcome this, we propose treating the QA task as a dense vector retrieval problem, where we encode questions and POIs separately and retrieve the most relevant POIs for a question by utilizing embedding space similarity.
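The retrieval formulation described here can be sketched in a few lines; the toy vectors below are illustrative assumptions standing in for the learned question and POI encoders, not the paper's actual model:

```python
import math

# Toy stand-ins for embeddings produced by separate question and POI encoders.
poi_embeddings = {
    "cafe":    [0.9, 0.1, 0.0],
    "museum":  [0.1, 0.8, 0.3],
    "station": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(question_vec, k=1):
    # Rank all POIs by embedding-space similarity to the encoded question.
    ranked = sorted(poi_embeddings,
                    key=lambda p: cosine(question_vec, poi_embeddings[p]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05]))  # ['cafe']
```

Because questions and POIs are encoded independently, POI embeddings can be precomputed and indexed, making retrieval a nearest-neighbour search rather than a per-pair scoring pass.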
1 code implementation • 17 Dec 2023 • Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin
Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks.
no code implementations • 13 Nov 2023 • Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov
Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields.
1 code implementation • 13 Nov 2023 • Tatsuki Kuribayashi, Yohei Oseki, Timothy Baldwin
In other words, pure next-word probability remains a strong predictor for human reading behavior, even in the age of LLMs.
1 code implementation • 3 Nov 2023 • Jinrui Yang, Timothy Baldwin, Trevor Cohn
We present Multi-EuP, a new multilingual benchmark dataset, comprising 22K multilingual documents collected from the European Parliament, spanning 24 languages.
1 code implementation • 1 Nov 2023 • Takashi Wada, Timothy Baldwin, Jey Han Lau
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.
1 code implementation • 1 Nov 2023 • Yichen Huang, Timothy Baldwin
We investigate MT evaluation metric performance on adversarially-synthesized texts, to shed light on metric robustness.
no code implementations • 8 Oct 2023 • Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni
The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention.
1 code implementation • 7 Oct 2023 • Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin
In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia.
1 code implementation • 15 Sep 2023 • Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych
Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground.
no code implementations • 14 Sep 2023 • Gisela Vallejo, Timothy Baldwin, Lea Frermann
The manifestation and effect of bias in news reporting have been central topics in the social sciences for decades, and have received increasing attention in the NLP community recently.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
1 code implementation • 25 Aug 2023 • Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin
With the rapid evolution of large language models (LLMs), new and hard-to-predict harmful capabilities are emerging.
1 code implementation • 8 Aug 2023 • Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, Karin Verspoor
Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard.
1 code implementation • 15 Jun 2023 • Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin
As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.
1 code implementation • 14 Jun 2023 • Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn
Negation has been shown to be a major bottleneck for masked language models, such as BERT.
1 code implementation • 2 Jun 2023 • Takashi Wada, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context.
1 code implementation • 24 May 2023 • Haonan Li, Fajri Koto, Minghao Wu, Alham Fikri Aji, Timothy Baldwin
However, research on multilingual instruction tuning has been limited due to the scarcity of high-quality instruction-response datasets across different languages.
1 code implementation • 11 Feb 2023 • Xudong Han, Timothy Baldwin, Trevor Cohn
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.
1 code implementation • 17 Oct 2022 • Xudong Han, Aili Shen, Trevor Cohn, Timothy Baldwin, Lea Frermann
Mitigating bias in training on biased datasets is an important open problem.
1 code implementation • 6 Oct 2022 • Thinh Hung Truong, Yulia Otmakhova, Timothy Baldwin, Trevor Cohn, Jey Han Lau, Karin Verspoor
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
2 code implementations • SDP (COLING) 2022 • Yulia Otmakhova, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor, Jey Han Lau
In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task.
1 code implementation • COLING 2022 • Takashi Wada, Timothy Baldwin, Yuji Matsumoto, Jey Han Lau
We propose a new unsupervised method for lexical substitution using pre-trained language models.
2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
no code implementations • NAACL 2022 • Thinh Hung Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor
Negation is a common linguistic feature that is crucial in many language understanding tasks, yet it remains a hard problem due to diversity in its expression in different types of text.
1 code implementation • NAACL 2022 • Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann
Real-world datasets often encode stereotypes and societal biases.
2 code implementations • 4 May 2022 • Xudong Han, Aili Shen, Yitong Li, Lea Frermann, Timothy Baldwin, Trevor Cohn
This paper presents fairlib, an open-source framework for assessing and improving classification fairness.
no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
1 code implementation • 12 Mar 2022 • Xudong Han, Timothy Baldwin, Trevor Cohn
Adversarial training is a common approach for bias mitigation in natural language processing.
no code implementations • 16 Feb 2022 • Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.
no code implementations • 22 Sep 2021 • Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann
Trained classification models can unintentionally lead to biased representations and predictions, which can reinforce societal preconceptions and stereotypes.
no code implementations • EMNLP 2021 • Shivashankar Subramanian, Xudong Han, Timothy Baldwin, Trevor Cohn, Lea Frermann
Bias is pervasive in NLP models, motivating the development of automatic debiasing techniques.
no code implementations • EMNLP 2021 • Shivashankar Subramanian, Afshin Rahimi, Timothy Baldwin, Trevor Cohn, Lea Frermann
Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups.
no code implementations • 16 Sep 2021 • Xudong Han, Timothy Baldwin, Trevor Cohn
Group bias in natural language processing tasks manifests as disparities in system error rates across texts authored by different demographic groups, typically disadvantaging minority groups.
no code implementations • 14 Sep 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.
1 code implementation • EMNLP 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
no code implementations • 30 Jul 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
There is unison in the scientific community about human-induced climate change.
1 code implementation • Findings (ACL) 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
no code implementations • NAACL 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.
no code implementations • 25 May 2021 • Simon Šuster, Karin Verspoor, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez, Yulia Otmakhova
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature.
1 code implementation • NAACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.
1 code implementation • EACL 2021 • Biaoyan Fang, Christian Druckenbrodt, Saber A Akhondi, Jiayuan He, Timothy Baldwin, Karin Verspoor
Chemical patents contain rich coreference and bridging links, which are the target of this research.
no code implementations • EACL 2021 • Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin
Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not.
no code implementations • 18 Mar 2021 • Aili Shen, Meladel Mistica, Bahar Salehi, Hang Li, Timothy Baldwin, Jianzhong Qi
While pretrained language models (LMs) have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear.
1 code implementation • EACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).
Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)
1 code implementation • EACL 2021 • Xudong Han, Timothy Baldwin, Trevor Cohn
Adversarial learning can learn fairer and less biased models of language than standard methods.
2 code implementations • 27 Nov 2020 • Fajri Koto, Timothy Baldwin, Jey Han Lau
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
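As a rough intuition for two of the four elements, focus and coverage behave like precision and recall of summary content against the reference. The token-overlap scorer below is a deliberately simplified stand-in for illustration only; the framework itself scores these elements with learned metrics:

```python
def token_overlap_scores(summary, reference):
    # Crude bag-of-words proxy: focus ~ precision, coverage ~ recall.
    s = set(summary.lower().split())
    r = set(reference.lower().split())
    focus = len(s & r) / len(s)     # fraction of summary content found in the reference
    coverage = len(s & r) / len(r)  # fraction of reference content found in the summary
    return focus, coverage

f, c = token_overlap_scores("the cat sat", "the cat sat on the mat")
print(f, c)  # 1.0 0.6
```

A short summary copied verbatim from the reference scores perfect focus but low coverage, which is exactly the trade-off the two separate dimensions are designed to expose.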
no code implementations • COLING 2020 • Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Fajri Koto, Jey Han Lau, Timothy Baldwin
In this paper, we introduce a large-scale Indonesian summarization dataset.
1 code implementation • COLING 2020 • Haonan Li, Maria Vasardani, Martin Tomko, Timothy Baldwin
Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources.
1 code implementation • EMNLP (MRL) 2021 • Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs).
no code implementations • 18 Aug 2020 • Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest.
no code implementations • WS 2020 • Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, James Gilkerson
Identifying the reasons for antibiotic administration in veterinary records is a critical component of understanding antimicrobial usage patterns.
no code implementations • WS 2020 • Yuxia Wang, Fei Liu, Karin Verspoor, Timothy Baldwin
In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain.
1 code implementation • ACL 2020 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Automatic metrics are fundamental for the development and evaluation of machine translation systems.
no code implementations • ACL 2020 • Kobi Leins, Jey Han Lau, Timothy Baldwin
We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.
1 code implementation • COLING 2020 • Afshin Rahimi, Timothy Baldwin, Karin Verspoor
We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources.
no code implementations • 30 Apr 2020 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
The world is facing the challenge of climate crisis.
1 code implementation • SEMEVAL 2017 • Preslav Nakov, Doris Hoogeveen, Lluís Màrquez, Alessandro Moschitti, Hamdy Mubarak, Timothy Baldwin, Karin Verspoor
We describe SemEval-2017 Task 3 on Community Question Answering.
1 code implementation • ALTA 2019 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
no code implementations • WS 2019 • Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, Timothy Baldwin
In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task.
no code implementations • WS 2019 • Anirudh Joshi, Timothy Baldwin, Richard Sinnott, Cecile Paris
Argument component extraction is a challenging and complex high-level semantic extraction task.
1 code implementation • IJCNLP 2019 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability.
no code implementations • 18 Jul 2019 • Jingyuan Zhang, Timothy Baldwin
Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy.
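The offset method can be illustrated with toy embeddings; the 2-d vectors and the capital-of relation pair below are invented for demonstration, with a nearest-neighbour lookup standing in for a search over a full pretrained vocabulary:

```python
import math

# Hypothetical 2-d "embeddings", chosen so the relation offset is consistent.
emb = {
    "france": [1.0, 3.0], "paris": [1.0, 1.0],
    "japan":  [4.0, 3.0], "tokyo": [4.0, 1.0],
    "cat":    [7.0, 5.0],
}

def nearest(vec, exclude=()):
    # Return the vocabulary word whose embedding is closest to vec.
    return min((w for w in emb if w not in exclude),
               key=lambda w: math.dist(vec, emb[w]))

# Offset for the capital-of relation, taken from one example pair.
offset = [c - w for c, w in zip(emb["paris"], emb["france"])]  # [0.0, -2.0]
pred = nearest([a + b for a, b in zip(emb["japan"], offset)], exclude={"japan"})
print(pred)  # tokyo
```

Applying the same offset to a new word and finding the nearest neighbour is what lets a single example pair act as a predictor for the whole relation.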
1 code implementation • ACL 2019 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Accurate, automatic evaluation of machine translation is critical for system tuning, and evaluating progress in the field.
no code implementations • ACL 2019 • Yitong Li, Timothy Baldwin, Trevor Cohn
Supervised models of NLP rely on large collections of text which closely resemble the intended testing setting.
no code implementations • WS 2019 • N, Navnita akumar, Timothy Baldwin, Bahar Salehi
In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data.
no code implementations • SEMEVAL 2019 • Haonan Li, Minghan Wang, Timothy Baldwin, Martin Tomko, Maria Vasardani
This paper describes our submission to SemEval-2019 Task 12 on toponym resolution over scientific articles.
1 code implementation • SEMEVAL 2019 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance.
no code implementations • NAACL 2019 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn, Jason Eisner
Critical to natural language generation is the production of correctly inflected text.
no code implementations • ALTA 2019 • Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin, Karin Verspoor
Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration.
no code implementations • ALTA 2019 • Gaurav Arora, Afshin Rahimi, Timothy Baldwin
Catastrophic forgetting, whereby a model trained on one task is fine-tuned on a second and, in doing so, suffers a "catastrophic" drop in performance over the first task, is a hurdle in the development of better transfer learning techniques.
no code implementations • 4 Jan 2019 • Aili Shen, Bahar Salehi, Timothy Baldwin, Jianzhong Qi
The quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one.
no code implementations • ALTA 2018 • N, Navnita akumar, Bahar Salehi, Timothy Baldwin
In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions ("MWEs").
no code implementations • ALTA 2018 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
In this paper we show that the quality control mechanism is overly conservative, which increases the time and expense of the evaluation.
no code implementations • WS 2018 • Taro Miyazaki, Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations.
no code implementations • WS 2018 • Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau, Timothy Baldwin
Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains.
no code implementations • EMNLP 2018 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.
no code implementations • COLING 2018 • Zhe Ye, Fang Li, Timothy Baldwin
General-purpose pre-trained word embeddings have become a mainstay of natural language processing, and more recently, methods have been proposed to encode external knowledge into word embeddings to benefit specific downstream tasks.
no code implementations • COLING 2018 • Timothy Baldwin
In this talk, I will first present recent work on domain debiasing in the context of language identification, then discuss a new line of work on language variety analysis in the form of dialect map generation.
1 code implementation • ACL 2018 • Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond
In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling.
1 code implementation • ACL 2018 • Shivashankar Subramanian, Timothy Baldwin, Trevor Cohn
Online petitions are a cost-effective way for citizens to collectively engage with policy-makers in a democracy.
3 code implementations • ACL 2018 • Yitong Li, Timothy Baldwin, Trevor Cohn
Written text often provides sufficient clues to identify the author, their gender, age, and other important attributes.
1 code implementation • ACL 2018 • Fei Liu, Trevor Cohn, Timothy Baldwin
Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task.
1 code implementation • NAACL 2018 • Yitong Li, Timothy Baldwin, Trevor Cohn
Most real world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.
no code implementations • NAACL 2018 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
Election manifestos document the intentions, motives, and views of political parties.
1 code implementation • NAACL 2018 • Fei Liu, Trevor Cohn, Timothy Baldwin
While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA), i.e. the extraction of fine-grained opinion polarity w.r.t.
Ranked #3 on Aspect-Based Sentiment Analysis (ABSA) on Sentihood
1 code implementation • ACL 2018 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Social media user geolocation is vital to many applications such as event detection.
1 code implementation • 22 Apr 2018 • Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.
1 code implementation • IJCNLP 2017 • Fei Liu, Timothy Baldwin, Trevor Cohn
Despite successful applications across a broad range of NLP tasks, conditional random fields ("CRFs"), in particular the linear-chain variant, are only able to model local features.
no code implementations • WS 2017 • Viet Nguyen, Julian Brooke, Timothy Baldwin
In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements.
no code implementations • WS 2017 • Yitong Li, Trevor Cohn, Timothy Baldwin
This paper describes our submission to the sentiment analysis sub-task of "Build It, Break It: The Language Edition (BIBI)", on both the builder and breaker sides.
no code implementations • EMNLP 2017 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Manual data annotation is a vital component of NLP research.
1 code implementation • EMNLP 2017 • Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu
Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators.
1 code implementation • EMNLP 2017 • Afshin Rahimi, Timothy Baldwin, Trevor Cohn
We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology.
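The core scoring idea can be illustrated with a minimal numpy sketch: evaluating a 2D (lat, lon) point under a mixture of isotropic Gaussians. The component means, variances and weights below are toy values, not the paper's learned parameters, and this omits the neural network that produces them:

```python
import numpy as np

def mog_density(point, means, sigmas, weights):
    """Density of a 2D point under a mixture of isotropic Gaussians.

    point:   (2,) array of (lat, lon)
    means:   (K, 2) component centres
    sigmas:  (K,) per-component standard deviations
    weights: (K,) mixture weights, summing to 1
    """
    diff = means - point                        # (K, 2) offsets to each centre
    sq_dist = np.sum(diff ** 2, axis=1)         # (K,) squared distances
    norm = 1.0 / (2.0 * np.pi * sigmas ** 2)    # 2D Gaussian normalisers
    comp = norm * np.exp(-sq_dist / (2.0 * sigmas ** 2))
    return float(np.dot(weights, comp))

# Toy mixture: two "cities" on a flat lat/lon plane (hypothetical values).
means = np.array([[40.7, -74.0],    # roughly New York
                  [34.0, -118.2]])  # roughly Los Angeles
sigmas = np.array([1.0, 1.5])
weights = np.array([0.6, 0.4])

p_nyc = mog_density(np.array([40.7, -74.0]), means, sigmas, weights)
p_mid = mog_density(np.array([37.0, -96.0]), means, sigmas, weights)
print(p_nyc > p_mid)  # density is higher near a component centre
```

In the paper's setting, the mixture parameters are produced by a network conditioned on text, so the density acts as a continuous, multimodal representation of likely locations.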
no code implementations • CONLL 2017 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic models jointly learn topics and document-level topic distribution.
1 code implementation • ACL 2017 • Jey Han Lau, Timothy Baldwin, Trevor Cohn
Language models are typically applied at the sentence level, without access to the broader document context.
no code implementations • ACL 2017 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms.
no code implementations • EACL 2017 • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton
Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.
1 code implementation • EACL 2017 • Yitong Li, Trevor Cohn, Timothy Baldwin
Deep neural networks have achieved remarkable results across many language processing tasks, however they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks.
no code implementations • EACL 2017 • Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin
Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.
no code implementations • WS 2017 • Ying Xu, Jey Han Lau, Timothy Baldwin, Trevor Cohn
With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time.
1 code implementation • WS 2017 • King Chan, Julian Brooke, Timothy Baldwin
This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations.
1 code implementation • EACL 2017 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn
Derivational morphology is a fundamental and complex characteristic of language.
no code implementations • TACL 2017 • Julian Brooke, Jan Šnajder, Timothy Baldwin
We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates.
1 code implementation • COLING 2016 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topics generated by topic models are typically represented as lists of terms.
no code implementations • WS 2016 • Timothy Baldwin
In this talk, I will outline a range of challenges presented by multiword expressions in terms of (lexicalist) precision grammar engineering, and different strategies for accommodating those challenges, in an attempt to strike the right balance in terms of generalisation and over- and under-generation.
no code implementations • WS 2016 • Bo Han, Afshin Rahimi, Leon Derczynski, Timothy Baldwin
This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016.
no code implementations • COLING 2016 • Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi
Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment.
no code implementations • COLING 2016 • Bahar Salehi, Paul Cook, Timothy Baldwin
Much previous research on multiword expressions (MWEs) has focused on the token- and type-level tasks of MWE identification and extraction, respectively.
no code implementations • EMNLP 2016 • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Timothy Baldwin
In named entity recognition, we often don't have a large in-domain training corpus or a knowledge base with adequate coverage to train a model directly.
1 code implementation • EMNLP 2016 • Yitong Li, Trevor Cohn, Timothy Baldwin
Deep neural networks have achieved remarkable results across many language processing tasks, however these methods are highly sensitive to noise and adversarial attacks.
4 code implementations • WS 2016 • Jey Han Lau, Timothy Baldwin
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
no code implementations • LREC 2016 • Richard Fothergill, Paul Cook, Timothy Baldwin
Web corpora are often constructed automatically, and their contents are therefore often not well understood.
1 code implementation • 17 Apr 2016 • Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin
This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval 2016 Task 10.
1 code implementation • ACL 2016 • Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin
Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision.
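The vector-offset method this work builds on can be sketched in a few lines of numpy. The toy embeddings below are hand-constructed so the analogy holds; the paper instead uses pre-trained embeddings:

```python
import numpy as np

# Toy embeddings constructed so a consistent "gender offset" exists;
# in practice these would come from pre-trained word2vec/GloVe vectors.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
    "apple": np.array([0.1, 0.2, 0.1]),  # unrelated distractor
}

def analogy(a, b, c, emb):
    """Return the word closest (by cosine) to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman", emb))  # -> queen
```

The paper's contribution is to test how far this simple subtraction generalises across many lexical relation types, not just the well-known analogy examples.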
no code implementations • IJCNLP 2015 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
We propose a label propagation approach to geolocation prediction based on Modified Adsorption, with two enhancements: (1) the removal of "celebrity" nodes to increase location homophily and boost tractability, and (2) the incorporation of text-based geolocation priors for test users.
no code implementations • HLT 2015 • Afshin Rahimi, Duy Vu, Trevor Cohn, Timothy Baldwin
Research on automatically geolocating social media users has conventionally been based on the text content of posts from a given user or the social network of the user, with very little crossover between the two, and no benchmarking of the two approaches over comparable datasets.
no code implementations • 21 Apr 2015 • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, Timothy Baldwin
Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications.
no code implementations • TACL 2014 • Marco Lui, Jey Han Lau, Timothy Baldwin
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.