no code implementations • COLING 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document.
no code implementations • SIGDIAL (ACL) 2021 • Aili Shen, Timothy Baldwin
Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text.
no code implementations • GWC 2018 • James Breen, Timothy Baldwin, Francis Bond
We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter.
1 code implementation • CSRR (ACL) 2022 • Fajri Koto, Timothy Baldwin, Jey Han Lau
Story comprehension that involves complex causal and temporal relations is a critical task in NLP, but previous studies have focused predominantly on English, leaving open the question of how the findings generalize to other languages, such as Indonesian.
no code implementations • ACL 2022 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Jey Han Lau
Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency.
1 code implementation • COLING 2022 • Yuxia Wang, Timothy Baldwin, Karin Verspoor
Training with noisy labelled data is known to be detrimental to model performance, especially for high-capacity neural network models in low-resource domains.
no code implementations • EMNLP (NLLP) 2021 • Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin
Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.
no code implementations • EMNLP (ClinicalNLP) 2020 • Yuxia Wang, Karin Verspoor, Timothy Baldwin
Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but requires in-domain labelled data for task fine-tuning.
no code implementations • COLING (CODI, CRAC) 2022 • Andrew Shen, Fajri Koto, Jey Han Lau, Timothy Baldwin
We propose a novel unconstrained bottom-up approach for rhetorical discourse parsing based on sequence labelling of adjacent pairs of discourse units (DUs), building on the framework of Koto et al. (2021).
no code implementations • NAACL (ACL) 2022 • Haonan Li, Yameng Huang, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models (PLMs) have dramatically improved performance for many natural language processing (NLP) tasks in domains such as finance and healthcare.
1 code implementation • NAACL 2022 • Haonan Li, Martin Tomko, Maria Vasardani, Timothy Baldwin
Raw questions and contexts are extracted from the Natural Questions dataset.
1 code implementation • Findings (ACL) 2022 • Biaoyan Fang, Timothy Baldwin, Karin Verspoor
Procedural text contains rich anaphoric phenomena, yet has not received much attention in NLP.
no code implementations • EMNLP (NLP-COVID19) 2020 • Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Simon Šuster
Efficient discovery and exploration of biomedical literature has grown in importance in the context of the COVID-19 pandemic, and topic-based methods such as latent Dirichlet allocation (LDA) are a useful tool for this purpose.
no code implementations • ALTA 2020 • Kotaro Kitayama, Shivashankar Subramanian, Timothy Baldwin
Online petitions offer a mechanism for people to initiate a request for change and gather support from others to demonstrate backing for the cause.
no code implementations • ALTA 2020 • Meladel Mistica, Geordie Z. Zhang, Hui Chia, Kabir Manandhar Shrestha, Rohit Kumar Gupta, Saket Khandelwal, Jeannie Paterson, Timothy Baldwin, Daniel Beck
‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts.
1 code implementation • EMNLP (NLLP) 2021 • Wayan Oger Vihikan, Meladel Mistica, Inbar Levy, Andrew Christie, Timothy Baldwin
We introduce the new task of domain name dispute resolution (DNDR), which predicts the outcome of a process for resolving disputes about legal entitlement to a domain name.
1 code implementation • EMNLP (WNUT) 2021 • Rob van der Goot, Alan Ramponi, Arkaitz Zubiaga, Barbara Plank, Benjamin Muller, Iñaki San Vicente Roncal, Nikola Ljubešić, Özlem Çetinoğlu, Rahmad Mahendra, Talha Çolakoğlu, Timothy Baldwin, Tommaso Caselli, Wladimir Sidorenko
This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation.
no code implementations • Findings (EMNLP) 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.
no code implementations • ECNLP (ACL) 2022 • Fajri Koto, Jey Han Lau, Timothy Baldwin
For any e-commerce service, persuasive, faithful, and informative product descriptions can attract shoppers and improve sales.
no code implementations • ALTA 2021 • Qian Sun, Aili Shen, Hiyori Yoshikawa, Chunpeng Ma, Daniel Beck, Tomoya Iwakura, Timothy Baldwin
Hierarchical document categorisation is a special case of multi-label document categorisation, where there is a taxonomic hierarchy among the labels.
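One practical consequence of the taxonomic hierarchy is that a document assigned a leaf label implicitly also belongs to every ancestor label. A minimal sketch of that ancestor expansion, using a made-up taxonomy rather than anything from the paper itself:

```python
# Hypothetical taxonomy, mapping each child label to its parent.
parent = {
    "tennis": "sport",
    "soccer": "sport",
    "sport": "news",
    "politics": "news",
}

def expand_labels(labels):
    # Propagate each assigned label up the taxonomy to include all ancestors.
    expanded = set(labels)
    for label in labels:
        node = label
        while node in parent:
            node = parent[node]
            expanded.add(node)
    return expanded

print(sorted(expand_labels({"tennis"})))  # ['news', 'sport', 'tennis']
```

A document labelled only "tennis" is thus evaluated against "sport" and "news" as well, which is what distinguishes the hierarchical setting from flat multi-label categorisation.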
no code implementations • Findings (EMNLP) 2021 • Anna Rogers, Timothy Baldwin, Kobi Leins
A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear.
no code implementations • 3 Apr 2024 • Thinh Hung Truong, Yulia Otmakhova, Karin Verspoor, Trevor Cohn, Timothy Baldwin
In this work, we measure the impact of affixal negation on modern English large language models (LLMs).
no code implementations • 2 Apr 2024 • Fajri Koto, Rahmad Mahendra, Nurul Aisyah, Timothy Baldwin
Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies on language models have predominantly centered on English cultures, potentially resulting in an Anglocentric bias.
no code implementations • 31 Mar 2024 • Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li
Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safety issues as various vulnerabilities are exposed.
no code implementations • 24 Mar 2024 • Masahiro Kaneko, Timothy Baldwin
In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and detection rate for personal information, copyrighted texts, and benchmark data.
no code implementations • 7 Mar 2024 • Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov
Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.
no code implementations • 22 Feb 2024 • Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
Existing evaluation metrics and methods for addressing these ethical challenges rely on datasets intentionally constructed by instructing humans to write instances that contain ethical problems.
1 code implementation • 20 Feb 2024 • Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal
In this paper, we introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.
1 code implementation • 20 Feb 2024 • Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin
The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models.
no code implementations • 19 Feb 2024 • Tatsuki Kuribayashi, Ryo Ueda, Ryo Yoshida, Yohei Oseki, Ted Briscoe, Timothy Baldwin
This also showcases the advantage of cognitively-motivated LMs, which are typically employed in cognitive modeling, in the computational simulation of language universals.
no code implementations • 19 Feb 2024 • Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin
Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs.
1 code implementation • 18 Feb 2024 • Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin
However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents.
no code implementations • 3 Feb 2024 • Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, Timothy Baldwin
Improving multilingual language models' capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages.
no code implementations • 28 Jan 2024 • Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki, Timothy Baldwin
In this study, we examine the impact of LLMs' step-by-step predictions on gender bias in unscalable tasks.
no code implementations • 16 Jan 2024 • Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
Moreover, the performance degradation due to debiasing is also lower in the ICL case compared to that in the FT case.
1 code implementation • 4 Jan 2024 • Haonan Li, Martin Tomko, Timothy Baldwin
To overcome this, we propose treating the QA task as a dense vector retrieval problem, where we encode questions and POIs separately and retrieve the most relevant POIs for a question by utilizing embedding space similarity.
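The retrieval formulation described here can be sketched in a few lines; the toy vectors below are illustrative assumptions standing in for the learned question and POI encoders, not the paper's actual model:

```python
import math

# Toy stand-ins for embeddings produced by separate question and POI encoders.
poi_embeddings = {
    "cafe":    [0.9, 0.1, 0.0],
    "museum":  [0.1, 0.8, 0.3],
    "station": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(question_vec, k=1):
    # Rank all POIs by embedding-space similarity to the encoded question.
    ranked = sorted(poi_embeddings,
                    key=lambda p: cosine(question_vec, poi_embeddings[p]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05]))  # ['cafe']
```

Because questions and POIs are encoded independently, POI embeddings can be precomputed and indexed, making retrieval a nearest-neighbour search rather than a per-pair scoring pass.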
1 code implementation • 17 Dec 2023 • Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin
Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks.
no code implementations • 13 Nov 2023 • Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov
Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields.
1 code implementation • 13 Nov 2023 • Tatsuki Kuribayashi, Yohei Oseki, Timothy Baldwin
In other words, pure next-word probability remains a strong predictor for human reading behavior, even in the age of LLMs.
1 code implementation • 3 Nov 2023 • Jinrui Yang, Timothy Baldwin, Trevor Cohn
We present Multi-EuP, a new multilingual benchmark dataset, comprising 22K multilingual documents collected from the European Parliament, spanning 24 languages.
1 code implementation • 1 Nov 2023 • Takashi Wada, Timothy Baldwin, Jey Han Lau
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.
1 code implementation • 1 Nov 2023 • Yichen Huang, Timothy Baldwin
We investigate MT evaluation metric performance on adversarially-synthesized texts, to shed light on metric robustness.
no code implementations • 8 Oct 2023 • Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni
The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention.
1 code implementation • 7 Oct 2023 • Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin
In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia.
1 code implementation • 15 Sep 2023 • Chen Cecilia Liu, Fajri Koto, Timothy Baldwin, Iryna Gurevych
Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground.
no code implementations • 14 Sep 2023 • Gisela Vallejo, Timothy Baldwin, Lea Frermann
The manifestation and effect of bias in news reporting have been central topics in the social sciences for decades, and have received increasing attention in the NLP community recently.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
1 code implementation • 25 Aug 2023 • Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin
With the rapid evolution of large language models (LLMs), new and hard-to-predict harmful capabilities are emerging.
1 code implementation • 8 Aug 2023 • Yuxia Wang, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, Karin Verspoor
Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard.
1 code implementation • 15 Jun 2023 • Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin
As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.
1 code implementation • 14 Jun 2023 • Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn
Negation has been shown to be a major bottleneck for masked language models, such as BERT.
1 code implementation • 2 Jun 2023 • Takashi Wada, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context.
1 code implementation • 24 May 2023 • Haonan Li, Fajri Koto, Minghao Wu, Alham Fikri Aji, Timothy Baldwin
However, research on multilingual instruction tuning has been limited due to the scarcity of high-quality instruction-response datasets across different languages.
1 code implementation • 11 Feb 2023 • Xudong Han, Timothy Baldwin, Trevor Cohn
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.
1 code implementation • 17 Oct 2022 • Xudong Han, Aili Shen, Trevor Cohn, Timothy Baldwin, Lea Frermann
Mitigating bias in training on biased datasets is an important open problem.
1 code implementation • 6 Oct 2022 • Thinh Hung Truong, Yulia Otmakhova, Timothy Baldwin, Trevor Cohn, Jey Han Lau, Karin Verspoor
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
2 code implementations • SDP (COLING) 2022 • Yulia Otmakhova, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor, Jey Han Lau
In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task.
1 code implementation • COLING 2022 • Takashi Wada, Timothy Baldwin, Yuji Matsumoto, Jey Han Lau
We propose a new unsupervised method for lexical substitution using pre-trained language models.
2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
In this work, we focus on developing resources for languages in Indonesia.
no code implementations • NAACL 2022 • Thinh Hung Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor
Negation is a common linguistic feature that is crucial in many language understanding tasks, yet it remains a hard problem due to diversity in its expression in different types of text.
1 code implementation • NAACL 2022 • Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann
Real-world datasets often encode stereotypes and societal biases.
2 code implementations • 4 May 2022 • Xudong Han, Aili Shen, Yitong Li, Lea Frermann, Timothy Baldwin, Trevor Cohn
This paper presents fairlib, an open-source framework for assessing and improving classification fairness.
no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.
1 code implementation • 12 Mar 2022 • Xudong Han, Timothy Baldwin, Trevor Cohn
Adversarial training is a common approach for bias mitigation in natural language processing.
no code implementations • 16 Feb 2022 • Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track.
no code implementations • 22 Sep 2021 • Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann
Trained classification models can unintentionally lead to biased representations and predictions, which can reinforce societal preconceptions and stereotypes.
no code implementations • EMNLP 2021 • Shivashankar Subramanian, Xudong Han, Timothy Baldwin, Trevor Cohn, Lea Frermann
Bias is pervasive in NLP models, motivating the development of automatic debiasing techniques.
no code implementations • EMNLP 2021 • Shivashankar Subramanian, Afshin Rahimi, Timothy Baldwin, Trevor Cohn, Lea Frermann
Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups.
no code implementations • 16 Sep 2021 • Xudong Han, Timothy Baldwin, Trevor Cohn
Group bias in natural language processing tasks manifests as disparities in system error rates across texts authored by different demographic groups, typically disadvantaging minority groups.
no code implementations • 14 Sep 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.
1 code implementation • EMNLP 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.
no code implementations • 30 Jul 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
There is unison in the scientific community about human-induced climate change.
1 code implementation • Findings (ACL) 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).
no code implementations • NAACL 2021 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.
no code implementations • 25 May 2021 • Simon Šuster, Karin Verspoor, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez, Yulia Otmakhova
The COVID-19 pandemic has driven ever-greater demand for tools which enable efficient exploration of biomedical literature.
1 code implementation • NAACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.
1 code implementation • EACL 2021 • Biaoyan Fang, Christian Druckenbrodt, Saber A Akhondi, Jiayuan He, Timothy Baldwin, Karin Verspoor
Chemical patents contain rich coreference and bridging links, which are the target of this research.
no code implementations • EACL 2021 • Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin
Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not.
no code implementations • 18 Mar 2021 • Aili Shen, Meladel Mistica, Bahar Salehi, Hang Li, Timothy Baldwin, Jianzhong Qi
While pretrained language models (LMs) have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear.
1 code implementation • EACL 2021 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).
Ranked #7 on Discourse Parsing on RST-DT (Standard Parseval (Span) metric)
1 code implementation • EACL 2021 • Xudong Han, Timothy Baldwin, Trevor Cohn
Adversarial learning can learn fairer and less biased models of language than standard methods.
2 code implementations • 27 Nov 2020 • Fajri Koto, Timothy Baldwin, Jey Han Lau
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).
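As a rough intuition for two of the four elements, focus and coverage behave like precision and recall of summary content against the reference. The token-overlap scorer below is a deliberately simplified stand-in for illustration only; the framework itself scores these elements with learned metrics:

```python
def token_overlap_scores(summary, reference):
    # Crude bag-of-words proxy: focus ~ precision, coverage ~ recall.
    s = set(summary.lower().split())
    r = set(reference.lower().split())
    focus = len(s & r) / len(s)     # fraction of summary content found in the reference
    coverage = len(s & r) / len(r)  # fraction of reference content found in the summary
    return focus, coverage

f, c = token_overlap_scores("the cat sat", "the cat sat on the mat")
print(f, c)  # 1.0 0.6
```

A short summary copied verbatim from the reference scores perfect focus but low coverage, which is exactly the trade-off the two separate dimensions are designed to expose.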
no code implementations • COLING 2020 • Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Fajri Koto, Jey Han Lau, Timothy Baldwin
In this paper, we introduce a large-scale Indonesian summarization dataset.
1 code implementation • COLING 2020 • Haonan Li, Maria Vasardani, Martin Tomko, Timothy Baldwin
Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources.
1 code implementation • EMNLP (MRL) 2021 • Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, Timothy Baldwin, Jey Han Lau
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs).
no code implementations • 18 Aug 2020 • Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez
We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest.
no code implementations • WS 2020 • Brian Hur, Timothy Baldwin, Karin Verspoor, Laura Hardefeldt, James Gilkerson
Identifying the reasons for antibiotic administration in veterinary records is a critical component of understanding antimicrobial usage patterns.
no code implementations • WS 2020 • Yuxia Wang, Fei Liu, Karin Verspoor, Timothy Baldwin
In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain.
1 code implementation • ACL 2020 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Automatic metrics are fundamental for the development and evaluation of machine translation systems.
no code implementations • ACL 2020 • Kobi Leins, Jey Han Lau, Timothy Baldwin
We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.
1 code implementation • COLING 2020 • Afshin Rahimi, Timothy Baldwin, Karin Verspoor
We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources.
no code implementations • 30 Apr 2020 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
The world is facing the challenge of climate crisis.
1 code implementation • SEMEVAL 2017 • Preslav Nakov, Doris Hoogeveen, Lluís Màrquez, Alessandro Moschitti, Hamdy Mubarak, Timothy Baldwin, Karin Verspoor
We describe SemEval-2017 Task 3 on Community Question Answering.
1 code implementation • ALTA 2019 • Fajri Koto, Jey Han Lau, Timothy Baldwin
We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.
no code implementations • WS 2019 • Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, Timothy Baldwin
In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task.
no code implementations • WS 2019 • Anirudh Joshi, Timothy Baldwin, Richard Sinnott, Cecile Paris
Argument component extraction is a challenging and complex high-level semantic extraction task.
1 code implementation • IJCNLP 2019 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability.
no code implementations • 18 Jul 2019 • Jingyuan Zhang, Timothy Baldwin
Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy.
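The offset method can be illustrated with toy embeddings; the 2-d vectors and the capital-of relation pair below are invented for demonstration, with a nearest-neighbour lookup standing in for a search over a full pretrained vocabulary:

```python
import math

# Hypothetical 2-d "embeddings", chosen so the relation offset is consistent.
emb = {
    "france": [1.0, 3.0], "paris": [1.0, 1.0],
    "japan":  [4.0, 3.0], "tokyo": [4.0, 1.0],
    "cat":    [7.0, 5.0],
}

def nearest(vec, exclude=()):
    # Return the vocabulary word whose embedding is closest to vec.
    return min((w for w in emb if w not in exclude),
               key=lambda w: math.dist(vec, emb[w]))

# Offset for the capital-of relation, taken from one example pair.
offset = [c - w for c, w in zip(emb["paris"], emb["france"])]  # [0.0, -2.0]
pred = nearest([a + b for a, b in zip(emb["japan"], offset)], exclude={"japan"})
print(pred)  # tokyo
```

Applying the same offset to a new word and finding the nearest neighbour is what lets a single example pair act as a predictor for the whole relation.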
1 code implementation • ACL 2019 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Accurate, automatic evaluation of machine translation is critical for system tuning, and evaluating progress in the field.
no code implementations • ACL 2019 • Yitong Li, Timothy Baldwin, Trevor Cohn
Supervised models of NLP rely on large collections of text which closely resemble the intended testing setting.
no code implementations • WS 2019 • N, Navnita akumar, Timothy Baldwin, Bahar Salehi
In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data.
no code implementations • SEMEVAL 2019 • Haonan Li, Minghan Wang, Timothy Baldwin, Martin Tomko, Maria Vasardani
This paper describes our submission to SemEval-2019 Task 12 on toponym resolution over scientific articles.
1 code implementation • SEMEVAL 2019 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance.
no code implementations • NAACL 2019 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn, Jason Eisner
Critical to natural language generation is the production of correctly inflected text.
no code implementations • ALTA 2019 • Hiyori Yoshikawa, Dat Quoc Nguyen, Zenan Zhai, Christian Druckenbrodt, Camilo Thorne, Saber A. Akhondi, Timothy Baldwin, Karin Verspoor
Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration.
no code implementations • ALTA 2019 • Gaurav Arora, Afshin Rahimi, Timothy Baldwin
Catastrophic forgetting, whereby a model trained on one task is fine-tuned on a second and, in doing so, suffers a "catastrophic" drop in performance over the first task, is a hurdle in the development of better transfer learning techniques.
no code implementations • 4 Jan 2019 • Aili Shen, Bahar Salehi, Timothy Baldwin, Jianzhong Qi
The quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one.
no code implementations • ALTA 2018 • N, Navnita akumar, Bahar Salehi, Timothy Baldwin
In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions ("MWEs").
no code implementations • ALTA 2018 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
In this paper we show that the quality control mechanism is overly conservative, which increases the time and expense of the evaluation.
no code implementations • WS 2018 • Taro Miyazaki, Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations.
no code implementations • WS 2018 • Steven Xu, Andrew Bennett, Doris Hoogeveen, Jey Han Lau, Timothy Baldwin
Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains.
no code implementations • EMNLP 2018 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.
no code implementations • COLING 2018 • Zhe Ye, Fang Li, Timothy Baldwin
General-purpose pre-trained word embeddings have become a mainstay of natural language processing, and more recently, methods have been proposed to encode external knowledge into word embeddings to benefit specific downstream tasks.
no code implementations • COLING 2018 • Timothy Baldwin
In this talk, I will first present recent work on domain debiasing in the context of language identification, then discuss a new line of work on language variety analysis in the form of dialect map generation.
1 code implementation • ACL 2018 • Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond
In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling.
1 code implementation • ACL 2018 • Shivashankar Subramanian, Timothy Baldwin, Trevor Cohn
Online petitions are a cost-effective way for citizens to collectively engage with policy-makers in a democracy.
3 code implementations • ACL 2018 • Yitong Li, Timothy Baldwin, Trevor Cohn
Written text often provides sufficient clues to identify the author, their gender, age, and other important attributes.
1 code implementation • ACL 2018 • Fei Liu, Trevor Cohn, Timothy Baldwin
Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task.
1 code implementation • NAACL 2018 • Yitong Li, Timothy Baldwin, Trevor Cohn
Most real world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.
no code implementations • NAACL 2018 • Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
Election manifestos document the intentions, motives, and views of political parties.
1 code implementation • NAACL 2018 • Fei Liu, Trevor Cohn, Timothy Baldwin
While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA), i.e. the extraction of fine-grained opinion polarity w.r.t.
Ranked #3 on Aspect-Based Sentiment Analysis (ABSA) on Sentihood
1 code implementation • ACL 2018 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Social media user geolocation is vital to many applications such as event detection.
1 code implementation • 22 Apr 2018 • Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.
1 code implementation • IJCNLP 2017 • Fei Liu, Timothy Baldwin, Trevor Cohn
Despite successful applications across a broad range of NLP tasks, conditional random fields ("CRFs"), in particular the linear-chain variant, are only able to model local features.
no code implementations • WS 2017 • Viet Nguyen, Julian Brooke, Timothy Baldwin
In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements.
no code implementations • WS 2017 • Yitong Li, Trevor Cohn, Timothy Baldwin
This paper describes our submission to the sentiment analysis sub-task of "Build It, Break It: The Language Edition (BIBI)", on both the builder and breaker sides.
no code implementations • EMNLP 2017 • Nitika Mathur, Timothy Baldwin, Trevor Cohn
Manual data annotation is a vital component of NLP research.
1 code implementation • EMNLP 2017 • Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu
Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators.
1 code implementation • EMNLP 2017 • Afshin Rahimi, Timothy Baldwin, Trevor Cohn
We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology.
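The core scoring idea can be illustrated with a minimal numpy sketch: evaluating a 2D (lat, lon) point under a mixture of isotropic Gaussians. The component means, variances and weights below are toy values, not the paper's learned parameters, and this omits the neural network that produces them:

```python
import numpy as np

def mog_density(point, means, sigmas, weights):
    """Density of a 2D point under a mixture of isotropic Gaussians.

    point:   (2,) array of (lat, lon)
    means:   (K, 2) component centres
    sigmas:  (K,) per-component standard deviations
    weights: (K,) mixture weights, summing to 1
    """
    diff = means - point                        # (K, 2) offsets to each centre
    sq_dist = np.sum(diff ** 2, axis=1)         # (K,) squared distances
    norm = 1.0 / (2.0 * np.pi * sigmas ** 2)    # 2D Gaussian normalisers
    comp = norm * np.exp(-sq_dist / (2.0 * sigmas ** 2))
    return float(np.dot(weights, comp))

# Toy mixture: two "cities" on a flat lat/lon plane (hypothetical values).
means = np.array([[40.7, -74.0],    # roughly New York
                  [34.0, -118.2]])  # roughly Los Angeles
sigmas = np.array([1.0, 1.5])
weights = np.array([0.6, 0.4])

p_nyc = mog_density(np.array([40.7, -74.0]), means, sigmas, weights)
p_mid = mog_density(np.array([37.0, -96.0]), means, sigmas, weights)
print(p_nyc > p_mid)  # density is higher near a component centre
```

In the paper's setting, the mixture parameters are produced by a network conditioned on text, so the density acts as a continuous, multimodal representation of likely locations.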
no code implementations • CONLL 2017 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topic models jointly learn topics and document-level topic distribution.
1 code implementation • ACL 2017 • Jey Han Lau, Timothy Baldwin, Trevor Cohn
Language models are typically applied at the sentence level, without access to the broader document context.
no code implementations • ACL 2017 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms.
no code implementations • EACL 2017 • Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton
Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.
1 code implementation • EACL 2017 • Yitong Li, Trevor Cohn, Timothy Baldwin
Deep neural networks have achieved remarkable results across many language processing tasks, however they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks.
no code implementations • EACL 2017 • Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin
Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.
no code implementations • WS 2017 • Ying Xu, Jey Han Lau, Timothy Baldwin, Trevor Cohn
With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time.
1 code implementation • WS 2017 • King Chan, Julian Brooke, Timothy Baldwin
This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations.
1 code implementation • EACL 2017 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn
Derivational morphology is a fundamental and complex characteristic of language.
no code implementations • TACL 2017 • Julian Brooke, Jan Šnajder, Timothy Baldwin
We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates.
1 code implementation • COLING 2016 • Shraey Bhatia, Jey Han Lau, Timothy Baldwin
Topics generated by topic models are typically represented as lists of terms.
no code implementations • WS 2016 • Timothy Baldwin
In this talk, I will outline a range of challenges presented by multiword expressions in terms of (lexicalist) precision grammar engineering, and different strategies for accommodating those challenges, in an attempt to strike the right balance in terms of generalisation and over- and under-generation.
no code implementations • WS 2016 • Bo Han, Afshin Rahimi, Leon Derczynski, Timothy Baldwin
This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016.
no code implementations • COLING 2016 • Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi
Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment.
no code implementations • COLING 2016 • Bahar Salehi, Paul Cook, Timothy Baldwin
Much previous research on multiword expressions (MWEs) has focused on the token- and type-level tasks of MWE identification and extraction, respectively.
no code implementations • EMNLP 2016 • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Timothy Baldwin
In named entity recognition, we often don't have a large in-domain training corpus or a knowledge base with adequate coverage to train a model directly.
1 code implementation • EMNLP 2016 • Yitong Li, Trevor Cohn, Timothy Baldwin
Deep neural networks have achieved remarkable results across many language processing tasks, however these methods are highly sensitive to noise and adversarial attacks.
4 code implementations • WS 2016 • Jey Han Lau, Timothy Baldwin
Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.
no code implementations • LREC 2016 • Richard Fothergill, Paul Cook, Timothy Baldwin
Web corpora are often constructed automatically, and their contents are therefore often not well understood.
1 code implementation • 17 Apr 2016 • Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin
This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval 2016 Task 10.
1 code implementation • ACL 2016 • Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin
Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision.
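The vector-offset method this work builds on can be sketched in a few lines of numpy. The toy embeddings below are hand-constructed so the analogy holds; the paper instead uses pre-trained embeddings:

```python
import numpy as np

# Toy embeddings constructed so a consistent "gender offset" exists;
# in practice these would come from pre-trained word2vec/GloVe vectors.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
    "apple": np.array([0.1, 0.2, 0.1]),  # unrelated distractor
}

def analogy(a, b, c, emb):
    """Return the word closest (by cosine) to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman", emb))  # -> queen
```

The paper's contribution is to test how far this simple subtraction generalises across many lexical relation types, not just the well-known analogy examples.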
no code implementations • IJCNLP 2015 • Afshin Rahimi, Trevor Cohn, Timothy Baldwin
We propose a label propagation approach to geolocation prediction based on Modified Adsorption, with two enhancements: (1) the removal of "celebrity" nodes to increase location homophily and boost tractability, and (2) the incorporation of text-based geolocation priors for test users.
no code implementations • HLT 2015 • Afshin Rahimi, Duy Vu, Trevor Cohn, Timothy Baldwin
Research on automatically geolocating social media users has conventionally been based on the text content of posts from a given user or the social network of the user, with very little crossover between the two, and no benchmarking of the two approaches over comparable datasets.
no code implementations • 21 Apr 2015 • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, Timothy Baldwin
Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications.
no code implementations • TACL 2014 • Marco Lui, Jey Han Lau, Timothy Baldwin
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.