no code implementations • SIGDIAL (ACL) 2020 • Koji Inoue, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, Katsuya Takanashi, Tatsuya Kawahara
The proposed system generates several types of listener responses: backchannels, repeats, elaborating questions, assessments, generic sentimental responses, and generic responses.
no code implementations • SIGDIAL (ACL) 2022 • Haruki Kawai, Yusuke Muraki, Kenta Yamamoto, Divesh Lala, Koji Inoue, Tatsuya Kawahara
We propose a simultaneous job interview system in which one interviewer can conduct one-on-one interviews with multiple applicants at the same time by cooperating with multiple autonomous job interview dialogue systems.
no code implementations • SIGDIAL (ACL) 2021 • Koji Inoue, Hiromi Sakamoto, Kenta Yamamoto, Divesh Lala, Tatsuya Kawahara
We demonstrate the moderating abilities of a multi-party attentive listening robot system when multiple people are speaking in turns.
no code implementations • 11 Mar 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze
The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.
no code implementations • 28 Feb 2024 • Hao Shi, Tatsuya Kawahara
Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial.
Automatic Speech Recognition (ASR) +3
no code implementations • 21 Feb 2024 • Haruki Kawai, Divesh Lala, Koji Inoue, Keiko Ochi, Tatsuya Kawahara
To this end, we propose a semi-autonomous system, where a remote operator can take control of an autonomous attentive listening system in real-time.
no code implementations • 20 Feb 2024 • Zi Haur Pang, Yahui Fu, Divesh Lala, Keiko Ochi, Koji Inoue, Tatsuya Kawahara
This study introduces the first framework designed to engender empathetic dialogue with validating responses.
no code implementations • 24 Jan 2024 • Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara
We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion.
1 code implementation • 11 Jan 2024 • Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara
Personality recognition is useful for enhancing robots' ability to tailor user-adaptive responses, thus fostering rich human-robot interactions.
1 code implementation • 10 Jan 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze
A demonstration of a real-time and continuous turn-taking prediction system is presented.
no code implementations • 10 Jan 2024 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors.
no code implementations • 21 Aug 2023 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors.
no code implementations • 28 Jul 2023 • Yahui Fu, Koji Inoue, Chenhui Chu, Tatsuya Kawahara
We enhance ChatGPT's ability to reason for the system's perspective by integrating in-context learning with commonsense knowledge.
no code implementations • 18 May 2023 • Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji
At the decoded feature level, we fuse the features produced by the generative and predictive decoders.
1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Connectionist temporal classification (CTC) -based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.
Automatic Speech Recognition (ASR) +3
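The non-autoregressive nature of CTC noted above comes from its frame-level output: each frame emits a label independently, and a deterministic collapse rule recovers the token sequence. A minimal sketch of greedy CTC decoding, assuming integer labels and blank index 0 (both illustrative choices, not tied to this paper's setup):

```python
BLANK = 0  # conventional CTC blank index (an assumption here)

def ctc_greedy_collapse(frame_labels):
    """Collapse per-frame argmax labels into an output sequence:
    (1) merge consecutive repeats, (2) drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Frames "a a - a b b -" collapse to "a a b" (with 1='a', 2='b'):
print(ctc_greedy_collapse([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

Because every frame is decoded in parallel and only this collapse is sequential, inference needs no token-by-token loop, which is what makes CTC attractive for fast ASR.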
no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.
Automatic Speech Recognition (ASR) +2
no code implementations • 7 Jul 2022 • Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto
We also propose to incorporate an auxiliary loss to train the model using the output of the intermediate layer and unpunctuated texts.
Automatic Speech Recognition (ASR) +3
no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We propose an ASR rescoring method for directly detecting errors with ELECTRA, which is originally a pre-training method for NLP tasks.
Automatic Speech Recognition (ASR) +2
no code implementations • 9 Sep 2021 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
no code implementations • 15 Jul 2021 • Hirofumi Inaguma, Tatsuya Kawahara
In this work, we propose novel decoding algorithms to enable streaming automatic speech recognition (ASR) on unsegmented long-form recordings without voice activity detection (VAD), based on monotonic chunkwise attention (MoChA) with an auxiliary connectionist temporal classification (CTC) objective.
no code implementations • 1 Jul 2021 • Hirofumi Inaguma, Tatsuya Kawahara
Previous works tackled this problem by leveraging alignment information to control the timing to emit tokens during training.
Automatic Speech Recognition (ASR) +3
no code implementations • SIGDIAL (ACL) 2021 • Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, Pascale Fung
Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight the COVID-19 pandemic, yet such research has only just begun for dialogue systems.
no code implementations • 2 May 2021 • Tatsuya Kawahara, Koji Inoue, Divesh Lala
It has also been evaluated with student subjects, showing promising results.
no code implementations • NAACL 2021 • Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe
To leverage the full potential of the source language information, we propose backward SeqKD, SeqKD from a target-to-source backward NMT model.
Automatic Speech Recognition (ASR) +6
no code implementations • 28 Feb 2021 • Hirofumi Inaguma, Tatsuya Kawahara
We compare CTC-ST with several methods that distill alignment knowledge from a hybrid ASR system and show that CTC-ST achieves a comparable accuracy-latency tradeoff without relying on external alignment information.
Automatic Speech Recognition (ASR) +3
no code implementations • COLING 2020 • Shuying Zhang, Tianyu Zhao, Tatsuya Kawahara
We propose a semantic constraint that encourages a response to be semantically related to its context by regularizing the decoding objective function with a semantic distance.
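The idea of regularizing a decoding objective with a semantic distance can be sketched as follows; the cosine distance, toy embeddings, and weight lambda are illustrative assumptions, not this paper's exact formulation:

```python
import numpy as np

def cosine_distance(u, v):
    """Semantic distance stand-in: 1 - cosine similarity."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def constrained_score(log_likelihood, ctx_emb, resp_emb, lam=0.5):
    """Decoding score = log p(response | context) - lam * semantic distance.
    lam trades fluency (likelihood) against topical relatedness."""
    return log_likelihood - lam * cosine_distance(ctx_emb, resp_emb)

# A response whose embedding aligns with the context outscores an
# equally likely but semantically unrelated one.
ctx = np.array([1.0, 0.0])
related, unrelated = np.array([0.9, 0.1]), np.array([0.0, 1.0])
print(constrained_score(-2.0, ctx, related) >
      constrained_score(-2.0, ctx, unrelated))  # True
```

In practice the penalty would be applied per candidate during beam search, so generic but fluent responses are demoted in favor of context-related ones.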
no code implementations • 25 Oct 2020 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.
1 code implementation • SIGDIAL (ACL) 2021 • Tianyu Zhao, Tatsuya Kawahara
In this work, we first analyze the training objective of dialogue models from the view of Kullback-Leibler divergence (KLD) and show that the gap between the real world probability distribution and the single-referenced data's probability distribution prevents the model from learning the one-to-many relations efficiently.
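The KLD gap described above can be illustrated with toy distributions: a "real world" one-to-many distribution over several valid responses is far (in KL divergence) from a distribution collapsed onto a single reference, but close to one that spreads mass over the valid responses. All numbers below are illustrative assumptions:

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """KL(p || q) over discrete distributions, with a small epsilon
    to avoid log-of-zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

real = [0.4, 0.3, 0.3, 0.0]            # three equally valid responses
single_ref = [0.97, 0.01, 0.01, 0.01]  # mass collapsed onto one reference
one_to_many = [0.4, 0.3, 0.28, 0.02]   # mass spread over valid responses

print(kld(real, one_to_many) < kld(real, single_ref))  # True
```

Training on single-referenced data pushes the model toward the collapsed distribution, which is exactly the large-KLD case, so the one-to-many relations of dialogue are learned inefficiently.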
1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara
The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings.
Audio and Speech Processing
1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).
Automatic Speech Recognition (ASR) +3
1 code implementation • 19 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries.
Automatic Speech Recognition (ASR) +2
1 code implementation • 19 May 2020 • Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi.
Automatic Speech Recognition (ASR) +3
1 code implementation • 10 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Apr 2020 • Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara
In the proposed model, the dialog act recognition network is joined to an acoustic-to-word ASR model at its latent layer preceding the softmax layer, which provides a distributed representation of word-level ASR decoding information.
Automatic Speech Recognition (ASR) +2
1 code implementation • ACL 2020 • Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
Automatic dialogue response evaluators have been proposed as an alternative to automated metrics and human evaluation.
no code implementations • LREC 2020 • Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Ainu is an unwritten language that has been spoken by the Ainu people, one of the ethnic groups in Japan.
Automatic Speech Recognition (ASR) +2
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 • Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.
1 code implementation • 1 Oct 2019 • Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.
Automatic Speech Recognition (ASR) +4
no code implementations • 22 Sep 2019 • Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words.
Automatic Speech Recognition (ASR) +1
no code implementations • 12 Jul 2019 • Tianyu Zhao, Tatsuya Kawahara
In dialog studies, we often encode a dialog using a hierarchical encoder where each utterance is converted into an utterance vector, and then a sequence of utterance vectors is converted into a dialog vector.
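The two-level encoding described above can be sketched minimally, with mean pooling standing in for the utterance-level and dialog-level encoders (in practice these are typically RNNs); the word vectors here are random placeholders, an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy word embeddings (8-dimensional, random placeholders).
embed = {w: rng.normal(size=8) for w in ["hello", "how", "are", "you", "fine"]}

def encode_utterance(words):
    """Level 1: words -> one utterance vector (mean of word vectors)."""
    return np.mean([embed[w] for w in words], axis=0)

def encode_dialog(utterances):
    """Level 2: utterance vectors -> one dialog vector."""
    return np.mean([encode_utterance(u) for u in utterances], axis=0)

dialog = [["hello", "how", "are", "you"], ["fine"]]
vec = encode_dialog(dialog)
print(vec.shape)  # (8,)
```

The key structural point survives the simplification: the dialog representation is built from utterance representations, not directly from the flat word sequence.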
no code implementations • 31 May 2019 • Tianyu Zhao, Shinsuke Mori, Tatsuya Kawahara
Various encoder-decoder models have been applied to response generation in open-domain dialogs, but a majority of conventional models directly learn a mapping from lexical input to lexical output without explicitly modeling intermediate representations.
no code implementations • 22 Mar 2019 • Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF).
Automatic Speech Recognition (ASR) +2
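The decomposition idea can be sketched with plain single-channel NMF using multiplicative updates; the paper's method is multichannel (MNMF), so the shapes, rank, and speech/noise basis split below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, K = 16, 32, 4           # freq bins, time frames, number of bases
V = rng.random((F, T)) + 1e-3  # nonnegative magnitude "spectrogram"
W = rng.random((F, K)) + 1e-3  # basis spectra (columns)
H = rng.random((K, T)) + 1e-3  # activations over time

# Lee-Seung multiplicative updates for Euclidean-distance NMF: V ~ W @ H.
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

# Toy split: first two bases "speech", remaining bases "noise".
speech_part = W[:, :2] @ H[:2, :]
noise_part = W[:, 2:] @ H[2:, :]
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

Each time-frequency bin is thus explained as a nonnegative sum of speech and noise components, which is the decomposition the unsupervised approach above relies on; MNMF additionally models spatial information across microphone channels.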
no code implementations • 6 Nov 2018 • Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe
This work explores better adaptation methods to low-resource languages using an external language model (LM) under the framework of transfer learning.
no code implementations • WS 2018 • Tianyu Zhao, Tatsuya Kawahara
We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition.
Automatic Speech Recognition (ASR) +5
no code implementations • IJCNLP 2017 • Tianyu Zhao, Tatsuya Kawahara
Dialog act segmentation and recognition are basic natural language understanding tasks in spoken dialog systems.
Automatic Speech Recognition (ASR) • Natural Language Understanding +2
no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.
no code implementations • WS 2017 • Divesh Lala, Pierrick Milhorat, Koji Inoue, Masanari Ishida, Katsuya Takanashi, Tatsuya Kawahara
Second, we propose an effective statement response mechanism which detects focus words and responds in the form of a question or partial repeat.
no code implementations • WS 2016 • Maryam Sadat Mirzaei, Kourosh Meshgi, Tatsuya Kawahara
To improve the choice of words in this system and to explore a better method for detecting speech challenges, ASR errors were investigated as a model of the L2 listener, hypothesizing that some of these errors are similar to those made by language learners when transcribing the videos.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2012 • Tomoyosi Akiba, Hiromitsu Nishizaki, Kiyoaki Aikawa, Tatsuya Kawahara, Tomoko Matsui
We describe the evaluation framework for spoken document retrieval in the IR for Spoken Documents task, conducted at the ninth NTCIR Workshop.