no code implementations • SIGDIAL (ACL) 2020 • Koji Inoue, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, Katsuya Takanashi, Tatsuya Kawahara
The proposed system generates several types of listener responses: backchannels, repeats, elaborating questions, assessments, generic sentimental responses, and generic responses.
no code implementations • SIGDIAL (ACL) 2022 • Haruki Kawai, Yusuke Muraki, Kenta Yamamoto, Divesh Lala, Koji Inoue, Tatsuya Kawahara
We propose a simultaneous job interview system in which one interviewer can conduct one-on-one interviews with multiple applicants at the same time by cooperating with multiple autonomous job interview dialogue systems.
no code implementations • SIGDIAL (ACL) 2021 • Koji Inoue, Hiromi Sakamoto, Kenta Yamamoto, Divesh Lala, Tatsuya Kawahara
We demonstrate the moderating abilities of a multi-party attentive listening robot system when multiple people are speaking in turns.
no code implementations • 11 Mar 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze
The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.
no code implementations • 28 Feb 2024 • Hao Shi, Tatsuya Kawahara
Adapting an automatic speech recognition (ASR) system to unseen noise environments is crucial.
Automatic Speech Recognition (ASR) +3
no code implementations • 21 Feb 2024 • Haruki Kawai, Divesh Lala, Koji Inoue, Keiko Ochi, Tatsuya Kawahara
To this end, we propose a semi-autonomous system, where a remote operator can take control of an autonomous attentive listening system in real-time.
no code implementations • 20 Feb 2024 • Zi Haur Pang, Yahui Fu, Divesh Lala, Keiko Ochi, Koji Inoue, Tatsuya Kawahara
This study introduces the first framework designed to engender empathetic dialogue with validating responses.
no code implementations • 24 Jan 2024 • Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara
We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion.
1 code implementation • 11 Jan 2024 • Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara
Personality recognition is useful for enhancing robots' ability to tailor user-adaptive responses, thus fostering rich human-robot interactions.
1 code implementation • 10 Jan 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze
A demonstration of a real-time and continuous turn-taking prediction system is presented.
no code implementations • 10 Jan 2024 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors.
no code implementations • 21 Aug 2023 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze
This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors.
no code implementations • 28 Jul 2023 • Yahui Fu, Koji Inoue, Chenhui Chu, Tatsuya Kawahara
We enhance ChatGPT's ability to reason for the system's perspective by integrating in-context learning with commonsense knowledge.
no code implementations • 18 May 2023 • Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji
At the decoded feature level, we fuse the features produced by the generative and predictive decoders.
1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Connectionist temporal classification (CTC) -based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.
Automatic Speech Recognition (ASR) +3
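The non-autoregressive nature of CTC noted above comes from its frame-level output: each frame emits a label independently, and a deterministic collapse rule recovers the token sequence. A minimal sketch of greedy CTC decoding, assuming integer labels and blank index 0 (both illustrative choices, not tied to this paper's setup):

```python
BLANK = 0  # conventional CTC blank index (an assumption here)

def ctc_greedy_collapse(frame_labels):
    """Collapse per-frame argmax labels into an output sequence:
    (1) merge consecutive repeats, (2) drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

# Frames "a a - a b b -" collapse to "a a b" (with 1='a', 2='b'):
print(ctc_greedy_collapse([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

Because every frame is decoded in parallel and only this collapse is sequential, inference needs no token-by-token loop, which is what makes CTC attractive for fast ASR.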
no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.
Automatic Speech Recognition (ASR) +2
no code implementations • 7 Jul 2022 • Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto
We also propose to incorporate an auxiliary loss to train the model using the output of the intermediate layer and unpunctuated texts.
Automatic Speech Recognition (ASR) +3
no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We propose an ASR rescoring method for directly detecting errors with ELECTRA, which is originally a pre-training method for NLP tasks.
Automatic Speech Recognition (ASR) +2
no code implementations • 9 Sep 2021 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
no code implementations • 15 Jul 2021 • Hirofumi Inaguma, Tatsuya Kawahara
In this work, we propose novel decoding algorithms to enable streaming automatic speech recognition (ASR) on unsegmented long-form recordings without voice activity detection (VAD), based on monotonic chunkwise attention (MoChA) with an auxiliary connectionist temporal classification (CTC) objective.
no code implementations • 1 Jul 2021 • Hirofumi Inaguma, Tatsuya Kawahara
Previous works tackled this problem by leveraging alignment information to control the timing to emit tokens during training.
Automatic Speech Recognition (ASR) +3
no code implementations • SIGDIAL (ACL) 2021 • Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, Pascale Fung
Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight the COVID-19 pandemic, yet such research has only just begun for dialogue systems.
no code implementations • 2 May 2021 • Tatsuya Kawahara, Koji Inoue, Divesh Lala
It has also been evaluated with student subjects, showing promising results.
no code implementations • NAACL 2021 • Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe
To leverage the full potential of the source language information, we propose backward SeqKD, SeqKD from a target-to-source backward NMT model.
Automatic Speech Recognition (ASR) +6
no code implementations • 28 Feb 2021 • Hirofumi Inaguma, Tatsuya Kawahara
We compare CTC-ST with several methods that distill alignment knowledge from a hybrid ASR system and show that CTC-ST achieves a comparable accuracy-latency tradeoff without relying on external alignment information.
Automatic Speech Recognition (ASR) +3
no code implementations • COLING 2020 • Shuying Zhang, Tianyu Zhao, Tatsuya Kawahara
We propose a semantic constraint that encourages a response to be semantically related to its context by regularizing the decoding objective function with a semantic distance.
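The idea of regularizing a decoding objective with a semantic distance can be sketched as follows; the cosine distance, toy embeddings, and weight lambda are illustrative assumptions, not this paper's exact formulation:

```python
import numpy as np

def cosine_distance(u, v):
    """Semantic distance stand-in: 1 - cosine similarity."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def constrained_score(log_likelihood, ctx_emb, resp_emb, lam=0.5):
    """Decoding score = log p(response | context) - lam * semantic distance.
    lam trades fluency (likelihood) against topical relatedness."""
    return log_likelihood - lam * cosine_distance(ctx_emb, resp_emb)

# A response whose embedding aligns with the context outscores an
# equally likely but semantically unrelated one.
ctx = np.array([1.0, 0.0])
related, unrelated = np.array([0.9, 0.1]), np.array([0.0, 1.0])
print(constrained_score(-2.0, ctx, related) >
      constrained_score(-2.0, ctx, unrelated))  # True
```

In practice the penalty would be applied per candidate during beam search, so generic but fluent responses are demoted in favor of context-related ones.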
no code implementations • 25 Oct 2020 • Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.
1 code implementation • SIGDIAL (ACL) 2021 • Tianyu Zhao, Tatsuya Kawahara
In this work, we first analyze the training objective of dialogue models from the view of Kullback-Leibler divergence (KLD) and show that the gap between the real world probability distribution and the single-referenced data's probability distribution prevents the model from learning the one-to-many relations efficiently.
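The KLD gap described above can be illustrated with toy distributions: a "real world" one-to-many distribution over several valid responses is far (in KL divergence) from a distribution collapsed onto a single reference, but close to one that spreads mass over the valid responses. All numbers below are illustrative assumptions:

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """KL(p || q) over discrete distributions, with a small epsilon
    to avoid log-of-zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

real = [0.4, 0.3, 0.3, 0.0]            # three equally valid responses
single_ref = [0.97, 0.01, 0.01, 0.01]  # mass collapsed onto one reference
one_to_many = [0.4, 0.3, 0.28, 0.02]   # mass spread over valid responses

print(kld(real, one_to_many) < kld(real, single_ref))  # True
```

Training on single-referenced data pushes the model toward the collapsed distribution, which is exactly the large-KLD case, so the one-to-many relations of dialogue are learned inefficiently.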
1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara
The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings.
Audio and Speech Processing
1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).
Automatic Speech Recognition (ASR) +3
1 code implementation • 19 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries.
Automatic Speech Recognition (ASR) +2
1 code implementation • 19 May 2020 • Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi.
Automatic Speech Recognition (ASR) +3
1 code implementation • 10 May 2020 • Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
Monotonic chunkwise attention (MoChA) has been studied for online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Apr 2020 • Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara
In the proposed model, the dialog act recognition network is joined to an acoustic-to-word ASR model at its latent layer preceding the softmax layer, which provides a distributed representation of word-level ASR decoding information.
Automatic Speech Recognition (ASR) +2
1 code implementation • ACL 2020 • Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
Automatic dialogue response evaluators have been proposed as an alternative to automated metrics and human evaluation.
no code implementations • LREC 2020 • Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Ainu is an unwritten language that has been spoken by the Ainu people, one of the ethnic groups in Japan.
Automatic Speech Recognition (ASR) +2
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 • Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.
1 code implementation • 1 Oct 2019 • Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe
In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.
Automatic Speech Recognition (ASR) +4
no code implementations • 22 Sep 2019 • Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words.
Automatic Speech Recognition (ASR) +1
no code implementations • 12 Jul 2019 • Tianyu Zhao, Tatsuya Kawahara
In dialog studies, we often encode a dialog using a hierarchical encoder where each utterance is converted into an utterance vector, and then a sequence of utterance vectors is converted into a dialog vector.
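The two-level encoding described above can be sketched minimally, with mean pooling standing in for the utterance-level and dialog-level encoders (in practice these are typically RNNs); the word vectors here are random placeholders, an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy word embeddings (8-dimensional, random placeholders).
embed = {w: rng.normal(size=8) for w in ["hello", "how", "are", "you", "fine"]}

def encode_utterance(words):
    """Level 1: words -> one utterance vector (mean of word vectors)."""
    return np.mean([embed[w] for w in words], axis=0)

def encode_dialog(utterances):
    """Level 2: utterance vectors -> one dialog vector."""
    return np.mean([encode_utterance(u) for u in utterances], axis=0)

dialog = [["hello", "how", "are", "you"], ["fine"]]
vec = encode_dialog(dialog)
print(vec.shape)  # (8,)
```

The key structural point survives the simplification: the dialog representation is built from utterance representations, not directly from the flat word sequence.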
no code implementations • 31 May 2019 • Tianyu Zhao, Shinsuke Mori, Tatsuya Kawahara
Various encoder-decoder models have been applied to response generation in open-domain dialogs, but a majority of conventional models directly learn a mapping from lexical input to lexical output without explicitly modeling intermediate representations.
no code implementations • 22 Mar 2019 • Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF).
Automatic Speech Recognition (ASR) +2
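The decomposition idea can be sketched with plain single-channel NMF using multiplicative updates; the paper's method is multichannel (MNMF), so the shapes, rank, and speech/noise basis split below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, K = 16, 32, 4           # freq bins, time frames, number of bases
V = rng.random((F, T)) + 1e-3  # nonnegative magnitude "spectrogram"
W = rng.random((F, K)) + 1e-3  # basis spectra (columns)
H = rng.random((K, T)) + 1e-3  # activations over time

# Lee-Seung multiplicative updates for Euclidean-distance NMF: V ~ W @ H.
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

# Toy split: first two bases "speech", remaining bases "noise".
speech_part = W[:, :2] @ H[:2, :]
noise_part = W[:, 2:] @ H[2:, :]
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

Each time-frequency bin is thus explained as a nonnegative sum of speech and noise components, which is the decomposition the unsupervised approach above relies on; MNMF additionally models spatial information across microphone channels.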
no code implementations • 6 Nov 2018 • Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe
This work explores better adaptation methods to low-resource languages using an external language model (LM) under the framework of transfer learning.
no code implementations • WS 2018 • Tianyu Zhao, Tatsuya Kawahara
We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition.
Automatic Speech Recognition (ASR) +5
no code implementations • IJCNLP 2017 • Tianyu Zhao, Tatsuya Kawahara
Dialog act segmentation and recognition are basic natural language understanding tasks in spoken dialog systems.
Automatic Speech Recognition (ASR) • Natural Language Understanding +2
no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.
no code implementations • WS 2017 • Divesh Lala, Pierrick Milhorat, Koji Inoue, Masanari Ishida, Katsuya Takanashi, Tatsuya Kawahara
Second, we propose an effective statement response mechanism which detects focus words and responds in the form of a question or partial repeat.
no code implementations • WS 2016 • Maryam Sadat Mirzaei, Kourosh Meshgi, Tatsuya Kawahara
To improve the choice of words in this system and to explore a better method for detecting speech challenges, ASR errors were investigated as a model of the L2 listener, hypothesizing that some of these errors are similar to those made by language learners when transcribing the videos.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2012 • Tomoyosi Akiba, Hiromitsu Nishizaki, Kiyoaki Aikawa, Tatsuya Kawahara, Tomoko Matsui
We describe the evaluation framework for spoken document retrieval in the IR for Spoken Documents task, conducted at the ninth NTCIR Workshop.