Search Results for author: Kohei Matsuura

Found 10 papers, 2 papers with code

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

no code implementations • 31 Jan 2024 • Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima

Our analysis unveils that 1) the capacity to represent content information is somewhat unrelated to enhanced speaker representation, 2) specific layers of speech SSL models would be partly specialized in capturing linguistic information, and 3) speaker SSL models tend to disregard linguistic information but exhibit more sophisticated speaker representation.

Self-Supervised Learning

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

1 code implementation • 14 Jun 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.

Natural Language Understanding, Self-Supervised Learning, +2 more

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

no code implementations • 7 Jun 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix

End-to-end speech summarization (E2E SSum) directly summarizes input speech into easy-to-read short sentences with a single model.

Automatic Speech Recognition, Decoder, +4 more

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

no code implementations • 25 May 2023 • Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami

Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture.

Knowledge Distillation, Speech Extraction, +2 more

Improving Scheduled Sampling for Neural Transducer-based ASR

no code implementations • 25 May 2023 • Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura

Experiments in three datasets confirm that RNNT trained with our SS approach achieves the best ASR performance.

Automatic Speech Recognition (ASR), +3 more

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

no code implementations • 9 May 2023 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka

However, since the two settings have been studied individually in general, there has been little research focusing on how effective a cross-lingual model is in comparison with a monolingual model.

Automatic Speech Recognition (ASR), +2 more

Leveraging Large Text Corpora for End-to-End Speech Summarization

no code implementations • 2 Mar 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura

The first technique is to utilize a text-to-speech (TTS) system to generate synthesized speech, which is used for E2E SSum training with the text summary.

Automatic Speech Recognition (ASR), +2 more

Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models

no code implementations • 14 Jul 2022 • Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka

We investigate the performance on SUPERB while varying the structure and KD methods so as to keep the number of parameters constant; this allows us to analyze the contribution of the representation introduced by varying the model architecture.

Automatic Speech Recognition (ASR), +4 more

Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

1 code implementation • 19 May 2020 • Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

We evaluated this speaker adaptation approach on two low-resource corpora, namely, Ainu and Mboshi.

Automatic Speech Recognition (ASR), +3 more

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

no code implementations • LREC 2020 • Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Ainu is an unwritten language that has been spoken by Ainu people who are one of the ethnic groups in Japan.

Automatic Speech Recognition (ASR), +2 more
