no code implementations • 2 Apr 2024 • Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion.
no code implementations • 18 Jan 2024 • Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide
Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • CVPR 2023 • Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen
Furthermore, when combined with large-scale pseudo-labeled audio-visual data SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).
no code implementations • 3 Nov 2022 • Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic
In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.
Audio-Visual Speech Recognition Automatic Speech Recognition +6
no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli
In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.
no code implementations • 19 Apr 2022 • Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen
The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are RNN-Transducer (RNN-T) and connectionist temporal classification (CTC).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 1 Mar 2022 • Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 1 Nov 2021 • Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux
The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 2 Jul 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 19 Apr 2021 • Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 7 Apr 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux
The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 29 Oct 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 14 Feb 2020 • Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 8 Jan 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2