Search Results for author: Niko Moritz

Found 18 papers, 0 papers with code

Effective internal language model training and fusion for factorized transducer model

no code implementations • 2 Apr 2024 • Jinxi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

However, even with the adoption of factorized transducer models, limited improvement has been observed compared to shallow fusion.

Language Modelling

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

no code implementations • 18 Jan 2024 • Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations.

Automatic Speech Recognition (ASR) +1

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code implementations • CVPR 2023 • Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).

Lip Reading speech-recognition +1

Streaming Audio-Visual Speech Recognition with Alignment Regularization

no code implementations • 3 Nov 2022 • Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic

In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.

Audio-Visual Speech Recognition Automatic Speech Recognition +6

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

Speech Recognition

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

no code implementations • 19 Apr 2022 • Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen

The two most popular loss functions for streaming end-to-end automatic speech recognition (ASR) are RNN-Transducer (RNN-T) and connectionist temporal classification (CTC).

Automatic Speech Recognition (ASR) +2

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

no code implementations • 1 Mar 2022 • Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

As an example application, we use the extended GTC (GTC-e) for the multi-speaker speech recognition task.

Automatic Speech Recognition (ASR) +2

Sequence Transduction with Graph-based Supervision

no code implementations • 1 Nov 2021 • Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production.

Automatic Speech Recognition (ASR) +1

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +2

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

no code implementations • 2 Jul 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks.

Automatic Speech Recognition (ASR) +2

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

no code implementations • 16 Jun 2021 • Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.

Automatic Speech Recognition (ASR) +1

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

no code implementations • 19 Apr 2021 • Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

In this paper, we extend our prior work by (1) introducing the Conformer architecture to further improve the accuracy, (2) accelerating the decoding process with a novel activation recycling technique, and (3) enabling streaming decoding with triggered attention.

Automatic Speech Recognition (ASR) +1

Capturing Multi-Resolution Context by Dilated Self-Attention

no code implementations • 7 Apr 2021 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.

Automatic Speech Recognition (ASR) +3

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

no code implementations • 26 Nov 2020 • Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

The performance of automatic speech recognition (ASR) systems typically degrades significantly when the training and test data domains are mismatched.

Automatic Speech Recognition (ASR) +4

Semi-Supervised Speech Recognition via Graph-based Temporal Classification

no code implementations • 29 Oct 2020 • Niko Moritz, Takaaki Hori, Jonathan Le Roux

However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model.

Automatic Speech Recognition (ASR) +4