no code implementations • 5 Apr 2021 • Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer
How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area.
no code implementations • 4 Jun 2020 • Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf
By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata.
Automatic Speech Recognition (ASR)
no code implementations • 27 Nov 2019 • Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer
As one of the major sources of speech variability, accents have posed a grand challenge to the robustness of speech recognition systems.
no code implementations • 5 Nov 2019 • Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer
Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model.
Automatic Speech Recognition (ASR)
1 code implementation • 28 Oct 2019 • Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer
We explore options to use Transformer networks in a neural transducer for end-to-end speech recognition.
no code implementations • 5 Dec 2018 • Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen
In this work, we focus on contextual speech recognition, which is particularly challenging for E2E models because it introduces a significant mismatch between training and test data.
Automatic Speech Recognition (ASR)