no code implementations • 14 Jun 2021 • Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
In this paper, we introduce the Kaizen framework, which uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR).
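A minimal sketch of the continuously-improving-teacher idea, assuming the teacher is maintained as an exponential moving average (EMA) of the student's weights and simplifying the ASR objective to frame-level cross-entropy; names and hyper-parameters are illustrative, not the paper's recipe:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter a small step toward the student's."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

def semi_supervised_step(student, teacher, unlabeled_batch, optimizer, decay=0.999):
    # 1) Teacher produces pseudo-labels on unlabeled audio (no gradients).
    #    Greedy frame-level labels are a simplification for this sketch.
    with torch.no_grad():
        pseudo_targets = teacher(unlabeled_batch).argmax(dim=-1)
    # 2) Student is trained to predict the teacher's pseudo-labels.
    logits = student(unlabeled_batch)
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), pseudo_targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 3) Teacher is continuously improved by tracking the student.
    ema_update(teacher, student, decay)
    return loss.item()
```

In this sketch the teacher would start as a copy of the student (e.g. copy.deepcopy(student)), and the decay controls how quickly the teacher tracks the improving student, avoiding the separate retraining passes of classic pseudo-labeling.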
no code implementations • 9 Nov 2020 • Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig
In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC, and RNN-T (a loss-level sketch of two of these follows below).
Automatic Speech Recognition (ASR) +2
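For reference, two of the three criteria compared above are available as standard losses; a minimal sketch using torch.nn.CTCLoss and torchaudio.functional.rnnt_loss with illustrative shapes (LF-MMI has no equivalent one-liner in core PyTorch and typically relies on dedicated toolkits):

```python
import torch
import torchaudio

T, U, B, V = 120, 20, 4, 32          # frames, target length, batch, vocab (blank = 0)

# CTC: per-frame log-probabilities; targets are label sequences without alignment.
log_probs = torch.randn(T, B, V).log_softmax(-1)            # (T, B, V)
targets = torch.randint(1, V, (B, U))                        # (B, U), no blank symbol
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), U, dtype=torch.long)
ctc = torch.nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)

# RNN-T: a joint network scores every (frame, output-token) position.
joint_logits = torch.randn(B, T, U + 1, V)                   # (B, T, U+1, V)
rnnt = torchaudio.functional.rnnt_loss(
    joint_logits, targets.int(), input_lengths.int(), target_lengths.int(), blank=0)
```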
1 code implementation • 5 Nov 2020 • Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig
End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.
Ranked #15 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition (ASR) +1
no code implementations • 28 Sep 2020 • Mandela Patrick, Yuki Asano, Polina Kuznetsova, Ruth Fong, Joao F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In this paper, we show that, for videos, the answer is more complex, and that better results can be obtained by accounting for the interplay between invariance, distinctiveness, multiple modalities and time.
no code implementations • 4 Jun 2020 • Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf
By using an attention model and a biasing model to leverage the contextual metadata that accompanies a video, we observe a relative improvement of about 16% in Word Error Rate on Named Entities (WER-NE) for videos with related metadata; a sketch of the biasing idea follows below.
Automatic Speech Recognition (ASR) +2
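A minimal sketch of attention-based contextual biasing in the spirit described above; the module, dimensions, and the way the context vector is folded back into the recognizer state are assumptions for illustration, not the authors' model:

```python
import torch
import torch.nn as nn

class ContextualBiasing(nn.Module):
    """Attend over embeddings of metadata words (e.g. a video title) and mix the
    resulting context vector back into the recognizer state before prediction."""
    def __init__(self, hidden_dim=512, ctx_dim=256, vocab=5000):
        super().__init__()
        self.ctx_embed = nn.Embedding(vocab, ctx_dim)
        self.attn = nn.MultiheadAttention(embed_dim=hidden_dim, num_heads=4,
                                          kdim=ctx_dim, vdim=ctx_dim, batch_first=True)
        self.combine = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, dec_state, metadata_tokens):
        # dec_state: (B, T, hidden_dim); metadata_tokens: (B, M) word ids
        ctx = self.ctx_embed(metadata_tokens)                  # (B, M, ctx_dim)
        bias, _ = self.attn(query=dec_state, key=ctx, value=ctx)
        return self.combine(torch.cat([dec_state, bias], dim=-1))

biaser = ContextualBiasing()
mixed = biaser(torch.randn(2, 50, 512), torch.randint(0, 5000, (2, 12)))
```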
no code implementations • 19 May 2020 • Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig
In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results.
Ranked #17 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high-quality speech recognition systems.
no code implementations • 15 May 2020 • Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig
Videos uploaded on social media are often accompanied by textual descriptions.
Automatic Speech Recognition (ASR) +2
1 code implementation • ICCV 2021 • Mandela Patrick, Yuki M. Asano, Polina Kuznetsova, Ruth Fong, João F. Henriques, Geoffrey Zweig, Andrea Vedaldi
In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning.
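As a reference point for the noise contrastive learning mentioned above, a minimal InfoNCE-style sketch in which two augmented views of the same item are positives and all other pairs in the batch are negatives (a generic single-modality formulation, not the paper's multi-modal objective):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss: the two views of item i (z1[i], z2[i]) are pulled together,
    while every other pairing in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature               # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Two views of the same batch, e.g. different crops/augmentations of each clip.
z_a, z_b = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce(z_a, z_b)
```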
no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.
1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig
As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and loss functions.
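A minimal sketch of what intermediate model heads with an auxiliary loss could look like, assuming CTC-style heads; the layer placement, weighting, and head design here are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class EncoderWithIntermediateHead(nn.Module):
    def __init__(self, dim=256, vocab=32, n_layers=12, mid_layer=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4) for _ in range(n_layers)])
        self.mid_layer = mid_layer
        self.mid_head = nn.Linear(dim, vocab)      # intermediate head
        self.final_head = nn.Linear(dim, vocab)    # final head

    def forward(self, x):                          # x: (T, B, dim) acoustic features
        mid_logits = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.mid_layer:
                mid_logits = self.mid_head(x)      # partial hypotheses mid-way up the stack
        return mid_logits, self.final_head(x)

def iterated_ctc_loss(mid_logits, final_logits, targets, in_lens, tgt_lens, alpha=0.3):
    """Auxiliary CTC loss on the intermediate head plus the usual final-layer loss."""
    ctc = nn.CTCLoss(blank=0)
    final = ctc(final_logits.log_softmax(-1), targets, in_lens, tgt_lens)
    mid = ctc(mid_logits.log_softmax(-1), targets, in_lens, tgt_lens)
    return (1 - alpha) * final + alpha * mid
```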
no code implementations • 22 Oct 2019 • Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition; a minimal sketch of such an acoustic model follows below.
Ranked #23 on Speech Recognition on LibriSpeech test-other (using extra training data)
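For orientation, a hybrid-system acoustic model maps acoustic frames to per-frame senone (tied context-dependent HMM state) posteriors that a conventional decoder then consumes; a minimal transformer-based sketch with illustrative dimensions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TransformerAM(nn.Module):
    """Hybrid-style acoustic model: frames in, per-frame senone log-posteriors out."""
    def __init__(self, feat_dim=80, d_model=512, n_heads=8, n_layers=12, n_senones=8000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               dim_feedforward=2048)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.output = nn.Linear(d_model, n_senones)

    def forward(self, feats):                     # feats: (T, B, feat_dim), e.g. log-Mel
        h = self.encoder(self.proj(feats))
        return self.output(h).log_softmax(-1)     # per-frame senone log-posteriors

am = TransformerAM()
log_post = am(torch.randn(200, 2, 80))            # (T=200, B=2, 8000), fed to the decoder
```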
no code implementations • 2 Oct 2019 • Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer
There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to achieve competitive performance, especially on English, which has poor grapheme-phoneme correspondence.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2020 • Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig
Towards developing high-performing ASR for low-resource languages, common approaches to address the lack of resources are to make use of data from multiple languages and to augment the training data by creating acoustic variations.
no code implementations • EACL 2017 • Baolin Peng, Michael Seltzer, Y.C. Ju, Geoffrey Zweig, Kam-Fai Wong
This is motivated by an actual system under development to assist in the order-taking process.
3 code implementations • ACL 2017 • Jason D. Williams, Kavosh Asadi, Geoffrey Zweig
End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors.
no code implementations • 3 Jun 2016 • Jason D. Williams, Geoffrey Zweig
This paper presents a model for end-to-end learning of task-oriented dialog systems.
no code implementations • 3 Jun 2016 • Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Kam-Fai Wong
Experimental results indicate that the model outperforms previously proposed neural conversation architectures, and that using specificity in the objective function significantly improves performance for both generation and retrieval.
no code implementations • 29 Oct 2015 • Kaisheng Yao, Geoffrey Zweig, Baolin Peng
The intention network is a recurrent network that models the dynamics of the intention process.
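A minimal sketch of a per-turn intention recurrence consistent with that description; the cell type, dimensions, and interface are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class IntentionRNN(nn.Module):
    """Per-turn recurrence: the intention state evolves once per dialogue turn
    and can condition whatever encoder/decoder handles the turn itself."""
    def __init__(self, turn_dim=256, intent_dim=128):
        super().__init__()
        self.cell = nn.GRUCell(turn_dim, intent_dim)

    def forward(self, turn_encodings):             # (num_turns, B, turn_dim)
        h = turn_encodings.new_zeros(turn_encodings.size(1), self.cell.hidden_size)
        states = []
        for turn in turn_encodings:                # one update per dialogue turn
            h = self.cell(turn, h)
            states.append(h)
        return torch.stack(states)                 # intention state after each turn

intent = IntentionRNN()
states = intent(torch.randn(5, 2, 256))            # 5 turns, batch of 2 dialogues
```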
no code implementations • 31 May 2015 • Kaisheng Yao, Geoffrey Zweig
We find that the simple side-conditioned generation approach is able to rival the state of the art, and we are able to significantly advance the state of the art with bi-directional long short-term memory (LSTM) neural networks that use the same alignment information that is used in conventional approaches.
no code implementations • IJCNLP 2015 • Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell
Two recent approaches have achieved state-of-the-art results in image captioning.
1 code implementation • CVPR 2015 • Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig
The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage.
Ranked #1 on Image Captioning on COCO Captions test