no code implementations • 17 Sep 2023 • Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains respectively.
no code implementations • 22 Jul 2023 • Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse directly from speech have recently become more promising.
no code implementations • 15 Nov 2022 • Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-Yi Lee, Yizhou Sun, Wei Wang
Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information.
Automatic Speech Recognition (ASR) +10
no code implementations • 31 Oct 2022 • Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le
In this work, we present our Joint Audio/Text training method for Transformer Rescorer, to leverage unpaired text-only data which is relatively cheaper than paired audio-text data.
no code implementations • 4 Apr 2022 • Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings.
Automatic Speech Recognition (ASR) +4
no code implementations • 11 Oct 2021 • Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
Measuring automatic speech recognition (ASR) system quality is critical for creating user-satisfying voice-driven applications.
Automatic Speech Recognition (ASR) +3
no code implementations • 5 Apr 2021 • Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer
How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area.
no code implementations • 5 Apr 2021 • Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
We define SemDist as the distance between a reference and hypothesis pair in a sentence-level embedding space.
Automatic Speech Recognition (ASR) +14
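The SemDist idea above reduces to a simple computation: embed the reference and the hypothesis with a sentence-level encoder, then measure their distance. A minimal sketch, assuming 1 minus cosine similarity as the distance and using a hashed bag-of-words vector as a toy placeholder for the pretrained sentence encoder the work relies on:

```python
import hashlib
import numpy as np

def embed(sentence, dim=64):
    # Toy stand-in for a sentence encoder: hashed bag-of-words vector.
    # The actual SemDist work uses a pretrained sentence-level embedding
    # model; this placeholder only illustrates the metric itself.
    v = np.zeros(dim)
    for tok in sentence.lower().split():
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        v[bucket] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def semdist(reference, hypothesis):
    # SemDist: distance between the reference/hypothesis pair in a
    # sentence-level embedding space (here, 1 - cosine similarity).
    return 1.0 - float(np.dot(embed(reference), embed(hypothesis)))
```

Identical sentences score (near) zero, and semantically related hypotheses with different surface forms can score lower than a word-error-rate view would suggest, which is the motivation for the metric.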
1 code implementation • 5 Nov 2020 • Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig
End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.
Ranked #15 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition (ASR) +1
no code implementations • 26 Oct 2020 • Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
no code implementations • 24 Jul 2019 • Suyoun Kim, Siddharth Dalmia, Florian Metze
We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information.
no code implementations • ACL 2019 • Suyoun Kim, Siddharth Dalmia, Florian Metze
We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings.
no code implementations • NAACL 2019 • Suyoun Kim, Florian Metze
Conversational context information, higher-level knowledge that spans across sentences, can help to recognize a long conversation.
no code implementations • 7 Aug 2018 • Suyoun Kim, Florian Metze
Existing speech recognition systems are typically built at the sentence level, although it is known that dialog context, e.g., higher-level knowledge that spans across sentences or speakers, can help the processing of long conversations.
no code implementations • 6 Nov 2017 • Suyoun Kim, Michael L. Seltzer
Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters.
1 code implementation • 6 Nov 2017 • Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao
Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.
8 code implementations • 21 Sep 2016 • Suyoun Kim, Takaaki Hori, Shinji Watanabe
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
no code implementations • 11 Jan 2016 • Suyoun Kim, Bhiksha Raj, Ian Lane
We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.
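One common way to make an acoustic model noise-aware, consistent with the description above, is to condition every input frame on an estimate of the background environment. A minimal sketch (the paper's exact conditioning mechanism may differ; the feature dimensions and the noise embedding here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_aware_features(frames, noise_embedding):
    # Append a per-utterance noise representation to every frame so a
    # DNN acoustic model can condition on the acoustic environment.
    t = frames.shape[0]
    tiled = np.tile(noise_embedding, (t, 1))
    return np.concatenate([frames, tiled], axis=1)

frames = rng.standard_normal((100, 40))   # 100 frames of 40-dim features
noise_vec = rng.standard_normal(8)        # hypothetical noise embedding
x = noise_aware_features(frames, noise_vec)
```

The augmented features then feed the acoustic model in place of the raw frames, letting the network learn noise-dependent behavior without a separate enhancement front end.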
no code implementations • 19 Nov 2015 • Suyoun Kim, Ian Lane
Integration of multiple microphone data is one of the key ways to achieve robust speech recognition in noisy environments or when the speaker is located at some distance from the input device.
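The classical baseline for the multi-microphone integration described above is delay-and-sum beamforming: align each channel by its time delay, then average. The paper itself pursues a neural approach, but as a minimal sketch of what "integration of multiple microphone data" means:

```python
import numpy as np

def delay_and_sum(channels, delays):
    # Align each microphone channel by its integer sample delay, then
    # average the aligned signals -- the simplest multi-channel
    # integration. np.roll wraps circularly; a real implementation
    # would pad instead of wrapping.
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

With correct delays, the target speech adds coherently across channels while uncorrelated noise partially cancels, which is what makes distant-talking recognition more robust.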
no code implementations • 9 Dec 2014 • Seungwhan Moon, Suyoun Kim, Haohan Wang
We propose a transfer deep learning (TDL) framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality.