no code implementations • 4 Nov 2022 • Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran
By fine-tuning an ASR model on synthetic stuttered speech, we reduce the word error rate by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation on fluent utterances.
Automatic Speech Recognition (ASR) +1
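The headline number is a relative word-error-rate reduction. A minimal sketch of how such a figure is computed, using the jiwer library (the transcripts and file contents here are hypothetical, not the paper's data):

```python
# Sketch: computing relative WER reduction between a baseline and a
# fine-tuned ASR model. jiwer.wer accepts lists of reference/hypothesis
# strings; the example sentences below are placeholders.
import jiwer

references = ["the quick brown fox", "speech recognition is hard"]
baseline_hyps = ["the quick brown fox", "speech recog is hard"]
finetuned_hyps = ["the quick brown fox", "speech recognition is hard"]

wer_base = jiwer.wer(references, baseline_hyps)
wer_ft = jiwer.wer(references, finetuned_hyps)

# Relative reduction: (baseline - fine-tuned) / baseline
rel_reduction = (wer_base - wer_ft) / wer_base
print(f"baseline WER={wer_base:.3f}, fine-tuned WER={wer_ft:.3f}, "
      f"relative reduction={rel_reduction:.1%}")
```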
no code implementations • 7 Aug 2020 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework.
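The core of duration-informed models in the DurIAN family is a length regulator: phoneme encodings are expanded to frame rate using predicted durations, so the decoder never has to learn alignment. A simplified PyTorch sketch of that expansion step (my illustration, not the paper's code):

```python
# Duration-informed expansion: repeat each phoneme state for its predicted
# number of acoustic frames, yielding a frame-aligned decoder input.
import torch

def length_regulate(phoneme_states: torch.Tensor,
                    durations: torch.Tensor) -> torch.Tensor:
    """phoneme_states: (num_phonemes, hidden); durations: (num_phonemes,) ints."""
    return torch.repeat_interleave(phoneme_states, durations, dim=0)

states = torch.randn(4, 8)              # 4 phonemes, hidden size 8
durs = torch.tensor([3, 5, 2, 4])       # frames per phoneme, from a duration model
frames = length_regulate(states, durs)  # (14, 8) frame-aligned sequence
print(frames.shape)                     # torch.Size([14, 8])
```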
no code implementations • 27 Dec 2019 • Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu
This paper presents a method that generates expressive singing voices for Peking opera.
no code implementations • 20 Dec 2019 • Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu
The proposed algorithm first integrates speech and singing synthesis into a unified framework, then learns universal speaker embeddings that are shared between the speech and singing synthesis tasks.
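A hedged sketch of what "universal speaker embeddings" can look like in practice: one embedding table feeding both a speech branch and a singing branch, so a speaker seen only in speech data can still be rendered singing. Module names and sizes are assumptions, not the paper's architecture:

```python
# Sketch: a single speaker embedding table shared by two task decoders.
import torch
import torch.nn as nn

class UnifiedSynthesizer(nn.Module):
    def __init__(self, num_speakers: int, spk_dim: int = 64, hidden: int = 128):
        super().__init__()
        # One table serves both tasks, making the embeddings "universal".
        self.speaker_emb = nn.Embedding(num_speakers, spk_dim)
        self.speech_decoder = nn.GRU(spk_dim + hidden, hidden, batch_first=True)
        self.singing_decoder = nn.GRU(spk_dim + hidden, hidden, batch_first=True)

    def forward(self, content, speaker_id, singing: bool):
        spk = self.speaker_emb(speaker_id)                    # (batch, spk_dim)
        spk = spk.unsqueeze(1).expand(-1, content.size(1), -1)
        x = torch.cat([content, spk], dim=-1)
        decoder = self.singing_decoder if singing else self.speech_decoder
        out, _ = decoder(x)
        return out

model = UnifiedSynthesizer(num_speakers=10)
feats = model(torch.randn(2, 50, 128), torch.tensor([3, 7]), singing=True)
```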
no code implementations • 4 Dec 2019 • Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
However, the converted singing voice can easily go out of key, showing that existing approaches cannot model pitch information precisely.
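The usual remedy is to condition the conversion model on an explicitly extracted F0 contour so the output melody stays in key. A minimal sketch of that extraction step with librosa's pYIN tracker (assumed here for illustration; "song.wav" is a placeholder, and this is not the paper's exact method):

```python
# Sketch: extract F0 with pYIN and build a log-F0 conditioning feature.
import librosa
import numpy as np

y, sr = librosa.load("song.wav", sr=16000)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)

# Log-F0 on voiced frames; zeros elsewhere (f0 is NaN on unvoiced frames).
log_f0 = np.where(voiced_flag, np.log(np.where(voiced_flag, f0, 1.0)), 0.0)
print(log_f0.shape)  # one value per analysis frame
```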
no code implementations • 28 Nov 2019 • Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu
In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition.
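In its generic N-best form (a standard formulation, not necessarily the paper's exact RNN-T recipe), MBR training minimizes the expected word error under the model's renormalized hypothesis posteriors. A short differentiable sketch:

```python
# Sketch: MBR loss over an N-best list. Gradients push probability mass
# toward low-risk (low edit distance) hypotheses.
import torch

def mbr_loss(hyp_logprobs: torch.Tensor, hyp_risks: torch.Tensor) -> torch.Tensor:
    """hyp_logprobs: (n_best,) model log-probs; hyp_risks: (n_best,) edit distances."""
    posteriors = torch.softmax(hyp_logprobs, dim=0)  # renormalize over the N-best
    return torch.sum(posteriors * hyp_risks)         # expected risk

logps = torch.tensor([-4.2, -5.0, -6.1], requires_grad=True)
risks = torch.tensor([1.0, 0.0, 3.0])  # word edit distance to the reference
loss = mbr_loss(logps, risks)
loss.backward()
```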
4 code implementations • 4 Sep 2019 • Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.
no code implementations • ICLR 2019 • Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu
We consider the problem of training speech recognition systems without any labeled data, under the assumption that the learner can access only the input utterances and a phoneme language model estimated from a non-overlapping corpus.
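The key ingredient is the phoneme language model itself: estimated from text alone, it can score candidate transcriptions of unlabeled audio and thus stand in for labels. A toy sketch of such an LM (my illustration with a placeholder corpus, not the paper's algorithm):

```python
# Sketch: an add-one-smoothed bigram phoneme LM estimated from a
# non-overlapping text corpus, used to score hypothesized transcriptions.
import math
from collections import Counter

corpus = [["sil", "h", "ae", "l", "ow", "sil"],
          ["sil", "w", "er", "l", "d", "sil"]]

vocab = {p for seq in corpus for p in seq}
bigrams = Counter((a, b) for seq in corpus for a, b in zip(seq, seq[1:]))
contexts = Counter(p for seq in corpus for p in seq[:-1])

def lm_logprob(seq):
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + len(vocab)))
               for a, b in zip(seq, seq[1:]))

# Hypotheses decoded from unlabeled audio can be ranked by this score.
print(lm_logprob(["sil", "h", "ae", "l", "ow", "sil"]))
```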
no code implementations • 24 Oct 2016 • Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen
This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE).