1 code implementation • 27 Mar 2024 • Liang Lu, Jingzhi Wang, David R. Mortensen
Protolanguage reconstruction is central to historical linguistics.
no code implementations • 24 Jan 2022 • Liang Lu, Jinyu Li, Yifan Gong
Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significant degradation of the recognition accuracy.
no code implementations • 17 Sep 2021 • Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li
Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.
no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong
In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.
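For context, a minimal sketch of the MWER idea with the LM score folded into the hypothesis scores before renormalisation. This is an illustrative example only (the function name, N-best representation, and weights are assumptions, not the paper's recipe):

```python
import torch

def mwer_loss(asr_log_probs, lm_log_probs, word_errors, lm_weight=0.3):
    """asr_log_probs, lm_log_probs: (N,) sequence-level log-probabilities of the N-best
    hypotheses; word_errors: (N,) float tensor of word errors against the reference."""
    fused = asr_log_probs + lm_weight * lm_log_probs   # fold the LM into the hypothesis scores
    posteriors = torch.softmax(fused, dim=0)           # renormalise over the N-best list
    baseline = word_errors.mean()                      # common variance-reduction baseline
    return (posteriors * (word_errors - baseline)).sum()
```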
no code implementations • 5 Apr 2021 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.
no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.
Automatic Speech Recognition (ASR) +3
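A hedged sketch of the kind of inference-time rescoring ILME enables: the estimated internal LM score is subtracted so an external LM can be weighted in its place. Variable names and weights here are illustrative assumptions, not the paper's code:

```python
def ilme_fusion_score(e2e_log_prob, internal_lm_log_prob, external_lm_log_prob,
                      ilm_weight=0.2, elm_weight=0.6):
    """All inputs are sequence-level log-probabilities of one hypothesis.

    The internal LM term (estimated, e.g., by running the E2E decoder without
    acoustic context) is subtracted before the external LM score is added.
    """
    return e2e_log_prob - ilm_weight * internal_lm_log_prob + elm_weight * external_lm_log_prob
```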
no code implementations • 26 Nov 2020 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.
no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.
Automatic Speech Recognition (ASR) +3
1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka
Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Oct 2020 • Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for external language model (LM) fusion.
no code implementations • 19 May 2020 • Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.
no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research due to its advantage of supporting online streaming speech recognition.
Automatic Speech Recognition (ASR) +2
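For reference, a very small sketch of the joint network at the core of an RNN-T model; layer sizes and the blank-index convention are illustrative assumptions, not the configuration studied in the paper:

```python
import torch
import torch.nn as nn

class RNNTJoiner(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=512, joint_dim=512, vocab_size=4097):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)
        self.pred_proj = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, vocab_size)    # one index reserved for blank

    def forward(self, enc, pred):
        # enc: (B, T, enc_dim) acoustic encoder states; pred: (B, U, pred_dim) prediction
        # network states. Broadcasting yields one logit vector per (t, u) grid point.
        joint = torch.tanh(self.enc_proj(enc).unsqueeze(2) + self.pred_proj(pred).unsqueeze(1))
        return self.out(joint).log_softmax(dim=-1)     # (B, T, U, vocab_size)
```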
no code implementations • 10 Apr 2020 • Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong
This leads to an inevitable latency during inference.
no code implementations • 23 Mar 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou
The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.
Audio and Speech Processing
1 code implementation • 30 Jan 2020 • Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances which are partially overlapped to varying degrees.
Automatic Speech Recognition (ASR) +2
1 code implementation • 6 Dec 2019 • Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou
Attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.
Automatic Speech Recognition (ASR) +3
1 code implementation • 23 Oct 2019 • Liang Lu
Transformer with self-attention has achieved great success in the area of natural language processing.
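A self-contained sketch of scaled dot-product self-attention, the building block referred to above; the example is framework-agnostic (NumPy only) and purely illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (T, d_model) input sequence; Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over key positions
    return weights @ V                                  # (T, d_k) context vectors
```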
no code implementations • 9 Sep 2019 • Liang Lu, Eric Sun, Yifan Gong
Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network.
1 code implementation • 12 Jul 2019 • Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong
While similar toolkits built on top of the two are available, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.
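For reference, a standard formulation of the MMI criterion mentioned here (sMBR and MPE replace the numerator/denominator comparison with expected state or phone accuracies):

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta) \;=\; \sum_{u} \log
\frac{p_\theta\!\left(\mathbf{X}_u \mid \mathcal{S}_{w_u}\right)^{\kappa} P(w_u)}
     {\sum_{w} p_\theta\!\left(\mathbf{X}_u \mid \mathcal{S}_{w}\right)^{\kappa} P(w)}
```

where \(\mathbf{X}_u\) is the acoustics of utterance \(u\), \(w_u\) the reference transcript, \(\mathcal{S}_w\) the state sequence for word sequence \(w\), and \(\kappa\) the acoustic scale.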
no code implementations • 28 Oct 2017 • Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC.
Automatic Speech Recognition (ASR) +3
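An illustrative sketch (not the paper's model) of a small convolutional encoder trained with CTC in place of a recurrent encoder, as discussed above; filter sizes and the vocabulary are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class ConvCTCEncoder(nn.Module):
    def __init__(self, feat_dim=40, channels=128, vocab_size=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.out = nn.Linear(channels, vocab_size)      # index 0 reserved for blank

    def forward(self, feats):                           # feats: (B, T, feat_dim)
        h = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        return self.out(h).log_softmax(dim=-1)          # (B, T, vocab_size)

# CTC loss expects (T, B, V) log-probabilities plus input/target lengths.
ctc = nn.CTCLoss(blank=0)
```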
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
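A minimal dynamic-programming sketch of segmental decoding, where the best path weight is a sum of whole-segment scores rather than per-frame scores; the scoring function and maximum segment duration are hypothetical placeholders:

```python
import numpy as np

def best_segmentation(T, labels, segment_score, max_len=30):
    """segment_score(start, end, label) -> score of labelling frames [start, end) as label."""
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for d in range(1, min(max_len, t) + 1):         # candidate segment durations
            for lab in labels:
                s = best[t - d] + segment_score(t - d, t, lab)
                if s > best[t]:
                    best[t], back[t] = s, (t - d, lab)  # keep the best segmentation ending at t
    return best[T], back
```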
no code implementations • 28 Jun 2017 • Liang Lu
This paper investigates the use of binary weights and activations for computation and memory efficient neural network acoustic models.
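A sketch of one common way to realise binary weights and activations, binarisation with a straight-through estimator; this is an assumption-laden illustration, not the exact scheme used in the paper:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                  # forward pass uses binary {-1, +1} values

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients where |x| <= 1, zero elsewhere.
        return grad_out * (x.abs() <= 1).float()

def binary_linear(x, real_weight):
    # Binarise activations and weights, then use an ordinary matmul; at inference time
    # this could be replaced by bit operations for memory and compute savings.
    return BinarizeSTE.apply(x) @ BinarizeSTE.apply(real_weight).t()
```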
no code implementations • 5 Apr 2017 • Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.
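A hedged sketch of the general idea of attaching an auxiliary loss to an intermediate encoder layer; the layer choices, target sets, and loss weight below are arbitrary assumptions, not the paper's configuration:

```python
import torch.nn as nn

class EncoderWithAuxHead(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, low_targets=48, high_targets=32):
        super().__init__()
        self.lower = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.upper = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.aux_head = nn.Linear(hidden, low_targets)    # e.g. lower-level (phoneme) targets
        self.main_head = nn.Linear(hidden, high_targets)  # e.g. final (character) targets

    def forward(self, feats):
        low, _ = self.lower(feats)
        high, _ = self.upper(low)
        return self.main_head(high), self.aux_head(low)

# total_loss = main_loss + aux_weight * aux_loss   (aux_weight, e.g., 0.3)
```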
no code implementations • 21 Feb 2017 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.
1 code implementation • 13 Dec 2016 • Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson
Its output feature is related to Cohen's class of time-frequency distributions.
Sound
no code implementations • 18 Oct 2016 • Liang Lu, Steve Renals
Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.
1 code implementation • 26 Sep 2016 • Ben Krause, Liang Lu, Iain Murray, Steve Renals
We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.
Ranked #14 on Language Modelling on Hutter Prize
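A rough sketch of one mLSTM step as described above: an input-dependent multiplicative intermediate state replaces the previous hidden state in the gate computations. Weight naming is illustrative and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h, c, W):
    # Multiplicative intermediate state: element-wise product of input and hidden projections.
    m = (W['mx'] @ x) * (W['mh'] @ h)
    i = sigmoid(W['ix'] @ x + W['im'] @ m)     # input gate
    f = sigmoid(W['fx'] @ x + W['fm'] @ m)     # forget gate
    o = sigmoid(W['ox'] @ x + W['om'] @ m)     # output gate
    g = np.tanh(W['cx'] @ x + W['cm'] @ m)     # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```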
no code implementations • 2 Aug 2016 • Liang Lu, Michelle Guo, Steve Renals
We have shown that HDNN-based acoustic models can achieve comparable recognition accuracy with a much smaller number of model parameters than plain deep neural network (DNN) acoustic models.
no code implementations • 7 Jul 2016 • Liang Lu
Highway deep neural network (HDNN) is a type of depth-gated feedforward neural network, which has been shown to be easier to train with more hidden layers and to generalise better than conventional plain deep neural networks (DNNs).
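A minimal sketch of the highway (depth-gated) feed-forward layer that HDNNs are built from: a transform gate mixes the transformed and untransformed input. Dimensions and the choice of nonlinearity are illustrative:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.transform(x))       # candidate transformation H(x)
        t = torch.sigmoid(self.gate(x))         # transform gate T(x)
        return t * h + (1.0 - t) * x            # carry path reuses the input directly
```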
no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.
Ranked #16 on Speech Recognition on TIMIT
no code implementations • 14 Dec 2015 • Liang Lu, Steve Renals
For speech recognition, deep neural networks (DNNs) have significantly improved recognition accuracy on most benchmark datasets and application domains.
1 code implementation • NAACL 2016 • Xingxing Zhang, Liang Lu, Mirella Lapata
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks.
no code implementations • 4 Nov 2014 • Liang Lu, Steve Renals
Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.