1 code implementation • 27 Mar 2024 • Liang Lu, Jingzhi Wang, David R. Mortensen
Protolanguage reconstruction is central to historical linguistics.
no code implementations • 24 Jan 2022 • Liang Lu, Jinyu Li, Yifan Gong
Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significant degradation of the recognition accuracy.
no code implementations • 17 Sep 2021 • Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li
Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.
no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong
In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.
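For context, a minimal sketch of the MWER idea with the LM score folded into the hypothesis scores before renormalisation. This is an illustrative example only (the function name, N-best representation, and weights are assumptions, not the paper's recipe):

```python
import torch

def mwer_loss(asr_log_probs, lm_log_probs, word_errors, lm_weight=0.3):
    """asr_log_probs, lm_log_probs: (N,) sequence-level log-probabilities of the N-best
    hypotheses; word_errors: (N,) float tensor of word errors against the reference."""
    fused = asr_log_probs + lm_weight * lm_log_probs   # fold the LM into the hypothesis scores
    posteriors = torch.softmax(fused, dim=0)           # renormalise over the N-best list
    baseline = word_errors.mean()                      # common variance-reduction baseline
    return (posteriors * (word_errors - baseline)).sum()
```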
no code implementations • 5 Apr 2021 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.
no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.
Automatic Speech Recognition (ASR) +3
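A hedged sketch of the kind of inference-time rescoring ILME enables: the estimated internal LM score is subtracted so an external LM can be weighted in its place. Variable names and weights here are illustrative assumptions, not the paper's code:

```python
def ilme_fusion_score(e2e_log_prob, internal_lm_log_prob, external_lm_log_prob,
                      ilm_weight=0.2, elm_weight=0.6):
    """All inputs are sequence-level log-probabilities of one hypothesis.

    The internal LM term (estimated, e.g., by running the E2E decoder without
    acoustic context) is subtracted before the external LM score is added.
    """
    return e2e_log_prob - ilm_weight * internal_lm_log_prob + elm_weight * external_lm_log_prob
```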
no code implementations • 26 Nov 2020 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.
no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.
Automatic Speech Recognition (ASR) +3
1 code implementation • 3 Nov 2020 • Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka
Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 23 Oct 2020 • Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for external language model (LM) fusion.
no code implementations • 19 May 2020 • Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.
no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research due to its advantage of supporting online streaming speech recognition.
Automatic Speech Recognition (ASR) +2
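For reference, a very small sketch of the joint network at the core of an RNN-T model; layer sizes and the blank-index convention are illustrative assumptions, not the configuration studied in the paper:

```python
import torch
import torch.nn as nn

class RNNTJoiner(nn.Module):
    def __init__(self, enc_dim=512, pred_dim=512, joint_dim=512, vocab_size=4097):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, joint_dim)
        self.pred_proj = nn.Linear(pred_dim, joint_dim)
        self.out = nn.Linear(joint_dim, vocab_size)    # one index reserved for blank

    def forward(self, enc, pred):
        # enc: (B, T, enc_dim) acoustic encoder states; pred: (B, U, pred_dim) prediction
        # network states. Broadcasting yields one logit vector per (t, u) grid point.
        joint = torch.tanh(self.enc_proj(enc).unsqueeze(2) + self.pred_proj(pred).unsqueeze(1))
        return self.out(joint).log_softmax(dim=-1)     # (B, T, U, vocab_size)
```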
no code implementations • 10 Apr 2020 • Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong
This leads to an inevitable latency during inference.
no code implementations • 23 Mar 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou
The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.
Audio and Speech Processing
1 code implementation • 30 Jan 2020 • Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li
In this paper, we define continuous speech separation (CSS) as the task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances which are partially overlapped to varying degrees.
Automatic Speech Recognition (ASR) +2
1 code implementation • 6 Dec 2019 • Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou
Attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.
Automatic Speech Recognition (ASR) +3
1 code implementation • 23 Oct 2019 • Liang Lu
Transformer with self-attention has achieved great success in the area of natural language processing.
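A self-contained sketch of scaled dot-product self-attention, the building block referred to above; the example is framework-agnostic (NumPy only) and purely illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (T, d_model) input sequence; Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over key positions
    return weights @ V                                  # (T, d_k) context vectors
```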
no code implementations • 9 Sep 2019 • Liang Lu, Eric Sun, Yifan Gong
Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network.
1 code implementation • 12 Jul 2019 • Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong
While similar toolkits built on top of the two are available, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.
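For reference, a standard formulation of the MMI criterion mentioned here (sMBR and MPE replace the numerator/denominator comparison with expected state or phone accuracies):

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta) \;=\; \sum_{u} \log
\frac{p_\theta\!\left(\mathbf{X}_u \mid \mathcal{S}_{w_u}\right)^{\kappa} P(w_u)}
     {\sum_{w} p_\theta\!\left(\mathbf{X}_u \mid \mathcal{S}_{w}\right)^{\kappa} P(w)}
```

where \(\mathbf{X}_u\) is the acoustics of utterance \(u\), \(w_u\) the reference transcript, \(\mathcal{S}_w\) the state sequence for word sequence \(w\), and \(\kappa\) the acoustic scale.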
no code implementations • 28 Oct 2017 • Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC.
Automatic Speech Recognition (ASR) +3
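An illustrative sketch (not the paper's model) of a small convolutional encoder trained with CTC in place of a recurrent encoder, as discussed above; filter sizes and the vocabulary are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class ConvCTCEncoder(nn.Module):
    def __init__(self, feat_dim=40, channels=128, vocab_size=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.out = nn.Linear(channels, vocab_size)      # index 0 reserved for blank

    def forward(self, feats):                           # feats: (B, T, feat_dim)
        h = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        return self.out(h).log_softmax(dim=-1)          # (B, T, vocab_size)

# CTC loss expects (T, B, V) log-probabilities plus input/target lengths.
ctc = nn.CTCLoss(blank=0)
```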
no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.
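A minimal dynamic-programming sketch of segmental decoding, where the best path weight is a sum of whole-segment scores rather than per-frame scores; the scoring function and maximum segment duration are hypothetical placeholders:

```python
import numpy as np

def best_segmentation(T, labels, segment_score, max_len=30):
    """segment_score(start, end, label) -> score of labelling frames [start, end) as label."""
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for d in range(1, min(max_len, t) + 1):         # candidate segment durations
            for lab in labels:
                s = best[t - d] + segment_score(t - d, t, lab)
                if s > best[t]:
                    best[t], back[t] = s, (t - d, lab)  # keep the best segmentation ending at t
    return best[T], back
```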
no code implementations • 28 Jun 2017 • Liang Lu
This paper investigates the use of binary weights and activations for computation and memory efficient neural network acoustic models.
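A sketch of one common way to realise binary weights and activations, binarisation with a straight-through estimator; this is an assumption-laden illustration, not the exact scheme used in the paper:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                  # forward pass uses binary {-1, +1} values

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients where |x| <= 1, zero elsewhere.
        return grad_out * (x.abs() <= 1).float()

def binary_linear(x, real_weight):
    # Binarise activations and weights, then use an ordinary matmul; at inference time
    # this could be replaced by bit operations for memory and compute savings.
    return BinarizeSTE.apply(x) @ BinarizeSTE.apply(real_weight).t()
```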
no code implementations • 5 Apr 2017 • Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu
We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of combining the advantages of end-to-end training and more traditional pipeline approaches.
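A hedged sketch of the general idea of attaching an auxiliary loss to an intermediate encoder layer; the layer choices, target sets, and loss weight below are arbitrary assumptions, not the paper's configuration:

```python
import torch.nn as nn

class EncoderWithAuxHead(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, low_targets=48, high_targets=32):
        super().__init__()
        self.lower = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.upper = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.aux_head = nn.Linear(hidden, low_targets)    # e.g. lower-level (phoneme) targets
        self.main_head = nn.Linear(hidden, high_targets)  # e.g. final (character) targets

    def forward(self, feats):
        low, _ = self.lower(feats)
        high, _ = self.upper(low)
        return self.main_head(high), self.aux_head(low)

# total_loss = main_loss + aux_weight * aux_loss   (aux_weight, e.g., 0.3)
```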
no code implementations • 21 Feb 2017 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.
1 code implementation • 13 Dec 2016 • Yin Xian, Yunchen Pu, Zhe Gan, Liang Lu, Andrew Thompson
Its output feature is related to Cohen's class of time-frequency distributions.
Sound
no code implementations • 18 Oct 2016 • Liang Lu, Steve Renals
Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.
1 code implementation • 26 Sep 2016 • Ben Krause, Liang Lu, Iain Murray, Steve Renals
We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.
Ranked #14 on Language Modelling on Hutter Prize
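A rough sketch of one mLSTM step as described above: an input-dependent multiplicative intermediate state replaces the previous hidden state in the gate computations. Weight naming is illustrative and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h, c, W):
    # Multiplicative intermediate state: element-wise product of input and hidden projections.
    m = (W['mx'] @ x) * (W['mh'] @ h)
    i = sigmoid(W['ix'] @ x + W['im'] @ m)     # input gate
    f = sigmoid(W['fx'] @ x + W['fm'] @ m)     # forget gate
    o = sigmoid(W['ox'] @ x + W['om'] @ m)     # output gate
    g = np.tanh(W['cx'] @ x + W['cm'] @ m)     # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```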
no code implementations • 2 Aug 2016 • Liang Lu, Michelle Guo, Steve Renals
We have shown that HDNN-based acoustic models can achieve comparable recognition accuracy with a much smaller number of model parameters than plain deep neural network (DNN) acoustic models.
no code implementations • 7 Jul 2016 • Liang Lu
Highway deep neural network (HDNN) is a type of depth-gated feedforward neural network, which has been shown to be easier to train with more hidden layers and to generalise better than conventional plain deep neural networks (DNNs).
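A minimal sketch of the highway (depth-gated) feed-forward layer that HDNNs are built from: a transform gate mixes the transformed and untransformed input. Dimensions and the choice of nonlinearity are illustrative:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.transform(x))       # candidate transformation H(x)
        t = torch.sigmoid(self.gate(x))         # transform gate T(x)
        return t * h + (1.0 - t) * x            # carry path reuses the input directly
```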
no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals
This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.
Ranked #16 on Speech Recognition on TIMIT
no code implementations • 14 Dec 2015 • Liang Lu, Steve Renals
For speech recognition, deep neural networks (DNNs) have significantly improved recognition accuracy on most benchmark datasets and application domains.
1 code implementation • NAACL 2016 • Xingxing Zhang, Liang Lu, Mirella Lapata
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks.
no code implementations • 4 Nov 2014 • Liang Lu, Steve Renals
Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.