no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.
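For readers unfamiliar with endpointing, here is a minimal sketch of a common baseline rule. This is not the method of the paper above. It assumes a voice activity detector has already produced a per-frame speech probability, and it declares the endpoint once a fixed run of non-speech frames follows detected speech. The function name `detect_endpoint` and its parameters `threshold` and `min_trailing_silence` are hypothetical.

```python
from typing import Optional, Sequence


def detect_endpoint(
    speech_probs: Sequence[float],
    threshold: float = 0.5,
    min_trailing_silence: int = 30,
) -> Optional[int]:
    """Baseline trailing-silence endpointer (hypothetical helper).

    speech_probs: per-frame speech probabilities from an assumed VAD.
    threshold: frames with probability >= threshold count as speech.
    min_trailing_silence: number of consecutive non-speech frames after
        speech that triggers the endpoint (e.g. 30 frames of 10 ms = 300 ms).

    Returns the index of the frame at which the endpoint is declared,
    or None if no endpoint is reached.
    """
    speech_started = False
    silence_run = 0
    for i, p in enumerate(speech_probs):
        if p >= threshold:
            # Speech resets the trailing-silence counter.
            speech_started = True
            silence_run = 0
        elif speech_started:
            # Only silence after speech counts; leading silence is ignored.
            silence_run += 1
            if silence_run >= min_trailing_silence:
                return i
    return None


# Usage: 5 leading silent frames, 10 speech frames, then 40 silent frames.
probs = [0.1] * 5 + [0.9] * 10 + [0.1] * 40
endpoint = detect_endpoint(probs, threshold=0.5, min_trailing_silence=30)
```

A fixed timeout like `min_trailing_silence` is the kind of endpointing configuration that trades latency (ending too late) against truncation (cutting off a user who pauses mid-command).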
no code implementations • 23 Mar 2023 • Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh
In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.
no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).
Automatic Speech Recognition (ASR)
1 code implementation • ICASSP 2022 • Viet Anh Trinh, Hassan Salami Kavaki, Michael I Mandel
We introduce ImportantAug, a technique to augment training data for speech classification and recognition models by adding noise to unimportant regions of the speech and not to important regions.
Ranked #1 on Keyword Spotting on Google Speech Commands (Google Speech Command-Musan metric)
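The core idea of ImportantAug, adding noise only where the speech is unimportant, can be illustrated with a small sketch. This assumes a time-frequency importance mask is already available (1 = important, 0 = unimportant). How the paper actually obtains that mask is not shown here, and the function name `important_aug` and its `noise_scale` parameter are hypothetical.

```python
import numpy as np


def important_aug(spec, importance, noise, noise_scale=1.0):
    """Add noise only to time-frequency bins marked as unimportant.

    spec:       (freq, time) magnitude spectrogram of the clean utterance.
    importance: (freq, time) mask in [0, 1]; 1 = important, 0 = unimportant.
                Assumed to be given (hypothetical input for this sketch).
    noise:      (freq, time) noise spectrogram with the same shape.
    """
    spec = np.asarray(spec, dtype=float)
    importance = np.clip(np.asarray(importance, dtype=float), 0.0, 1.0)
    noise = np.asarray(noise, dtype=float)
    if not (spec.shape == importance.shape == noise.shape):
        raise ValueError("spec, importance and noise must share a shape")
    # Weight the noise by (1 - importance): important bins stay clean,
    # unimportant bins receive the full (scaled) noise.
    return spec + noise_scale * (1.0 - importance) * noise


# Usage: a toy 2x3 spectrogram with a binary importance mask.
spec = np.ones((2, 3))
importance = np.array([[1, 0, 1], [0, 1, 0]])
noise = np.full((2, 3), 0.5)
augmented = important_aug(spec, importance, noise)
```

Using a soft mask rather than a hard binary one lets partially important bins receive proportionally less noise.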
no code implementations • 16 Nov 2021 • Viet Anh Trinh, Sebastian Braun
Our results show that the proposed function effectively improves the speech enhancement performance compared to a baseline trained in a supervised way on the noisy VoxCeleb dataset.
no code implementations • 2 Dec 2020 • Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
By using LSTMs to enhance spatial clustering based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance and generality of multi-channel spatial clustering.
no code implementations • 2 Dec 2020 • Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, Michael I. Mandel
Spatial clustering techniques can achieve significant multi-channel noise reduction across relatively arbitrary microphone configurations, but have difficulty incorporating a detailed speech/noise model.
no code implementations • 2 Dec 2020 • Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
The system is compared to several baselines on the CHiME3 dataset in terms of speech quality predicted by the PESQ algorithm and the word error rate of a recognizer trained on mismatched conditions, in order to focus on generalization.
no code implementations • 21 May 2020 • Viet Anh Trinh, Michael I Mandel
In this paper, we propose a metric that we call the structured saliency benchmark (SSBM) to evaluate importance maps computed for automatic speech recognizers on individual utterances.
Automatic Speech Recognition (ASR)