1 code implementation • 25 May 2023 • Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
no code implementations • 28 Oct 2022 • Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance degrades when training and test utterance lengths are mismatched.
no code implementations • LREC 2022 • Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus.
1 code implementation • LREC 2022 • Rustem Yeshpanov, Yerbolat Khassanov, Huseyin Atakan Varol
We present the development of a dataset for Kazakh named entity recognition.
1 code implementation • 23 Oct 2021 • Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, Huseyin Atakan Varol
In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities.
1 code implementation • 3 Aug 2021 • Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
Our best monolingual and multilingual models achieved 20.9% and 20.5% average word error rates on the combined test set, respectively.
Automatic Speech Recognition (ASR)
1 code implementation • 30 Jul 2021 • Muhammadjon Musaev, Saida Mussakhojayeva, Ilyos Khujayorov, Yerbolat Khassanov, Mannon Ochilov, Huseyin Atakan Varol
We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures.
Automatic Speech Recognition (ASR)
1 code implementation • 17 Apr 2021 • Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat Khassanov, Huseyin Atakan Varol
This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.
1 code implementation • 5 Dec 2020 • Madina Abdrakhmanova, Askat Kuzdeuov, Sheikh Jarju, Yerbolat Khassanov, Michael Lewis, Huseyin Atakan Varol
We present SpeakingFaces, a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that combine thermal, visual, and audio data streams; examples include human-computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition.
1 code implementation • EACL 2021 • Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol
We present an open-source speech corpus for the Kazakh language.
no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma
To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.
no code implementations • 18 May 2020 • Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Hao Huang, Eng Siong Chng
In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance.
no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li
To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma
The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Apr 2019 • Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng
However, learning the representation of rare words is a challenging problem causing the NLM to produce unreliable probability estimates.
1 code implementation • 1 Nov 2018 • Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li
Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.
no code implementations • 27 Jun 2018 • Yerbolat Khassanov, Eng Siong Chng
Additionally, we propose to generate the list of OOS words for vocabulary expansion in an unsupervised manner by automatically extracting them from ASR output.
Automatic Speech Recognition (ASR)