Search Results for author: Yerbolat Khassanov

Found 17 papers, 10 papers with code

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

1 code implementation • 25 May 2023 • Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov

This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.

Speech Synthesis Text-To-Speech Synthesis +2

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

no code implementations • 28 Oct 2022 • Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma

One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when training and test utterance lengths are mismatched.

Action Detection Activity Detection +4

KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics

no code implementations • LREC 2022 • Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus.

Topic coverage

KazNERD: Kazakh Named Entity Recognition Dataset

1 code implementation • LREC 2022 • Rustem Yeshpanov, Yerbolat Khassanov, Huseyin Atakan Varol

We present the development of a dataset for Kazakh named entity recognition.

named-entity-recognition Named Entity Recognition +1

A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data

1 code implementation • 23 Oct 2021 • Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, Huseyin Atakan Varol

In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities.

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

1 code implementation • 3 Aug 2021 • Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

Our best monolingual and multilingual models achieved 20.9% and 20.5% average word error rates on the combined test set, respectively.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

1 code implementation • 30 Jul 2021 • Muhammadjon Musaev, Saida Mussakhojayeva, Ilyos Khujayorov, Yerbolat Khassanov, Mannon Ochilov, Huseyin Atakan Varol

We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

1 code implementation • 17 Apr 2021 • Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat Khassanov, Huseyin Atakan Varol

This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.

Speech Synthesis Text-To-Speech Synthesis

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

1 code implementation • 5 Dec 2020 • Madina Abdrakhmanova, Askat Kuzdeuov, Sheikh Jarju, Yerbolat Khassanov, Michael Lewis, Huseyin Atakan Varol

We present SpeakingFaces as a publicly available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human-computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition.

speech-recognition Speech Recognition +1

A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

1 code implementation • EACL 2021 • Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol

We present an open-source speech corpus for the Kazakh language.

speech-recognition Speech Recognition

Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

no code implementations • 21 May 2020 • Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Eng Siong Chng, Chongjia Ni, Bin Ma

To this end, we extend our prior work [1], and propose a hybrid Transformer-LSTM based architecture.

Cross-Lingual Transfer Decoder +2

Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems

no code implementations • 18 May 2020 • Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Hao Huang, Eng Siong Chng

In this paper, we present a series of complementary approaches to improve the recognition of underrepresented named entities (NE) in hybrid ASR systems without compromising overall word error rate performance.

Language Modelling

Independent language modeling architecture for end-to-end ASR

no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

no code implementations • 8 Apr 2019 • Yerbolat Khassanov, Hai-Hua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

1 code implementation • 8 Apr 2019 • Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng

However, learning the representation of rare words is a challenging problem causing the NLM to produce unreliable probability estimates.

speech-recognition Speech Recognition

On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition

1 code implementation • 1 Nov 2018 • Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li

Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.

Data Augmentation Language Identification +3

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

no code implementations • 27 Jun 2018 • Yerbolat Khassanov, Eng Siong Chng

Additionally, we propose to generate the list of OOS words to expand the vocabulary in an unsupervised manner by automatically extracting them from ASR output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

