no code implementations • 28 Sep 2023 • Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu
Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.
no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie
In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.
no code implementations • 8 Aug 2023 • Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao
Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in real-world conditions.
1 code implementation • 15 Mar 2023 • Yuguang Yang, Yu Pan, JingJing Yin, Jiangyu Han, Lei Ma, Heng Lu
SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
1 code implementation • 5 Dec 2022 • Yuguang Yang, Yu Pan, JingJing Yin, Heng Lu
This paper proposes a Learnable Multiplicative absolute position Embedding based Conformer (LMEC).
no code implementations • 10 Feb 2022 • Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee
We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.