no code implementations • 3 May 2024 • Yu Pan, Yuguang Yang, Heng Lu, Lei Ma, Jianjun Zhao
The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER).
no code implementations • 28 Sep 2023 • Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu
Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.
no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie
In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.
no code implementations • 8 Aug 2023 • Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, JingJing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao
Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in the wild.
no code implementations • 13 Jun 2023 • Yu Pan, Yanni Hu, Yuguang Yang, Wen Fei, Jixun Yao, Heng Lu, Lei Ma, Jianjun Zhao
Contrastive cross-modality pretraining has recently exhibited impressive success in diverse fields, whereas there is limited research on its merits in speech emotion recognition (SER).
1 code implementation • 11 Jun 2023 • Yuguang Yang, Yiming Wang, Shupeng Geng, Runqi Wang, Yimi Wang, Sheng Wu, Baochang Zhang
The emergence of cross-modal foundation models has introduced numerous approaches grounded in text-image retrieval.
1 code implementation • journal 2023 • Yuguang Yang, Shupeng Geng, Baochang Zhang, Juan Zhang, Zheng Wang, Yong Zhang, David Doermann
However, a long-term prediction horizon exposes the non-stationarity of series data, which degrades the performance of existing approaches.
no code implementations • 27 May 2023 • Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Juan Zhang, Xuan Gong, Baochang Zhang
Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions.
1 code implementation • 15 Mar 2023 • Yuguang Yang, Yu Pan, JingJing Yin, Jiangyu Han, Lei Ma, Heng Lu
SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR).
1 code implementation • 5 Dec 2022 • Yuguang Yang, Yu Pan, JingJing Yin, Heng Lu
This paper proposes a Learnable Multiplicative absolute position Embedding based Conformer (LMEC).
1 code implementation • 23 Feb 2022 • Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke
This is observed especially with underrepresented demographic groups sharing similar voice characteristics.
no code implementations • 7 Feb 2022 • Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow
Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.
no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.
no code implementations • 18 Sep 2021 • Yuguang Yang, Yu Pan, Xin Dong, Minqiang Xu
Second, we design a novel model inference scheme based on RepVGG which can efficiently improve the QbE search quality.
no code implementations • 6 Sep 2021 • Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke
Second, a scoring function is applied between a runtime utterance and each speaker profile.
no code implementations • 25 Nov 2019 • Yuguang Yang
SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacked along an axis marking the stage-wise progress of the whole task.
no code implementations • 26 Jun 2019 • Yuguang Yang, Michael A. Bevan, Bo Li
Equipping active colloidal robots with intelligence such that they can efficiently navigate in unknown complex environments could dramatically impact their use in emerging applications like precision surgery and targeted drug delivery.