1 code implementation • 8 May 2024 • Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, JianHua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods.
no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.
no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.
no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu
Self-supervised speech models are a rapidly developing research topic in fake audio detection.
no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu
During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.
no code implementations • 10 Jan 2023 • Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao
Text-to-speech (TTS) and voice conversion (VC) are two different tasks both aiming at generating high quality speaking voice according to different input modality.
no code implementations • 20 Dec 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen, Chu Yuan Zhang
To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech.
1 code implementation • 11 Nov 2022 • Jiangyan Yi, Chenglong Wang, JianHua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
no code implementations • 6 Oct 2022 • Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, JianHua Tao
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research.
no code implementations • 21 Aug 2022 • Xinrui Yan, Jiangyan Yi, Chenglong Wang, JianHua Tao, Junzuo Zhou, Hao Gu, Ruibo Fu
The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation.
no code implementations • 20 Aug 2022 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu
Many effective attempts have been made for fake audio detection.
no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu
The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure.
no code implementations • 5 Mar 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen
We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of speech.
1 code implementation • 21 Feb 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen
It can solve unnatural prosody in the edited region and synthesize the speech corresponding to the unseen words in the transcript.
no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu
Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021.
no code implementations • 16 Feb 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen
Firstly, we propose a global duration control attention mechanism for the SVS model.
1 code implementation • 8 Apr 2021 • Jiangyan Yi, Ye Bai, JianHua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu
Therefore, this paper develops such a dataset for half-truth audio detection (HAD).