no code implementations • 13 Jul 2023 • Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova
We show that this technique significantly closes the gap to methods that require explicit recordings.
no code implementations • 7 Dec 2022 • Daxin Tan, Nikos Kargas, David McHardy, Constantinos Papayiannis, Antonio Bonafonte, Marek Strelec, Jonas Rohnke, Agis Oikonomou Filandras, Trevor Wood
Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations.
no code implementations • 13 Feb 2022 • Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova
This paper presents a novel data augmentation technique for text-to-speech (TTS) that generates new (text, audio) training examples without requiring any additional data.
no code implementations • 24 Oct 2021 • Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures.
1 code implementation • NAACL 2021 • Shubhi Tyagi, Antonio Bonafonte, Jaime Lorenzo-Trueba, Javier Latorre
Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard.
1 code implementation • 20 Aug 2019 • Alp Öktem, Mireia Farrús, Antonio Bonafonte
Dubbing is a type of audiovisual translation where dialogues are translated and enacted so that they give the impression that the media is in the target language.
2 code implementations • 3 Jun 2019 • David Álvarez, Santiago Pascual, Antonio Bonafonte
This way we feed the acoustic model with speaker acoustically dependent representations that enrich the waveform generation more than discrete embeddings unrelated to these factors.
1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.
Ranked #2 on Distant Speech Recognition on DIRHA English WSJ
no code implementations • 6 Apr 2019 • Santiago Pascual, Joan Serrà, Antonio Bonafonte
The speech enhancement task usually consists of removing additive noise or reverberation that partially masks spoken utterances, affecting their intelligibility.
no code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks.
3 code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez
Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.
3 code implementations • 18 Dec 2017 • Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.
20 code implementations • 28 Mar 2017 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.
no code implementations • LREC 2012 • Emília Garcia Casademont, Antonio Bonafonte, Asunción Moreno
It is a project in the META-NET Network of Excellence, a cluster of projects aiming at fostering the mission of META, which is the Multilingual Europe Technology Alliance, dedicated to building the technological foundations of a multilingual European information society.
no code implementations • LREC 2012 • Jordi Adell, Antonio Bonafonte, Antonio Cardenal, Marta R. Costa-jussà, José A. R. Fonollosa, Asunción Moreno, Eva Navas, Eduardo R. Banga
The paper presents the tool's functionality, the architecture, and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval.