1 code implementation • 7 May 2024 • Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 11 Oct 2023 • Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet
Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques.
no code implementations • 11 Sep 2023 • Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing.
no code implementations • 28 Aug 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data.
1 code implementation • 12 Jul 2023 • Titouan Parcollet, Rogier Van Dalen, Shucong Zhang, Sourav Bhattacharya
Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference as well as training and increasing memory consumption.
no code implementations • 29 Jun 2023 • Jarod Duret, Titouan Parcollet, Yannick Estève
We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units.
1 code implementation • 1 Jun 2023 • Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data.
no code implementations • 1 Jun 2023 • Salah Zaiem, Titouan Parcollet, Slim Essid
Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets.
2 code implementations • 29 May 2023 • Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlicek
In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data.
1 code implementation • 12 Mar 2023 • Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 16 Feb 2023 • Adel Moumen, Titouan Parcollet
The light gated recurrent unit (Li-GRU) is well known for achieving impressive results in automatic speech recognition (ASR) tasks while being lighter and faster to train than a standard gated recurrent unit (GRU).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 30 Sep 2022 • Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane
Self-supervised learning (SSL) has proven vital in speech and audio-related applications.
no code implementations • ICLR 2022 • Xinchi Qiu, Javier Fernandez-Marques, Pedro PB Gusmao, Yan Gao, Titouan Parcollet, Nicholas Donald Lane
When the available hardware cannot meet the memory and compute requirements to efficiently train high performing machine learning models, a compromise in either the training quality or the model complexity is needed.
1 code implementation • 8 Apr 2022 • Salah Zaiem, Titouan Parcollet, Slim Essid
Thus, this work introduces a conditional independence-based method which allows for automatically selecting a suitable distribution on the choice of augmentations and their parametrization from a set of predefined ones, for contrastive self-supervised pre-training.
no code implementations • 6 Apr 2022 • Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Abhinav Mehrotra, Nicholas D. Lane
The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge.
no code implementations • 2 Apr 2022 • Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Our approach is based on the use of an external model trained to generate a sequence of vectorial representations from text.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
1 code implementation • 1 Jul 2021 • Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba
Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
4 code implementations • 8 Jun 2021 • Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato de Mori, Yoshua Bengio
SpeechBrain is an open-source and all-in-one speech toolkit.
1 code implementation • 29 Apr 2021 • Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane
Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently.
1 code implementation • 23 Apr 2021 • Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +6
1 code implementation • 15 Apr 2021 • Salah Zaiem, Titouan Parcollet, Slim Essid
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
no code implementations • 7 Apr 2021 • Akhil Mathur, Daniel J. Beutel, Pedro Porto Buarque de Gusmão, Javier Fernandez-Marques, Taner Topal, Xinchi Qiu, Titouan Parcollet, Yan Gao, Nicholas D. Lane
Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud.
2 code implementations • 4 Apr 2021 • Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet
This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers.
Ranked #4 on Spoken Language Understanding on Timers and Such (using extra training data)
no code implementations • 15 Feb 2021 • Xinchi Qiu, Titouan Parcollet, Javier Fernandez-Marques, Pedro Porto Buarque de Gusmao, Yan Gao, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane
Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers.
1 code implementation • 8 Dec 2020 • Paul-Gauthier Noé, Mohammad Mohammadamini, Driss Matrouf, Titouan Parcollet, Andreas Nautsch, Jean-François Bonastre
In order to allow the user to choose which information to protect, we introduce in this paper the concept of attribute-driven privacy preservation in speaker voice representation.
no code implementations • 13 Oct 2020 • Xinchi Qiu, Titouan Parcollet, Daniel J. Beutel, Taner Topal, Akhil Mathur, Nicholas D. Lane
Then, we compare the carbon footprint of FL to traditional centralized learning.
1 code implementation • 28 Jul 2020 • Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud.
3 code implementations • 19 May 2020 • Yan Gao, Titouan Parcollet, Nicholas Lane
In the specific context of Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in increasing recognition performance.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 18 May 2020 • Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas Lane, Mohamed Morchid
In this paper, we propose to capture these inter- and intra- structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 17 Jun 2019 • Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato de Mori
Deep neural networks (DNNs), and more precisely recurrent neural networks (RNNs), are at the core of modern automatic speech recognition systems, owing to their efficiency at processing input sequences.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 13 Apr 2019 • Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès
TRS transcripts are only used to measure the performance of ASR systems.
Generative Adversarial Network Spoken Language Understanding
no code implementations • 21 Nov 2018 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato de Mori
Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
11 code implementations • 19 Nov 2018 • Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
Ranked #1 on Distant Speech Recognition on DIRHA English WSJ
2 code implementations • 6 Nov 2018 • Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato de Mori
Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 20 Jun 2018 • Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio
Quaternion numbers and quaternion neural networks have shown their efficiency in processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learnable parameters than real-valued models.
Ranked #19 on Speech Recognition on TIMIT
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
3 code implementations • ICLR 2019 • Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato de Mori, Yoshua Bengio
Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1