no code implementations • 11 Mar 2024 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data.
no code implementations • 2 Feb 2024 • Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.
no code implementations • 29 Nov 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
End-to-end (E2E) ASR models offer both convenience and the ability to perform such joint transcription of speech.
no code implementations • 16 Oct 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.
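As an illustration of the cross-channel fusion idea, here is a minimal PyTorch-style sketch in which a reference channel attends to the features of the other microphone channels. The class name, tensor layout, and dimensions are assumptions for illustration, not the authors' implementation, and the multi-frame aspect (also attending to neighbouring frames) is omitted for brevity.

```python
# Hypothetical sketch: the reference channel's frame-level features attend to the
# corresponding features of the other channels. Names and shapes are illustrative.
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, feats):
        # feats: (batch, channels, time, d_model); channel 0 is the reference channel
        b, c, t, d = feats.shape
        query = feats[:, 0].reshape(b * t, 1, d)                # one query per frame
        keys = feats[:, 1:].permute(0, 2, 1, 3).reshape(b * t, c - 1, d)
        fused, _ = self.attn(query, keys, keys)                 # attend across channels
        return fused.reshape(b, t, d)                           # fused per-frame features

# Example: 2 utterances, 4 channels, 100 frames, 256-dim features -> (2, 100, 256)
out = CrossChannelAttention()(torch.randn(2, 4, 100, 256))
```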
no code implementations • 19 Sep 2023 • Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel
Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods.
no code implementations • 19 Sep 2023 • Mostafa Sadeghi, Romain Serizel
Nevertheless, the iterative variational expectation-maximization (VEM) process involved at test time, which relies on variational inference, results in high computational complexity.
2 code implementations • 19 Sep 2023 • Berné Nortier, Mostafa Sadeghi, Romain Serizel
To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel
A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.
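Concretely, the generative model described above can be written per short-time Fourier transform (STFT) bin as follows (the notation is assumed for illustration, not copied from the paper):

```latex
% VAE speech prior: zero-mean circular complex Gaussian per time-frequency bin,
% with the variance produced by the decoder network from the latent vector z_t.
s_{ft} \mid \mathbf{z}_t \sim \mathcal{N}_c\!\big(0,\; \sigma_f^2(\mathbf{z}_t)\big),
\qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
```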
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Romain Serizel
Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audiovisual speech enhancement (AVSE).
no code implementations • 2 Nov 2022 • Mostafa Sadeghi, Romain Serizel
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
no code implementations • 6 Apr 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model.
no code implementations • 29 Mar 2022 • Mostafa Sadeghi, Paul Magron
Structuring the latent space in probabilistic deep generative models, e.g. variational autoencoders (VAEs), is important to yield more expressive models and interpretable representations, and to avoid overfitting.
no code implementations • 1 Feb 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE).
no code implementations • 8 Feb 2021 • Mostafa Sadeghi, Xavier Alameda-Pineda
Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs): during training, only clean data is used to learn a generative model for speech, which at test time is combined with a noise model, e.g. nonnegative matrix factorization (NMF), whose parameters are learned without supervision.
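A compact way to write this test-time combination (notation assumed for illustration): the observed mixture is the sum of the VAE-modeled speech and a noise term whose variance follows an NMF factorization,

```latex
% Mixture model at test time: speech s_{ft} follows the pretrained VAE prior, while
% the noise variance is parameterized by nonnegative matrices W and H (NMF), whose
% entries are estimated from the noisy observation alone.
x_{ft} = s_{ft} + b_{ft}, \qquad
b_{ft} \sim \mathcal{N}_c\!\big(0,\; (\mathbf{W}\mathbf{H})_{ft}\big),
\quad \mathbf{W}, \mathbf{H} \ge 0 .
```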
no code implementations • 26 Oct 2020 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud
We propose to model inliers and outliers with the generalized Student's t-distribution, a heavy-tailed distribution that is immune to non-Gaussian errors in the data.
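For reference, one standard way to express this heavy-tailed behaviour (a sketch, not necessarily the exact parameterization used in the paper) is as a Gaussian scale mixture: each residual carries its own latent precision weight, so large errors are down-weighted rather than dominating the fit,

```latex
% Student's t as a Gaussian scale mixture: the latent weight w_i shrinks the influence
% of observations with large residuals (outliers), yielding heavy tails.
\mathbf{e}_i \mid w_i \sim \mathcal{N}\!\big(\mathbf{0},\; \boldsymbol{\Sigma}/w_i\big),
\qquad w_i \sim \mathrm{Gamma}\!\big(\tfrac{\nu}{2}, \tfrac{\nu}{2}\big).
```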
no code implementations • 17 Aug 2020 • Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda
To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.
no code implementations • 14 Apr 2020 • Mostafa Sadeghi, Xavier Alameda-Pineda, Radu Horaud
The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and automatically annotated 3DFA datasets, as well as to detect and eliminate errors.
no code implementations • 23 Dec 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
Two encoder networks take audio and visual data as input, respectively, and the posterior of the latent variables is modeled as a mixture of two Gaussian distributions, one output by each encoder network.
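In symbols, the inference model described above can be sketched as follows (the notation is assumed; the mixture weight may be fixed or learned):

```latex
% Approximate posterior: a two-component Gaussian mixture, one component produced by
% the audio encoder and one by the visual encoder.
q(\mathbf{z}_t \mid \mathbf{a}_t, \mathbf{v}_t)
  = \alpha\,\mathcal{N}\!\big(\boldsymbol{\mu}_a(\mathbf{a}_t),\, \operatorname{diag}\boldsymbol{\sigma}_a^2(\mathbf{a}_t)\big)
  + (1-\alpha)\,\mathcal{N}\!\big(\boldsymbol{\mu}_v(\mathbf{v}_t),\, \operatorname{diag}\boldsymbol{\sigma}_v^2(\mathbf{v}_t)\big).
```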
no code implementations • 10 Nov 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
When visual data is clean, speech enhancement with an audio-visual VAE performs better than with an audio-only VAE trained on audio-only data.
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
no code implementations • 17 May 2019 • Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Shumpei Kikuta, Dong Liu, Partha P. Mitra, Mikael Skoglund
We design a self size-estimating feed-forward network (SSFN) using a joint optimization approach that estimates the number of layers and the number of nodes while learning the weight matrices.
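As a rough illustration of the size-estimation idea (a simplified sketch under assumed details, not the actual SSFN training procedure), one can grow the network layer by layer and stop when held-out performance no longer improves:

```python
# Simplified sketch of layer-by-layer growth with a validation-based stopping rule.
# The real SSFN learns each layer with a dedicated optimization and structured random
# matrices; here a plain random projection + least-squares readout is used purely
# for illustration.
import numpy as np

def grow_network(x_tr, y_tr, x_va, y_va, width=100, max_layers=10, seed=0):
    rng = np.random.default_rng(seed)
    layers, best_err = [], np.inf
    h_tr, h_va = x_tr, x_va
    for _ in range(max_layers):
        w = rng.standard_normal((h_tr.shape[1], width)) / np.sqrt(h_tr.shape[1])
        h_tr_new, h_va_new = np.maximum(h_tr @ w, 0), np.maximum(h_va @ w, 0)  # ReLU layer
        readout, *_ = np.linalg.lstsq(h_tr_new, y_tr, rcond=None)              # output weights
        err = np.mean((h_va_new @ readout - y_va) ** 2)
        if err >= best_err:            # stop growing when validation error stops improving
            break
        best_err, h_tr, h_va = err, h_tr_new, h_va_new
        layers.append((w, readout))
    return layers                      # estimated depth = len(layers)
```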
1 code implementation • 23 Oct 2017 • Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Partha P. Mitra, Mikael Skoglund
The developed network is expected to show good generalization power due to appropriate regularization and use of random weights in the layers.
no code implementations • 6 Jan 2015 • Hossein Bakhshi Golestani, Mohsen Joneidi, Mostafa Sadeghi
In the present paper, we suggest a method based on global clustering of the blocks that constitute the image.
no code implementations • 29 Jul 2013 • Mohsen Joneidi, Parvin Ahmadi, Mostafa Sadeghi, Nazanin Rahnavard
The problem of signal detection using a flexible and general model is considered.
no code implementations • 12 Jun 2013 • Mohsen Joneidi, Mostafa Sadeghi
In this paper, the problem of de-noising of an image contaminated with additive white Gaussian noise (AWGN) is studied.
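For clarity, the observation model in question is the standard additive one:

```latex
% Additive white Gaussian noise: the observed image y is the clean image x plus
% i.i.d. zero-mean Gaussian noise of variance \sigma^2; denoising estimates x from y.
\mathbf{y} = \mathbf{x} + \mathbf{n}, \qquad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}).
```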