no code implementations • 11 Mar 2024 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data.
no code implementations • 2 Feb 2024 • Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker
In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.
no code implementations • 29 Nov 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
End-to-end (E2E) ASR models offer both convenience and the ability to perform such joint transcription of speech.
no code implementations • 16 Oct 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.
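As an illustration of the cross-channel fusion idea, here is a minimal PyTorch-style sketch in which a reference channel attends to the features of the other microphone channels. The class name, tensor layout, and dimensions are assumptions for illustration, not the authors' implementation, and the multi-frame aspect (also attending to neighbouring frames) is omitted for brevity.

```python
# Hypothetical sketch: the reference channel's frame-level features attend to the
# corresponding features of the other channels. Names and shapes are illustrative.
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, feats):
        # feats: (batch, channels, time, d_model); channel 0 is the reference channel
        b, c, t, d = feats.shape
        query = feats[:, 0].reshape(b * t, 1, d)                # one query per frame
        keys = feats[:, 1:].permute(0, 2, 1, 3).reshape(b * t, c - 1, d)
        fused, _ = self.attn(query, keys, keys)                 # attend across channels
        return fused.reshape(b, t, d)                           # fused per-frame features

# Example: 2 utterances, 4 channels, 100 frames, 256-dim features -> (2, 100, 256)
out = CrossChannelAttention()(torch.randn(2, 4, 100, 256))
```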
no code implementations • 19 Sep 2023 • Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel
Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods.
no code implementations • 19 Sep 2023 • Mostafa Sadeghi, Romain Serizel
Nevertheless, the iterative variational expectation-maximization (VEM) process involved at test time, which relies on variational inference, results in high computational complexity.
2 code implementations • 19 Sep 2023 • Berné Nortier, Mostafa Sadeghi, Romain Serizel
To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel
A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.
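Concretely, the generative model described above can be written per short-time Fourier transform (STFT) bin as follows (the notation is assumed for illustration, not copied from the paper):

```latex
% VAE speech prior: zero-mean circular complex Gaussian per time-frequency bin,
% with the variance produced by the decoder network from the latent vector z_t.
s_{ft} \mid \mathbf{z}_t \sim \mathcal{N}_c\!\big(0,\; \sigma_f^2(\mathbf{z}_t)\big),
\qquad \mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
```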
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Romain Serizel
Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audiovisual speech enhancement (AVSE).
no code implementations • 2 Nov 2022 • Mostafa Sadeghi, Romain Serizel
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
no code implementations • 6 Apr 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model.
no code implementations • 29 Mar 2022 • Mostafa Sadeghi, Paul Magron
Structuring the latent space in probabilistic deep generative models, e.g. variational autoencoders (VAEs), is important to yield more expressive models and interpretable representations, and to avoid overfitting.
no code implementations • 1 Feb 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE).
no code implementations • 8 Feb 2021 • Mostafa Sadeghi, Xavier Alameda-Pineda
Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs): during training, only clean data is used to learn a generative model for speech, which at test time is combined with a noise model, e.g. nonnegative matrix factorization (NMF), whose parameters are learned without supervision.
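A compact way to write this test-time combination (notation assumed for illustration): the observed mixture is the sum of the VAE-modeled speech and a noise term whose variance follows an NMF factorization,

```latex
% Mixture model at test time: speech s_{ft} follows the pretrained VAE prior, while
% the noise variance is parameterized by nonnegative matrices W and H (NMF), whose
% entries are estimated from the noisy observation alone.
x_{ft} = s_{ft} + b_{ft}, \qquad
b_{ft} \sim \mathcal{N}_c\!\big(0,\; (\mathbf{W}\mathbf{H})_{ft}\big),
\quad \mathbf{W}, \mathbf{H} \ge 0 .
```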
no code implementations • 26 Oct 2020 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud
We propose to model inliers and outliers with the generalized Student's t-distribution, a heavy-tailed distribution that is immune to non-Gaussian errors in the data.
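For reference, one standard way to express this heavy-tailed behaviour (a sketch, not necessarily the exact parameterization used in the paper) is as a Gaussian scale mixture: each residual carries its own latent precision weight, so large errors are down-weighted rather than dominating the fit,

```latex
% Student's t as a Gaussian scale mixture: the latent weight w_i shrinks the influence
% of observations with large residuals (outliers), yielding heavy tails.
\mathbf{e}_i \mid w_i \sim \mathcal{N}\!\big(\mathbf{0},\; \boldsymbol{\Sigma}/w_i\big),
\qquad w_i \sim \mathrm{Gamma}\!\big(\tfrac{\nu}{2}, \tfrac{\nu}{2}\big).
```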
no code implementations • 17 Aug 2020 • Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda
To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.
no code implementations • 14 Apr 2020 • Mostafa Sadeghi, Xavier Alameda-Pineda, Radu Horaud
The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and automatically annotated 3DFA datasets, as well as to detect and eliminate errors.
no code implementations • 23 Dec 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
Two encoder networks take audio and visual data as input, respectively, and the posterior of the latent variables is modeled as a mixture of two Gaussian distributions, one output by each encoder network.
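In symbols, the inference model described above can be sketched as follows (the notation is assumed; the mixture weight may be fixed or learned):

```latex
% Approximate posterior: a two-component Gaussian mixture, one component produced by
% the audio encoder and one by the visual encoder.
q(\mathbf{z}_t \mid \mathbf{a}_t, \mathbf{v}_t)
  = \alpha\,\mathcal{N}\!\big(\boldsymbol{\mu}_a(\mathbf{a}_t),\, \operatorname{diag}\boldsymbol{\sigma}_a^2(\mathbf{a}_t)\big)
  + (1-\alpha)\,\mathcal{N}\!\big(\boldsymbol{\mu}_v(\mathbf{v}_t),\, \operatorname{diag}\boldsymbol{\sigma}_v^2(\mathbf{v}_t)\big).
```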
no code implementations • 10 Nov 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
When visual data is clean, speech enhancement with an audio-visual VAE performs better than with an audio-only VAE trained on audio-only data.
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
no code implementations • 17 May 2019 • Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Shumpei Kikuta, Dong Liu, Partha P. Mitra, Mikael Skoglund
We design a self size-estimating feed-forward network (SSFN) using a joint optimization approach that estimates the number of layers and the number of nodes while learning the weight matrices.
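As a rough illustration of the size-estimation idea (a simplified sketch under assumed details, not the actual SSFN training procedure), one can grow the network layer by layer and stop when held-out performance no longer improves:

```python
# Simplified sketch of layer-by-layer growth with a validation-based stopping rule.
# The real SSFN learns each layer with a dedicated optimization and structured random
# matrices; here a plain random projection + least-squares readout is used purely
# for illustration.
import numpy as np

def grow_network(x_tr, y_tr, x_va, y_va, width=100, max_layers=10, seed=0):
    rng = np.random.default_rng(seed)
    layers, best_err = [], np.inf
    h_tr, h_va = x_tr, x_va
    for _ in range(max_layers):
        w = rng.standard_normal((h_tr.shape[1], width)) / np.sqrt(h_tr.shape[1])
        h_tr_new, h_va_new = np.maximum(h_tr @ w, 0), np.maximum(h_va @ w, 0)  # ReLU layer
        readout, *_ = np.linalg.lstsq(h_tr_new, y_tr, rcond=None)              # output weights
        err = np.mean((h_va_new @ readout - y_va) ** 2)
        if err >= best_err:            # stop growing when validation error stops improving
            break
        best_err, h_tr, h_va = err, h_tr_new, h_va_new
        layers.append((w, readout))
    return layers                      # estimated depth = len(layers)
```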
1 code implementation • 23 Oct 2017 • Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Partha P. Mitra, Mikael Skoglund
The developed network is expected to show good generalization power due to appropriate regularization and use of random weights in the layers.
no code implementations • 6 Jan 2015 • Hossein Bakhshi Golestani, Mohsen Joneidi, Mostafa Sadeghi
In the present paper, we suggest a method based on global clustering of the blocks that constitute the image.
no code implementations • 29 Jul 2013 • Mohsen Joneidi, Parvin Ahmadi, Mostafa Sadeghi, Nazanin Rahnavard
The problem of signal detection using a flexible and general model is considered.
no code implementations • 12 Jun 2013 • Mohsen Joneidi, Mostafa Sadeghi
In this paper, the problem of de-noising of an image contaminated with additive white Gaussian noise (AWGN) is studied.
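For clarity, the observation model in question is the standard additive one:

```latex
% Additive white Gaussian noise: the observed image y is the clean image x plus
% i.i.d. zero-mean Gaussian noise of variance \sigma^2; denoising estimates x from y.
\mathbf{y} = \mathbf{x} + \mathbf{n}, \qquad \mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}).
```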