Target Sound Extraction
5 papers with code • 3 benchmarks • 3 datasets
Target Sound Extraction is the task of extracting a sound corresponding to a given class from an audio mixture. The audio mixture may contain background noise with a relatively low amplitude compared to the foreground mixture components. The choice of the sound class is provided as input to the model in the form of a string, an integer, or a one-hot encoding of the sound class.
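The three query formats mentioned above can be converted to one another. The following is a minimal sketch in plain Python, assuming a small hypothetical label vocabulary (`CLASSES`) and hypothetical helper names (`class_to_index`, `index_to_one_hot`, `encode_query`). It is not taken from any of the papers listed here, and real datasets define their own class lists.

```python
# Hypothetical label vocabulary, for illustration only.
CLASSES = ["dog_bark", "siren", "speech", "door_knock"]


def class_to_index(name):
    """Map a class-name string to its integer index in CLASSES."""
    if name not in CLASSES:
        raise ValueError(f"unknown sound class: {name!r}")
    return CLASSES.index(name)


def index_to_one_hot(index, num_classes=len(CLASSES)):
    """Build a one-hot list with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError(f"class index out of range: {index}")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def encode_query(query):
    """Normalize a string, integer, or one-hot query to a one-hot list."""
    if isinstance(query, str):
        return index_to_one_hot(class_to_index(query))
    if isinstance(query, int):
        return index_to_one_hot(query)
    # Otherwise assume a sequence that is already one-hot and validate it.
    vec = [float(x) for x in query]
    expected = [0.0] * (len(CLASSES) - 1) + [1.0]
    if len(vec) != len(CLASSES) or sorted(vec) != expected:
        raise ValueError("query is not a valid one-hot vector")
    return vec


if __name__ == "__main__":
    # All three query forms select the same class.
    print(encode_query("siren"))
    print(encode_query(1))
    print(encode_query([0, 1, 0, 0]))
```

In this sketch, all three calls in the `__main__` block select the `siren` class. An extraction model would typically receive the resulting vector (or an embedding derived from it) alongside the audio mixture; how the conditioning is applied differs between models.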
Most implemented papers
Real-Time Target Sound Extraction
We present the first neural network model to achieve real-time and streaming target sound extraction.
Target Sound Extraction with Variable Cross-modality Clues
Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction
Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables
To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use.
CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction
Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings.