Target Sound Extraction

5 papers with code • 3 benchmarks • 3 datasets

Target Sound Extraction is the task of extracting the sound that corresponds to a given class from an audio mixture. The mixture may also contain background noise at a relatively low amplitude compared with the foreground components. The target sound class is provided to the model as a string, an integer, or a one-hot encoding.
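The setup described above can be illustrated with a small sketch. It is not taken from any of the listed papers: the class list, the function names `encode_target` and `make_mixture`, and the use of sine tones as stand-in sources are all assumptions made for illustration. It shows the three query forms (string, integer, one-hot) and one way to add background noise at a low level relative to the foreground.

```python
import math
import random

# Hypothetical label set; real datasets define their own classes.
CLASSES = ["dog_bark", "siren", "speech"]


def encode_target(label):
    """Return the query as a string, an integer index, and a one-hot vector."""
    index = CLASSES.index(label)
    one_hot = [1.0 if i == index else 0.0 for i in range(len(CLASSES))]
    return label, index, one_hot


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def make_mixture(foreground_sources, noise, snr_db):
    """Sum the foreground sources, then add noise scaled so that the
    foreground-to-noise ratio equals snr_db (assumes non-silent noise)."""
    n = len(noise)
    foreground = [sum(src[t] for src in foreground_sources) for t in range(n)]
    gain = rms(foreground) / (rms(noise) * 10 ** (snr_db / 20))
    return [foreground[t] + gain * noise[t] for t in range(n)]


# Stand-in signals: two tones as foreground sources, uniform noise as background.
rng = random.Random(0)
n = 1600
tone_a = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(n)]
tone_b = [0.5 * math.sin(2 * math.pi * 1000 * t / 16000) for t in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

mixture = make_mixture([tone_a, tone_b], noise, snr_db=20.0)
label, index, one_hot = encode_target("siren")
```

A model for this task would take `mixture` together with one of the three query forms and return an estimate of the source belonging to that class. Which form is used depends on the model; the one-hot vector is simply the integer index expanded to a fixed-length input.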

Benchmarks

These leaderboards are used to track progress in Target Sound Extraction.

Dataset	Best Model
FSDSoundScapes	Waveformer
AudioCaps	CLAPSep
AudioSet	CLAPSep

Datasets

Subtasks

Streaming Target Sound Extraction

Most implemented papers

Real-Time Target Sound Extraction

vb000/waveformer • 4 Nov 2022

We present the first neural network model to achieve real-time and streaming target sound extraction.

Target Sound Extraction with Variable Cross-modality Clues

lichenda/multi-clue-tse-data • 15 Mar 2023

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

JHU-LCAP/DPM-TSE • 6 Oct 2023

Common target sound extraction (TSE) approaches have primarily relied on discriminative models to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

vb000/SemanticHearing • 1 Nov 2023

To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use.

CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

aisaka0v0/clapsep • 27 Feb 2024

Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings.
