ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection

ICASSP 2022 · Taesoo Kim, Jiho Chang, Jong Hwan Ko

Voice Activity Detection (VAD) is becoming an essential front-end component in various speech processing systems. As those systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should robustly detect speech regions within noisy background signals. In this paper, we propose adversarial domain adaptive VAD (ADA-VAD), a deep neural network (DNN)-based VAD method that is highly robust to audio samples with various noise types and low SNRs. The proposed method trains DNN models for the VAD task in a supervised manner. Simultaneously, to mitigate the performance degradation caused by background noise, an adversarial domain adaptation method is adopted to reduce the domain discrepancy between noisy and clean audio streams in an unsupervised manner. The results show that ADA-VAD achieves an average of 3.6%p and 7%p higher AUC than models trained with manually extracted features on the AVA-Speech dataset and on a speech database synthesized with an unseen noise database, respectively.
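At a high level, the training scheme pairs a supervised frame-level VAD classifier (trained on labeled audio) with a domain discriminator that distinguishes clean from noisy inputs, so that the shared encoder is pushed toward domain-invariant features. The sketch below is not the authors' implementation; it assumes a PyTorch setup with log-mel spectrogram inputs and a gradient reversal layer, and all module names, shapes, and hyperparameters (e.g. ADAVADSketch, lamb) are illustrative.

```python
# Illustrative sketch of adversarial domain adaptation for VAD (not the authors' code).
# Assumes log-mel spectrogram inputs, a small CNN encoder, a frame-level VAD head,
# and a domain discriminator trained through a gradient reversal layer (GRL).
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        # Pass the gradient back with a flipped sign, scaled by lamb.
        return -ctx.lamb * grad_out, None

class ADAVADSketch(nn.Module):
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        # Shared encoder: (batch, 1, n_mels, frames) -> per-frame embeddings.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # pool over the mel axis, keep time
        )
        self.vad_head = nn.Sequential(          # speech / non-speech per frame
            nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.domain_head = nn.Sequential(       # clean vs. noisy domain
            nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x, lamb=1.0):
        h = self.encoder(x).squeeze(2).transpose(1, 2)    # (batch, frames, 32)
        vad_logits = self.vad_head(h).squeeze(-1)         # supervised VAD branch
        h_rev = GradientReversal.apply(h, lamb)           # reversed gradients feed the
        dom_logits = self.domain_head(h_rev).squeeze(-1)  # adversarial domain branch
        return vad_logits, dom_logits

def train_step(model, opt, clean_x, vad_labels, noisy_x, lamb=1.0):
    """One illustrative step: supervised VAD loss on labeled clean audio plus an
    unsupervised domain loss on unpaired clean/noisy batches."""
    bce = nn.BCEWithLogitsLoss()
    vad_logits, dom_clean = model(clean_x, lamb)
    _, dom_noisy = model(noisy_x, lamb)
    loss_vad = bce(vad_logits, vad_labels)
    loss_dom = bce(dom_clean, torch.zeros_like(dom_clean)) + \
               bce(dom_noisy, torch.ones_like(dom_noisy))
    loss = loss_vad + loss_dom
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The gradient reversal layer is one common way to realize unsupervised adversarial domain adaptation: the discriminator learns to separate clean from noisy features while the reversed gradient pushes the encoder to make them indistinguishable. The paper's exact architecture and loss weighting may differ.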


Datasets

AVA-Speech
Results from the Paper


Ranked #4 on Activity Detection on AVA-Speech (ROC-AUC metric)

Task               | Dataset    | Model   | Metric Name | Metric Value | Global Rank
Activity Detection | AVA-Speech | ADA-VAD | ROC-AUC     | 79.1         | #4

Methods


No methods listed for this paper.