ADA-VAD: Unpaired Adversarial Domain Adaptation for Noise-Robust Voice Activity Detection
Voice Activity Detection (VAD) is becoming an essential front-end component in various speech processing systems. As those systems are commonly deployed in environments with diverse noise types and low signal-to-noise ratios (SNRs), an effective VAD method should perform robust detection of speech region out of noisy background signals. In this paper, we propose adversarial domain adaptive VAD (ADA-VAD), which is a deep neural network (DNN) based VAD method highly robust to audio samples with various noise types and low SNRs. The proposed method trains DNN models for a VAD task in a supervised manner. Simultaneously, to mitigate the performance degradation due to back-ground noises, the adversarial domain adaptation method is adopted to match the domain discrepancy between noisy and clean audio stream in an unsupervised manner. The results show that ADA-VAD achieves an average of 3.6%p and 7%p higher AUC than models trained with manually extracted features on the AVA-speech dataset and a speech database synthesized with an unseen noise database, respectively.
PDF AbstractDatasets
Results from the Paper
Ranked #4 on
Activity Detection
on AVA-Speech
(ROC-AUC metric)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
---|---|---|---|---|---|---|
Activity Detection | AVA-Speech | ADA-VAD | ROC-AUC | 79.1 | # 4 |