Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

3 Feb 2021 · Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

The deep complex U-Net and the convolutional recurrent network (CRN) achieve state-of-the-art performance for monaural speech enhancement. Both are encoder-decoder structures with skip connections that rely heavily on the representation power of complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features. The CCBAM is a lightweight and general module that can be easily integrated into any complex-valued convolutional layer. We integrate CCBAM with the deep complex U-Net and CRN to improve their speech enhancement performance. We further propose a mixed loss function to jointly optimize the complex models in both the time-frequency (TF) domain and the time domain. By integrating CCBAM and the mixed loss, we form a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches.
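To make the idea of a joint time-frequency and time-domain objective concrete, here is a minimal NumPy sketch. It is not the authors' formulation, since the abstract does not give one. It assumes the time-domain term is negative scale-invariant SNR (SI-SNR) and the TF term is a mean squared error over the real and imaginary parts of the STFT. The function names, the weight `alpha`, and the STFT settings (`n_fft`, `hop`) are hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference waveform."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def complex_mse(est, ref):
    """MSE over real and imaginary STFT parts (assumed TF-domain term)."""
    diff = stft(est) - stft(ref)
    return np.mean(diff.real ** 2 + diff.imag ** 2)

def mixed_loss(est, ref, alpha=0.5):
    """Weighted sum of a time-domain term and a TF-domain term (alpha is hypothetical)."""
    return alpha * (-si_snr(est, ref)) + (1 - alpha) * complex_mse(est, ref)

# Usage: a noisy estimate should score worse than a perfect one.
t = np.arange(16000) / 16000.0
ref = np.sin(2 * np.pi * 440 * t)
noisy = ref + 0.1 * np.random.default_rng(0).standard_normal(ref.shape)
loss_noisy = mixed_loss(noisy, ref)
loss_clean = mixed_loss(ref, ref)
```

In a training setup the same two terms would be computed with a differentiable framework so that gradients flow through the inverse STFT back to the complex-valued layers; the NumPy version only shows how the terms combine.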


Results from the Paper


Task                Dataset                                 Model      Metric Name  Metric Value  Global Rank
Speech Enhancement  Deep Noise Suppression (DNS) Challenge  FRCRN      PESQ-WB      3.23          # 2
Speech Enhancement  DNS Challenge                           DCUnet-MC  PESQ-NB      3.3           # 1
Speech Enhancement  DNS Challenge                           DCCRN-MC   PESQ-NB      3.21          # 2
Speech Enhancement  DNS Challenge                           DCCRN-M    PESQ-NB      3.15          # 3
Speech Enhancement  DNS Challenge                           DCCRN      PESQ-NB      3.04          # 4
Speech Enhancement  VoiceBank + DEMAND                      D2Former   PESQ         3.43          # 2
                                                                       PESQ-WB      3.43          # 2
                                                                       Para. (M)    0.86          # 2
Speech Enhancement  WSJ0 + DEMAND + RNNoise                 DCUNet-MC  PESQ-NB      3.44          # 1
Speech Enhancement  WSJ0 + DEMAND + RNNoise                 DCUNet     PESQ-NB      3.25          # 3
Speech Enhancement  WSJ0 + DEMAND + RNNoise                 DCCRN-M    PESQ-NB      3.28          # 2

Methods