Paper tables with annotated results for MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

Paper

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early fusion and cross-modal self-attention between the text and acoustic modalities, and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and results analysis to demonstrate the effectiveness of our proposed approach.
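The abstract does not spell out the network, so the following is only a minimal sketch of the general idea of cross-modal attention followed by feature-level fusion, not the authors' architecture. It assumes text token features act as queries over acoustic frame features, omits learned projections, and fuses by simple concatenation; the function names (`cross_modal_attention`, `fuse`) and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys_values):
    """Scaled dot-product attention of each query over keys_values.

    Assumption: no learned projections, so the same vectors serve as
    both keys and values.
    """
    dim = len(keys_values[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output feature at a time.
        context = [
            sum(w * v[j] for w, v in zip(weights, keys_values))
            for j in range(dim)
        ]
        attended.append(context)
    return attended


def fuse(text_feats, audio_feats):
    """Concatenate each text token with its attended acoustic context."""
    audio_context = cross_modal_attention(text_feats, audio_feats)
    return [t + c for t, c in zip(text_feats, audio_context)]


# Toy inputs (hypothetical): 2 text tokens, 3 acoustic frames, feature size 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(text_feats, audio_feats)
```

Each row of `fused` keeps the original text features in its first half and carries the attention-weighted acoustic summary in its second half. A real model would add learned query, key, and value projections and feed the fused sequence to downstream task heads.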

