Paper tables with annotated results for Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach

Paper

Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach

Video scene parsing incorporates temporal information, which can enhance the consistency and accuracy of predictions compared to image scene parsing. The added temporal dimension enables a more comprehensive understanding of the scene, leading to more reliable results. This paper presents the winning solution of the CVPR2023 workshop for video semantic segmentation, focusing on enhancing Spatial-Temporal correlations with contrastive loss. We also explore the influence of multi-dataset training by utilizing a label-mapping technique. And the final result is aggregating the output of the above two models. Our approach achieves 65.95% mIoU performance on the VSPW dataset, ranked 1st place on the VSPW challenge at CVPR 2023.

PDF Paper record

Results in Papers With Code

(↓ scroll down to see all results)

Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach

Reader Guidelines

Editor Guidelines