VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision

CVPR 2023 · Mengyin Liu, Jie Jiang, Chao Zhu, Xu-Cheng Yin

Accurately detecting pedestrians in urban scenes is important for real-world applications such as autonomous driving and video surveillance. However, confusing human-like objects often lead to false detections, and small-scale or heavily occluded pedestrians are easily missed due to their unusual appearance. To address these challenges, object regions alone are insufficient, so how to fully exploit more explicit semantic contexts becomes a key problem. Meanwhile, previous context-aware pedestrian detectors either learn only latent contexts from visual cues or require laborious annotations to obtain explicit semantic contexts. Therefore, this paper proposes a novel approach, Vision-Language semantic self-supervision for context-aware Pedestrian Detection (VLPD), to model explicit semantic contexts without any extra annotations. First, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns fully supervised pedestrian detection together with contextual segmentation using explicit semantic-class labels self-generated by vision-language models. Furthermore, a self-supervised Prototypical Semantic Contrastive (PSC) learning method is proposed to better discriminate pedestrians from other classes, based on the explicit semantic contexts obtained from VLS. Extensive experiments on popular benchmarks show that the proposed VLPD outperforms previous state-of-the-art methods, particularly under challenging conditions such as small scale and heavy occlusion. Code is available at https://github.com/lmy98129/VLPD.
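The two self-supervised components above lend themselves to a compact illustration. The sketch below is not the authors' implementation: the tensor shapes, the temperature value, and the use of prompt text embeddings as class prototypes are assumptions, and `pseudo_semantic_labels` / `prototypical_contrastive_loss` are hypothetical helper names. It shows how pixel-level pseudo-labels for contextual classes could be generated by matching dense visual features against vision-language text embeddings (in the spirit of VLS), and how a prototype-based contrastive loss could push pedestrian features away from other classes (in the spirit of PSC).

```python
# Minimal sketch of the two self-supervised signals described in the abstract.
# NOT the authors' code: shapes, temperature, and the use of text embeddings
# as class prototypes are assumptions for illustration only.
import torch
import torch.nn.functional as F

def pseudo_semantic_labels(pixel_feats, text_embeds):
    """Assign each pixel the semantic class whose vision-language text
    embedding is most similar (cosine) to its visual feature.

    pixel_feats: (B, C, H, W) dense features projected into the VL embedding space
    text_embeds: (K, C) embeddings of K semantic-class prompts (e.g. "road", "person")
    returns:     (B, H, W) integer pseudo-labels
    """
    feats = F.normalize(pixel_feats, dim=1)
    texts = F.normalize(text_embeds, dim=1)
    sim = torch.einsum("bchw,kc->bkhw", feats, texts)  # per-class cosine similarity
    return sim.argmax(dim=1)

def prototypical_contrastive_loss(pixel_feats, labels, prototypes, tau=0.07):
    """Cross-entropy over pixel-to-prototype similarities: each pixel feature is
    pulled toward its own class prototype and pushed away from the others."""
    feats = F.normalize(pixel_feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = torch.einsum("bchw,kc->bkhw", feats, protos) / tau
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for backbone features and prompt embeddings.
B, C, H, W, K = 2, 512, 32, 64, 8
feats = torch.randn(B, C, H, W)
texts = torch.randn(K, C)
labels = pseudo_semantic_labels(feats, texts)                 # self-generated supervision
loss = prototypical_contrastive_loss(feats, labels, texts)    # contrastive objective
```

In this reading, no segmentation ground truth is needed: the vision-language text embeddings both generate the pseudo-labels and serve as the class prototypes for the contrastive term.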


Results from the Paper


| Task                 | Dataset     | Model | Metric               | Value | Global Rank |
|----------------------|-------------|-------|----------------------|-------|-------------|
| Pedestrian Detection | Caltech     | VLPD  | Reasonable Miss Rate | 2.3   | #5          |
| Pedestrian Detection | Caltech     | VLPD  | Heavy MR^-2          | 37.7  | #5          |
| Pedestrian Detection | CityPersons | VLPD  | Reasonable MR^-2     | 9.4   | #8          |
| Pedestrian Detection | CityPersons | VLPD  | Heavy MR^-2          | 43.1  | #9          |
| Pedestrian Detection | CityPersons | VLPD  | Partial MR^-2        | 8.8   | #4          |
| Pedestrian Detection | CityPersons | VLPD  | Bare MR^-2           | 6.1   | #3          |
| Pedestrian Detection | CityPersons | VLPD  | Small MR^-2          | 10.9  | #6          |
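For context, MR^-2 denotes the log-average miss rate: the miss rate is averaged geometrically over nine false-positives-per-image (FPPI) points evenly spaced in log space on [10^-2, 10^0], and lower is better. A minimal sketch of that computation, assuming a precomputed miss-rate-vs-FPPI curve and not tied to the official evaluation scripts:

```python
# Hedged sketch of the MR^-2 (log-average miss rate) metric, not the official
# Caltech/CityPersons evaluation code.
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """Average miss rate at nine FPPI reference points spaced in log scale
    over [1e-2, 1e0], combined by geometric mean and reported as a percentage."""
    ref_points = np.logspace(-2.0, 0.0, num=9)
    sampled = []
    for ref in ref_points:
        # Take the miss rate at the largest FPPI not exceeding the reference point;
        # if the curve never reaches that low, fall back to its first value.
        idx = np.where(fppi <= ref)[0]
        sampled.append(miss_rate[idx[-1]] if len(idx) else miss_rate[0])
    return np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))) * 100

# Toy curve: miss rate decreases as FPPI increases.
fppi = np.logspace(-3, 0, 50)
mr = np.linspace(0.9, 0.05, 50)
print(f"MR^-2 = {log_average_miss_rate(fppi, mr):.1f}%")
```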
