Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on enlarging the receptive field, either through dilated/atrous convolutions or by inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieved first place on the highly competitive ADE20K test server leaderboard on the day of submission.
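The core idea above — encoding an image as a sequence of patch tokens rather than a feature map — can be illustrated with a minimal sketch. This is not the authors' code; the patch size (16), image size, and the toy linear projection below are illustrative assumptions in the spirit of the paper's ViT-style encoder input.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into an (N, patch*patch*C) token sequence,
    where N = (H/patch) * (W/patch). Each patch is flattened row-major."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    # Reshape into a grid of patches, then flatten each patch into one token.
    grid = image.reshape(H // patch, patch, W // patch, patch, C)
    tokens = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return tokens

rng = np.random.default_rng(0)
img = rng.standard_normal((512, 512, 3))
seq = patchify(img)                              # (1024, 768): 32x32 patches
proj = rng.standard_normal((768, 1024))          # toy patch-embedding matrix
embedded = seq @ proj                            # token embeddings fed to the transformer
print(seq.shape, embedded.shape)                 # (1024, 768) (1024, 1024)
```

Because the transformer encoder never downsamples this sequence, every layer attends over all 1024 tokens, which is the "global context modeled in every layer" property the abstract refers to; the decoder only has to reshape and upsample the token grid back to pixel resolution.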

CVPR 2021

Results from the Paper


Ranked #2 on Semantic Segmentation on FoodSeg103 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | SETR-MLA (160k, MS) | Validation mIoU | 50.28 | #109 | |
| Semantic Segmentation | Cityscapes test | SETR-PUP++ | Mean IoU (class) | 81.64% | #36 | |
| Semantic Segmentation | Cityscapes val | SETR-PUP (80k, MS) | mIoU | 82.15 | #33 | |
| Semantic Segmentation | DADA-seg | SETR (MLA, Transformer-Large) | mIoU | 30.4 | #5 | |
| Semantic Segmentation | DADA-seg | SETR (PUP, Transformer-Large) | mIoU | 31.8 | #4 | |
| Semantic Segmentation | DensePASS | SETR (MLA, Transformer-L) | mIoU | 35.6% | #19 | |
| Semantic Segmentation | DensePASS | SETR (PUP, Transformer-L) | mIoU | 35.7% | #18 | |
| Semantic Segmentation | FoodSeg103 | SeTR-MLA (ViT-16/B) | mIoU | 45.1 | #2 | Yes |
| Semantic Segmentation | PASCAL Context | SETR-MLA (16, 80k, MS) | mIoU | 55.83 | #24 | |
| Medical Image Segmentation | Synapse multi-organ CT | SETR | Avg DSC | 79.60 | #11 | |
| Semantic Segmentation | UrbanLF | SETR (ViT-Large) | mIoU (Real) | 77.74 | #6 | |
| Semantic Segmentation | UrbanLF | SETR (ViT-Large) | mIoU (Syn) | 77.69 | #9 | |

Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Semantic Segmentation | FoodSeg103 | SeTR-Naive (ViT-16/B) | mIoU | 41.3 | #5 | |

Methods