TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	TinyViT-5M-distill (21k)	Top 1 Accuracy	80.7%	# 630
Image Classification	ImageNet	TinyViT-5M-distill (21k)	Number of params	5.4M	# 419
Image Classification	ImageNet	TinyViT-5M-distill (21k)	GFLOPs	1.3	# 118
Image Classification	ImageNet	TinyViT-11M-distill (21k)	Top 1 Accuracy	83.2%	# 414
Image Classification	ImageNet	TinyViT-11M-distill (21k)	Number of params	11M	# 485
Image Classification	ImageNet	TinyViT-11M-distill (21k)	GFLOPs	2.0	# 146
Image Classification	ImageNet	TinyViT-21M-distill (21k)	Top 1 Accuracy	84.8%	# 271
Image Classification	ImageNet	TinyViT-21M-distill (21k)	Number of params	21M	# 546
Image Classification	ImageNet	TinyViT-21M-distill (21k)	GFLOPs	4.3	# 202
Image Classification	ImageNet	TinyViT-21M-384-distill (384 res, 21k)	Top 1 Accuracy	86.2%	# 165
Image Classification	ImageNet	TinyViT-21M-384-distill (384 res, 21k)	Number of params	21M	# 546
Image Classification	ImageNet	TinyViT-21M-384-distill (384 res, 21k)	GFLOPs	13.8	# 333
Image Classification	ImageNet	TinyViT-21M-512-distill (512 res, 21k)	Top 1 Accuracy	86.5%	# 136
Image Classification	ImageNet	TinyViT-21M-512-distill (512 res, 21k)	Number of params	21M	# 546
Image Classification	ImageNet	TinyViT-21M-512-distill (512 res, 21k)	GFLOPs	27.0	# 386
Image Classification	ImageNet	TinyViT-5M	Top 1 Accuracy	79.1%	# 715
Image Classification	ImageNet	TinyViT-5M	Number of params	5.4M	# 419
Image Classification	ImageNet	TinyViT-5M	GFLOPs	1.3	# 118
Image Classification	ImageNet	TinyViT-11M	Top 1 Accuracy	81.5%	# 578
Image Classification	ImageNet	TinyViT-11M	Number of params	11M	# 485
Image Classification	ImageNet	TinyViT-11M	GFLOPs	2.0	# 146
Image Classification	ImageNet	TinyViT-21M	Top 1 Accuracy	83.1%	# 427
Image Classification	ImageNet	TinyViT-21M	Number of params	21M	# 546
Image Classification	ImageNet	TinyViT-21M	GFLOPs	4.3	# 202
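
The table lists each model size twice at the default resolution, once with the `-distill (21k)` suffix and once without. As a small worked example, the snippet below computes the top-1 accuracy gap between the two variants. The numbers are copied from the table; the dictionary layout and the `distillation_gain` helper are illustrative names, not part of any released code.

```python
# Top-1 accuracy (%) copied from the table above.
# "plain" = row without the "-distill (21k)" suffix, "distill" = row with it.
top1 = {
    "TinyViT-5M": {"plain": 79.1, "distill": 80.7},
    "TinyViT-11M": {"plain": 81.5, "distill": 83.2},
    "TinyViT-21M": {"plain": 83.1, "distill": 84.8},
}


def distillation_gain(scores):
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round(scores["distill"] - scores["plain"], 1)


gains = {name: distillation_gain(scores) for name, scores in top1.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

For the three sizes listed, the gap is 1.6, 1.7, and 1.7 percentage points, with identical parameter counts and GFLOPs in both variants.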

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

21 Jul 2022 · Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

Vision transformers (ViTs) have recently drawn great attention in computer vision due to their remarkable model capability. However, most prevailing ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited resources. To alleviate this issue, we propose TinyViT, a new family of tiny and efficient vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. The central idea is to transfer knowledge from large pretrained models to small ones, while enabling small models to reap the benefits of massive pretraining data. More specifically, we apply distillation during pretraining for knowledge transfer. The logits of large teacher models are sparsified and stored on disk in advance to save memory cost and computation overhead. The tiny student transformers are automatically scaled down from a large pretrained model under computation and parameter constraints. Comprehensive experiments demonstrate the efficacy of TinyViT. It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters. Moreover, at higher image resolutions, TinyViT reaches 86.5% accuracy, slightly better than Swin-L while using only 11% of the parameters. Last but not least, we demonstrate that TinyViT transfers well to various downstream tasks. Code and models are available at https://github.com/microsoft/Cream/tree/main/TinyViT.
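
The abstract's step of sparsifying teacher logits and storing them ahead of time can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that sparsification means keeping the top-k teacher probabilities, that the leftover probability mass is spread uniformly over the remaining classes when the target is rebuilt, and that the student is trained with a soft cross-entropy loss. The function names are hypothetical and are not taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparsify(probs, k):
    """Keep the k largest probabilities as (class_index, prob) pairs.

    Assumption: this is what would be written to disk per training sample.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


def densify(sparse, num_classes):
    """Rebuild a full distribution from the stored top-k entries.

    Assumption: the mass not covered by the top-k entries is shared
    equally by all other classes.
    """
    kept = dict(sparse)
    if len(kept) >= num_classes:
        raise ValueError("k must be smaller than num_classes")
    rest = (1.0 - sum(kept.values())) / (num_classes - len(kept))
    return [kept.get(i, rest) for i in range(num_classes)]


def soft_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy between teacher probabilities and student predictions."""
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in student_logits))
    return -sum(t * (s - log_z) for t, s in zip(teacher_probs, student_logits))


# Toy example with 5 classes and k = 2.
teacher_logits = [2.0, 0.5, -1.0, 0.1, 3.0]
student_logits = [1.0, 0.0, 0.0, 0.0, 2.0]

stored = sparsify(softmax(teacher_logits), k=2)   # precomputed once, offline
target = densify(stored, num_classes=5)           # rebuilt at training time
loss = soft_cross_entropy(student_logits, target)
```

Under these assumptions, only `k` index/value pairs per sample need to be stored instead of a full vector over all classes, and the teacher never has to run during student training.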

Code

microsoft/cream (official) · 1,579 stars

rwightman/pytorch-image-models · 30,001 stars

Tasks

Image Classification

Knowledge Distillation

Datasets

CIFAR-10

ImageNet

MS COCO

CIFAR-100

EuroSAT

Results from the Paper

Ranked #136 on Image Classification on ImageNet
