Involution: Inverting the Inherence of Convolution for Visual Recognition

Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically its spatial-agnostic and channel-specific properties. We present a novel atomic operation for deep neural networks, coined involution, that inverts these design principles. We additionally demystify the recently popular self-attention operator and subsume it into the involution family as an over-complicated instantiation. The proposed involution operator can serve as a fundamental building block for a new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, and Cityscapes segmentation. Our involution-based models improve on convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU in absolute terms, while reducing the computational cost to 66%, 65%, 72%, and 57% of the baseline on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at https://github.com/d-li14/involution.
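
To make the inverted design principles concrete, the following is a minimal pure-Python sketch of an involution-style operation. It is not the authors' reference implementation (that is in the repository linked above). It assumes a single linear map (the hypothetical `weight` argument) as the kernel generator, stride 1, and zero padding; the function name, argument names, and nested-list layout are illustrative choices only. The kernel is generated from the features at each pixel (spatial-specific) and shared by all channels in a group (channel-agnostic).

```python
def involution(x, weight, kernel_size=3, groups=1):
    """Involution-style operation on nested lists (illustrative sketch).

    x:      input features laid out as [C][H][W].
    weight: hypothetical linear kernel generator, [groups * K * K][C].
    Returns an output with the same [C][H][W] layout.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k, pad = kernel_size, kernel_size // 2
    per_group = channels // groups
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]

    for i in range(height):
        for j in range(width):
            # Spatial-specific: the kernel depends on the features at (i, j).
            pixel = [x[c][i][j] for c in range(channels)]
            kernel = [sum(w * p for w, p in zip(row, pixel)) for row in weight]

            for c in range(channels):
                # Channel-agnostic: channels in one group share a kernel.
                g = c // per_group
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        ii, jj = i + u - pad, j + v - pad
                        # Out-of-range neighbours act as zero padding.
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[(g * k + u) * k + v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


if __name__ == "__main__":
    # Two channels on a 2x2 grid; only the centre tap of the 3x3 kernel is
    # non-zero, and it is set to the value of channel 0 at each pixel.
    x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    weight = [[0.0, 0.0] for _ in range(9)]
    weight[4] = [1.0, 0.0]
    print(involution(x, weight))
```

By contrast, a standard convolution would apply one fixed kernel at every position (spatial-agnostic) and learn a separate kernel per output channel (channel-specific).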

CVPR 2021

Results from the Paper


Task: Image Classification. Dataset: ImageNet. Global ranks are given in parentheses.

Model        Top-1 Accuracy     Number of params    GFLOPs
RedNet-38    77.6% (# 800)      12.4M (# 502)       2.2 (# 154)
RedNet-26    75.9% (# 857)      9.2M (# 468)        1.7 (# 135)
RedNet-101   79.1% (# 714)      25.6M (# 601)       4.7 (# 220)
RedNet-50    78.4% (# 768)      15.5M (# 517)       2.7 (# 167)
RedNet-152   79.3% (# 706)      34M (# 657)         6.8 (# 246)

Methods