Revisit Kernel Pruning with Lottery Regulated Grouped Convolutions

Structured pruning methods, which deliver a densely pruned network, are among the most popular techniques in neural network pruning; most such methods prune the original network at the filter or layer level. Although these methods provide immediate compression and acceleration benefits, we argue that the blanket removal of an entire filter or layer may cause undesired accuracy loss. In this paper, we revisit kernel pruning (pruning only one or several $k \times k$ kernels out of a 3D filter), an approach heavily overlooked in the structured pruning literature because it naturally introduces sparsity to filters within the same convolutional layer, making the remaining network no longer dense. We address this problem by proposing a versatile grouped pruning framework: we first cluster the filters of each convolutional layer into equal-sized groups, prune the grouped kernels we deem unimportant from each filter group, and then permute the remaining filters to form a densely grouped convolutional architecture (which also enables parallel computation) for fine-tuning. Specifically, we consult empirical findings from the $\textit{Lottery Ticket Hypothesis}$ literature to determine the optimal clustering scheme per layer, and we develop a simple yet cost-efficient greedy approximation algorithm to decide which grouped kernels to keep within each filter group. Furthermore, we discuss how this pruning framework could work with more advanced unsupervised clustering schemes and with inductive biases on weight shifting discovered in the future. Extensive experiments demonstrate that our method often outperforms comparable state-of-the-art methods while requiring less data augmentation and a smaller fine-tuning budget, and sometimes with a much simpler procedure (e.g., one-shot pruning vs. iterative pruning, standard fine-tuning vs. custom fine-tuning).
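The cluster-then-prune pipeline described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: in place of the lottery-ticket-guided clustering, it orders filters by a simple norm proxy, and it scores kernels by group-summed L1 norm as one plausible importance measure. The function name and its arguments (`grouped_kernel_prune`, `n_groups`, `keep_per_group`) are hypothetical.

```python
import numpy as np

def grouped_kernel_prune(weight, n_groups, keep_per_group):
    """Sketch of grouped kernel pruning (illustrative, not the paper's exact algorithm).

    weight: conv weight of shape (C_out, C_in, k, k).
    Step 1: cluster filters into `n_groups` equal-sized groups; here we use a
            simple proxy (sort filters by L2 norm and split evenly) instead of
            the paper's lottery-ticket-guided clustering.
    Step 2: within each group, score each input-channel kernel slice by its
            L1 norm summed over the group's filters, and keep the top
            `keep_per_group` slices shared by all filters in the group, so the
            surviving weights form a dense grouped convolution.
    Returns the filter grouping (a permutation of filter indices) and, per
    group, the indices of the kept input-channel kernels.
    """
    c_out, c_in, k, _ = weight.shape
    assert c_out % n_groups == 0, "groups must be equal-sized"
    group_size = c_out // n_groups

    # Step 1: proxy clustering -- order filters by norm, split into equal groups.
    order = np.argsort(np.linalg.norm(weight.reshape(c_out, -1), axis=1))
    groups = order.reshape(n_groups, group_size)

    kept = []
    for g in groups:
        # Step 2: group-level kernel importance = L1 norm summed over the
        # group's filters, one score per input channel.
        scores = np.abs(weight[g]).sum(axis=(0, 2, 3))  # shape (C_in,)
        keep = np.sort(np.argsort(scores)[-keep_per_group:])
        kept.append(keep)
    return groups, kept
```

Because every filter in a group keeps the same input-channel kernels, the remaining weights can be packed into a standard grouped convolution and fine-tuned without any irregular sparsity.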
