Filter Pruning and Re-Initialization via Latent Space Clustering

Published on Oct 14, 2020 in IEEE Access (Impact Factor: 3.367)
DOI: 10.1109/ACCESS.2020.3031031
Seunghyun Lee (Estimated H-index: 5), Byeongho Heo (Naver Corporation; Estimated H-index: 11), +1 author, and Byung Cheol Song (Estimated H-index: 16)
Abstract
Filter pruning is the prevalent approach to pruning-based model compression. Most filter pruning methods suffer from two main issues: 1) the capability of the pruned network depends on that of the source pretrained model, and 2) they do not take into account that filter weights follow a normal distribution. To address these issues, we propose a new pruning method that employs both weight re-initialization and latent space clustering. For latent space clustering, we define filters and their feature maps as the vertices and edges of a graph, which is transformed into a latent space by graph convolution; this alleviates the tendency to prune only filters with near-zero weights. In addition, a subset of the filters is re-initialized under a constraint that enhances filter diversity, so the pruned model is less dependent on the source network. This approach yields more robust accuracy even when pruning from a pretrained model with low accuracy. Extensive experiments show that our method reduces the FLOPs and parameters of VGG16 by 56.6% and 84.6%, respectively, with negligible accuracy loss on CIFAR100, which is state-of-the-art performance. Furthermore, our method outperforms or matches state-of-the-art pruning methods on multiple datasets.
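The abstract only outlines the pipeline, so the sketch below illustrates the general flow under simplifying assumptions: filters are treated as graph vertices, edge weights here come from cosine similarity of the flattened filter weights (a stand-in for the feature-map-based edges the paper describes), one normalized graph-convolution step produces latent vectors, K-means clustering keeps one representative filter per cluster, and a fraction of the kept filters is re-initialized. All names and the re-initialization rule are illustrative, not the authors' implementation.

```python
# Minimal, illustrative sketch of pruning one convolutional layer via
# latent-space clustering and partial re-initialization (NOT the authors' code).
import numpy as np
from sklearn.cluster import KMeans

def prune_layer(filters, keep_ratio=0.5, reinit_ratio=0.2, seed=0):
    """filters: array of shape (num_filters, in_ch, k, k)."""
    rng = np.random.default_rng(seed)
    n = filters.shape[0]
    flat = filters.reshape(n, -1)

    # 1) Build a graph over filters: vertices = filters, edge weights = cosine
    #    similarity (stand-in for the feature-map-based edges in the paper).
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    adj = unit @ unit.T
    np.fill_diagonal(adj, 1.0)

    # 2) One normalized graph-convolution step: D^{-1/2} A D^{-1/2} X.
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1) + 1e-8)
    latent = (d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]) @ flat

    # 3) Cluster latent vectors; keep the filter nearest each cluster centre,
    #    so pruning is not limited to near-zero-weight filters.
    k = max(1, int(round(n * keep_ratio)))
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(latent)
    keep = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(latent[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[np.argmin(dists)])
    keep = np.array(sorted(keep))
    pruned = filters[keep]

    # 4) Re-initialize a fraction of the surviving filters to restore diversity
    #    and reduce dependence on the source network (simple stand-in rule).
    n_reinit = int(round(len(keep) * reinit_ratio))
    if n_reinit > 0:
        idx = rng.choice(len(keep), size=n_reinit, replace=False)
        pruned[idx] = rng.normal(0.0, pruned.std(), size=pruned[idx].shape)
    return pruned, keep

# Example: prune a layer with 64 random 3x3 filters over 16 input channels.
layer = np.random.randn(64, 16, 3, 3).astype(np.float32)
new_layer, kept_idx = prune_layer(layer, keep_ratio=0.5)
print(new_layer.shape)  # (32, 16, 3, 3)
```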
References (42)
Mathilde Caron (Facebook; H-index: 10), Ari S. Morcos (Facebook; H-index: 22), ..., and Armand Joulin (Facebook; H-index: 56) (5 authors in total)
Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on a task while trying to maintain the performance of the pruned network on the same task. However, in ...
Alex Labach (H-index: 2) and Shahrokh Valaee (U of T: University of Toronto; H-index: 42)
Neural network pruning is an important technique for creating efficient machine learning models that can run on edge devices. We propose a new, highly flexible approach to neural network pruning based on Gibbs distributions. We apply it with Hamiltonians that are based on weight magnitude, using the annealing capabilities of Gibbs distributions to smoothly move from regularization to adaptive pruning during an ordinary neural network training schedule. This method can be used for either unstruct...
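The summary above describes sampling pruning decisions from a Gibbs distribution whose Hamiltonian depends on weight magnitude, with annealing that moves from regularization toward hard pruning. The toy sketch below captures only that qualitative idea (independent per-weight Bernoulli masks with a magnitude-versus-threshold energy and a decaying temperature); the paper's actual Hamiltonians and schedule will differ.

```python
# Toy sketch: magnitude-based Gibbs-style masking with temperature annealing.
# Not the paper's Hamiltonian; just the qualitative behaviour it describes.
import numpy as np

def gibbs_mask(weights, sparsity=0.5, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    mags = np.abs(weights)
    thr = np.quantile(mags, sparsity)      # target pruning threshold
    energy = -(mags - thr)                 # keeping a small weight costs energy
    p_keep = 1.0 / (1.0 + np.exp(energy / max(temperature, 1e-8)))
    return (rng.random(weights.shape) < p_keep).astype(weights.dtype)

w = np.random.randn(1000)
for t in [2.0, 0.5, 0.05]:                 # simple annealing schedule
    mask = gibbs_mask(w, sparsity=0.7, temperature=t)
    print(f"T={t}: kept {mask.mean():.2f} of weights")
# At high T the mask is nearly random (acts like stochastic regularization);
# as T -> 0 it converges to deterministic magnitude thresholding.
```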
Gihun Lee (KAIST; H-index: 2), Sangmin Bae (KAIST; H-index: 1), ..., and Se-Young Yun (KAIST; H-index: 12) (4 authors in total)
With the success of deep learning in various fields and the advent of numerous Internet of Things (IoT) devices, it is essential to lighten models suitable for low-power devices. In keeping with this trend, MicroNet Challenge, which is the challenge to build efficient models from the view of both storage and computation, was hosted at NeurIPS 2019. To develop efficient models through this challenge, we propose a framework, coined as SIPA, consisting of four stages: Searching, Improving, Pruning,...
Rahul Duggal (Georgia Institute of Technology; H-index: 9), Cao Xiao (H-index: 23), ..., and Jimeng Sun (Georgia Institute of Technology; H-index: 75) (4 authors in total)
We propose Cluster Pruning (CUP) for compressing and accelerating deep neural networks. Our approach prunes similar filters by clustering them based on features derived from both the incoming and outgoing weight connections. With CUP, we overcome two limitations of prior work-(1) non-uniform pruning: CUP can efficiently determine the ideal number of filters to prune in each layer of a neural network. This is in contrast to prior methods that either prune all layers uniformly or otherwise use res...
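The CUP summary describes clustering filters on features built from both the incoming and outgoing weight connections. Below is a minimal sketch of that idea with hypothetical variable names, using agglomerative clustering and keeping one medoid-like representative per cluster; it is an illustration, not the paper's implementation.

```python
# Sketch of cluster-based filter pruning in the spirit of CUP: each filter is
# represented by features from its incoming and outgoing weights, similar
# filters are clustered, and one representative per cluster is kept.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cup_like_prune(w_in, w_out, n_keep):
    """w_in:  (out_ch, in_ch, k, k) weights producing each filter's output.
       w_out: (next_out_ch, out_ch, k, k) weights consuming each filter's output.
       Returns indices of filters to keep."""
    n = w_in.shape[0]
    feats = np.concatenate(
        [w_in.reshape(n, -1),                                 # incoming features
         np.transpose(w_out, (1, 0, 2, 3)).reshape(n, -1)],   # outgoing features
        axis=1)
    labels = AgglomerativeClustering(n_clusters=n_keep).fit(feats).labels_
    keep = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        centroid = feats[members].mean(axis=0)
        dists = np.linalg.norm(feats[members] - centroid, axis=1)
        keep.append(members[np.argmin(dists)])   # representative of the cluster
    return np.array(sorted(keep))

w1 = np.random.randn(64, 16, 3, 3)    # layer l
w2 = np.random.randn(128, 64, 3, 3)   # layer l+1
print(cup_like_prune(w1, w2, n_keep=32))
```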
Oct 1, 2019 in ICCV (International Conference on Computer Vision)
In this paper, we propose a novel meta learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly g...
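The MetaPruning summary mentions a weight-generating "PruningNet" trained with stochastic structure sampling, followed by an evolutionary search. The conceptual sketch below shows only the sampling-and-cropping mechanics with an untrained random linear map standing in for the PruningNet; the architecture, encoding, and training are hypothetical simplifications.

```python
# Conceptual sketch of MetaPruning-style stochastic structure sampling: a
# weight-generating network maps a channel-width encoding to weights for the
# cropped layer. The "generator" here is an untrained random linear map,
# shown only to illustrate the mechanics, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
MAX_OUT, MAX_IN, K = 64, 32, 3

# Hypothetical per-layer generator: encoding (2 values) -> full weight tensor.
G = rng.normal(0, 0.01, size=(2, MAX_OUT * MAX_IN * K * K))

def generate_weights(out_ch, in_ch):
    encoding = np.array([out_ch / MAX_OUT, in_ch / MAX_IN])
    full = (encoding @ G).reshape(MAX_OUT, MAX_IN, K, K)
    return full[:out_ch, :in_ch]           # crop to the sampled structure

# Stochastic structure sampling: each "training step" draws a random width.
for step in range(3):
    out_ch = int(rng.integers(8, MAX_OUT + 1))
    in_ch = int(rng.integers(8, MAX_IN + 1))
    w = generate_weights(out_ch, in_ch)
    print(f"step {step}: sampled structure ({out_ch}, {in_ch}) -> weights {w.shape}")
# After such training, an evolutionary search would score candidate width
# vectors using the generated weights and keep the best-performing structures.
```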
Oct 1, 2019 in ICCV (International Conference on Computer Vision)
Jiwoong Park (SNU: Seoul National University; H-index: 4), Minsik Lee (Hanyang University; H-index: 12), ..., and Jin Young Choi (SNU: Seoul National University; H-index: 31) (5 authors in total)
We propose a symmetric graph convolutional autoencoder which produces a low-dimensional latent representation from a graph. In contrast to the existing graph autoencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. For the reconstruction of node features, the decoder is designed based on Laplacian sharpening as the counterpart of Laplacian smoothing of the encoder, which allows utilizing the graph stru...
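The summary contrasts Laplacian smoothing in the encoder with Laplacian sharpening in the decoder. Here is a small numerical illustration of the two operations on node features of a tiny graph, using a simple row-normalized adjacency with self-loops; the paper's operators are renormalized variants, so treat this only as a picture of the concept.

```python
# Illustration of Laplacian smoothing (encoder) vs. Laplacian sharpening
# (decoder) on node features of a tiny 3-node path graph.
import numpy as np

A = np.array([[0, 1, 0],                       # path graph: 0 - 1 - 2
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [5.0], [9.0]])            # one feature per node

A_hat = A + np.eye(3)                          # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalized propagation

smoothed  = P @ X            # each node moves toward its neighbourhood mean
sharpened = 2 * X - P @ X    # each node moves away from it (the counterpart op)

print(smoothed.ravel())      # features pulled together: [3. 5. 7.]
print(sharpened.ravel())     # differences amplified:    [-1. 5. 11.]
```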
Jul 17, 2019 in AAAI (National Conference on Artificial Intelligence)
Esteban Real (Google; H-index: 13), Alok Aggarwal (Google; H-index: 3), ..., and Quoc V. Le (Google; H-index: 102) (4 authors in total)
The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introduc...
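The truncated abstract refers to a modified tournament-selection evolutionary algorithm; the published AmoebaNet method regularizes it by aging, removing the oldest individual rather than the worst. Below is a toy version of that loop over bit-string "architectures" with a made-up fitness function, intended only to show the control flow.

```python
# Toy regularized (aging) evolution: tournament selection picks a parent from a
# random sample, the mutated child joins the population, and the OLDEST member
# is removed rather than the worst. The fitness function is a stand-in.
import random
from collections import deque

random.seed(0)
GENES = 16

def fitness(arch):                     # hypothetical score for a bit-string "architecture"
    return sum(arch)

def mutate(arch):
    child = list(arch)
    child[random.randrange(GENES)] ^= 1   # flip one "architectural choice"
    return child

population = deque([[random.randint(0, 1) for _ in range(GENES)] for _ in range(20)])
best = max(population, key=fitness)

for _ in range(200):
    sample = random.sample(list(population), k=5)   # tournament sample
    parent = max(sample, key=fitness)
    child = mutate(parent)
    population.append(child)           # youngest joins...
    population.popleft()               # ...oldest dies (the "aging" regularizer)
    if fitness(child) > fitness(best):
        best = child

print(best, fitness(best))
```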
Jul 17, 2019 in AAAI (National Conference on Artificial Intelligence)
Byeongho Heo (SNU: Seoul National University; H-index: 11), Minsik Lee (Hanyang University; H-index: 12), ..., and Jin Young Choi (SNU: Seoul National University; H-index: 31) (4 authors in total)
An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has long been considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification-friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature on knowledge transfer. In this paper, we pr...
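The summary emphasizes transferring activation boundaries (which side of zero a neuron's pre-activation falls on) rather than exact values. One way to express that is a margin hinge loss that penalizes the student whenever its pre-activation lies on the wrong side of the teacher's boundary; this is a hedged illustration of the idea, not necessarily the paper's exact formulation.

```python
# Illustrative activation-boundary transfer loss: push student pre-activations
# to the same side of zero as the teacher's, with a margin, ignoring magnitudes.
import numpy as np

def boundary_transfer_loss(teacher_pre, student_pre, margin=1.0):
    on = (teacher_pre > 0).astype(float)             # teacher neuron activated?
    # penalize the student for being below +margin where the teacher is on,
    # and for being above -margin where the teacher is off
    loss = on * np.maximum(margin - student_pre, 0.0) ** 2 \
         + (1 - on) * np.maximum(margin + student_pre, 0.0) ** 2
    return loss.mean()

t = np.array([2.3, -0.7, 0.1, -3.0])
s = np.array([0.2,  0.4, 1.5, -2.0])
print(boundary_transfer_loss(t, s))   # nonzero where the student violates the boundary
```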
Jun 15, 2019 in CVPR (Computer Vision and Pattern Recognition)
Yang He (UTS: University of Technology, Sydney; H-index: 33), Ping Liu (UTS; H-index: 16), ..., and Yi Yang (UTS; H-index: 148) (5 authors in total)
Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (...
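FPGM's criterion prunes the filters closest to the geometric median of all filters in a layer (the most replaceable ones) instead of those with the smallest norm. A compact sketch of that selection rule, using total pairwise distance as the usual practical proxy for closeness to the geometric median:

```python
# Sketch of geometric-median-style filter selection: prune the filters whose
# total distance to all other filters is smallest (i.e. the most redundant),
# rather than the filters with the smallest norm.
import numpy as np

def fpgm_like_prune_indices(filters, n_prune):
    """filters: (num_filters, in_ch, k, k). Returns indices to prune."""
    flat = filters.reshape(filters.shape[0], -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    total = dists.sum(axis=1)   # small total distance = close to the geometric median
    return np.sort(np.argsort(total)[:n_prune])

layer = np.random.randn(64, 16, 3, 3)
print(fpgm_like_prune_indices(layer, n_prune=16))
```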
Jun 15, 2019 in CVPR (Computer Vision and Pattern Recognition)
Authors include Jian Zhang (SJTU: Shanghai Jiao Tong University; H-index: 37)
We propose a variational Bayesian scheme for pruning convolutional neural networks at the channel level. This idea is motivated by the fact that deterministic value-based pruning methods are inherently improper and unstable. In a nutshell, a variational technique is introduced to estimate the distribution of a newly proposed parameter, called channel saliency; based on this, redundant channels can be removed from the model via a simple criterion. The advantages are two-fold: 1) Our method conducts channel pru...
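The summary treats each channel's saliency as a distribution rather than a point value. As a heavily simplified illustration (assumed Gaussian posteriors and a made-up removal rule, not the paper's criterion), one could prune the channels whose saliency distribution concentrates near zero:

```python
# Simplified illustration: given an (assumed Gaussian) posterior over each
# channel's saliency, remove channels whose probability mass near zero is high.
# The thresholds and the Gaussian assumption are illustrative only.
import numpy as np
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prune_channels(mu, sigma, eps=0.05, mass_threshold=0.9):
    """mu, sigma: per-channel posterior mean/std of the saliency."""
    mass_near_zero = np.array([normal_cdf(eps, m, s) - normal_cdf(-eps, m, s)
                               for m, s in zip(mu, sigma)])
    return np.where(mass_near_zero > mass_threshold)[0]   # channels to remove

mu = np.array([0.9, 0.01, -0.02, 0.4])
sigma = np.array([0.1, 0.02, 0.01, 0.2])
print(prune_channels(mu, sigma))   # -> channels 1 and 2 look redundant
```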
Cited By (0)