Channel Pruning via Gradient of Mutual Information for Light-Weight Convolutional Neural Networks

Published on Oct 1, 2020 in ICIP (International Conference on Image Processing) · DOI: 10.1109/ICIP40778.2020.9190803
Min Kyu Lee (Inha University, est. H-index 3), Seunghyun Lee (Inha University, est. H-index 5), + 1 author, and Byung Cheol Song (Inha University, est. H-index 16)
Abstract
Channel pruning is very effective for making networks light-weight, reducing both memory footprint and computational cost. Many channel pruning methods assume that the magnitude of a particular element associated with each channel reflects that channel's importance. Unfortunately, this assumption does not always hold. To solve this problem, this paper proposes a new method that measures channel importance from gradients of mutual information. The proposed method attaches a module capable of estimating mutual information and measures the gradients of that estimate during back-propagation. Using the measured statistics as channel importance, less important channels can be removed. Finally, fine-tuning robustly restores the performance of the pruned model. Experimental results show that the proposed method achieves better performance with fewer parameters and FLOPs than conventional schemes.
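The abstract only outlines the procedure, so the following is a minimal PyTorch sketch of the general idea, assuming a MINE-style statistics network as the mutual-information estimator and a simple mean-absolute-gradient statistic as the channel-importance score. The names (MINEHead, channel_importance) and the aggregation details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MINEHead(nn.Module):
    """Small statistics network T(f, y) for a MINE-style MI lower bound (assumed estimator)."""

    def __init__(self, feat_dim, label_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + label_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def mi_lower_bound(self, feats, labels):
        # Joint samples pair each feature with its own label; marginal samples shuffle the labels.
        joint = self.net(torch.cat([feats, labels], dim=1))
        shuffled = labels[torch.randperm(labels.size(0))]
        marginal = self.net(torch.cat([feats, shuffled], dim=1))
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]
        n = torch.tensor(float(marginal.size(0)))
        return joint.mean() - (torch.logsumexp(marginal, dim=0).squeeze() - torch.log(n))


def channel_importance(conv_feat, labels, mine_head):
    """Per-channel importance = mean |d MI / d activation| over batch and spatial dims."""
    conv_feat = conv_feat.detach().requires_grad_(True)   # (N, C, H, W)
    pooled = conv_feat.mean(dim=(2, 3))                   # (N, C) per-channel summary
    mi = mine_head.mi_lower_bound(pooled, labels)
    grad, = torch.autograd.grad(mi, conv_feat)            # gradient of the MI estimate
    return grad.abs().mean(dim=(0, 2, 3))                 # (C,) one score per channel


if __name__ == "__main__":
    # Toy usage: score 64 channels of one layer and keep the top half.
    feats = torch.randn(32, 64, 8, 8)                     # activations of some conv layer
    labels = nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
    head = MINEHead(feat_dim=64, label_dim=10)
    scores = channel_importance(feats, labels, head)
    keep = scores.topk(k=32).indices                      # channels to retain after pruning
    print(keep)
```

In the paper the gradient statistics are gathered during back-propagation over training, the lowest-scoring channels are then removed, and the pruned network is fine-tuned to restore accuracy; the sketch above only illustrates how a per-channel score could be derived from MI gradients for a single batch.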