Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Published: Feb 15, 2016
Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding, whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network...
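The abstract describes the pipeline only at a high level. Below is a minimal NumPy sketch of the three stages applied to a single weight matrix. The threshold, cluster count, and every function name here are illustrative assumptions, not the paper's implementation; in particular, the sketch omits the retraining the paper performs after pruning and the fine-tuning of the shared centroids during quantization.

```python
import numpy as np
from collections import Counter
from heapq import heappush, heappop, heapify

def prune(weights, threshold):
    """Stage 1 (assumed magnitude pruning): zero out small weights."""
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize(weights, mask, n_clusters=16):
    """Stage 2 (sketch): weight sharing via 1-D k-means over surviving weights."""
    vals = weights[mask]
    # Linear initialization of centroids over the weight range (one common choice).
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(20):  # a few Lloyd iterations
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = vals[assign == k].mean()
    return assign, centroids

def huffman_code_lengths(symbols):
    """Stage 3: Huffman coding; returns each symbol's codeword length in bits."""
    counts = Counter(symbols)
    heap = [[c, [s, 0]] for s, c in counts.items()]
    heapify(heap)
    while len(heap) > 1:
        lo, hi = heappop(heap), heappop(heap)
        for pair in lo[1:]:
            pair[1] += 1  # one level deeper in the code tree
        for pair in hi[1:]:
            pair[1] += 1
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {s: max(length, 1) for s, length in heap[0][1:]}

# Toy end-to-end run on a random matrix (hypothetical numbers, not from the paper).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
pruned, mask = prune(W, threshold=0.5)
assign, centroids = quantize(pruned, mask)
lengths = huffman_code_lengths(assign.tolist())
bits = sum(lengths[s] for s in assign.tolist())
print(f"kept {mask.mean():.0%} of weights; "
      f"{bits / 8 / 1024:.2f} KiB of Huffman-coded indices "
      f"+ a codebook of {len(centroids)} floats")
```

With 16 clusters, each surviving weight index fits in 4 bits even before entropy coding; Huffman coding then shortens the codes for the most frequent cluster indices further, which is where the final stage's savings come from.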