Variational Network Quantization


J. Achterhold, J. M. Koehler, A. Schmeink, T. Genewein


In this paper, the preparation of a neural network for pruning and few-bit quantization is formulated as a variational inference problem. To this end, a quantizing prior that leads to a multi-modal, sparse posterior distribution over weights is introduced, and a differentiable Kullback-Leibler divergence approximation for this prior is derived. After training with Variational Network Quantization, weights can be replaced by deterministic quantization values with small to negligible loss of task accuracy (including pruning by setting weights to 0). The method does not require fine-tuning after quantization. Results are shown for ternary quantization on LeNet-5 (MNIST) and DenseNet (CIFAR-10).
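To make the final replacement step concrete, the sketch below maps trained weights onto a ternary codebook. It is only an illustration under stated assumptions: the function name `quantize_ternary`, the codebook scale `a`, the example weights, and the nearest-value assignment rule are all hypothetical and not taken from the paper, whose actual pruning and assignment procedure may differ.

```python
def quantize_ternary(weights, a):
    """Map each weight to the nearest value in the codebook {-a, 0, +a}.

    Assumption: nearest-value assignment; this is an illustrative rule,
    not necessarily the one used in the paper.
    """
    codebook = (-a, 0.0, a)
    return [min(codebook, key=lambda c: abs(w - c)) for w in weights]


# Hypothetical weights, clustered around the codebook values as a
# multi-modal posterior would encourage.
weights = [0.48, -0.03, -0.55, 0.10, 0.02]
quantized = quantize_ternary(weights, a=0.5)

# Weights mapped to 0 are effectively pruned.
sparsity = sum(1 for q in quantized if q == 0.0) / len(quantized)

print(quantized)  # [0.5, 0.0, -0.5, 0.0, 0.0]
print(sparsity)   # 0.6
```

Because training is meant to concentrate the weights near the quantization values, a simple replacement of this kind changes each weight only slightly, which is consistent with the abstract's claim that no fine-tuning is needed afterwards.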

BibTEX Reference Entry 

	author = {Jan Achterhold and Jan Mathias Koehler and Anke Schmeink and Tim Genewein},
	title = "Variational Network Quantization",
	pages = "1-18",
	booktitle = "International Conference on Learning Representations (ICLR)",
	address = {Vancouver},
	month = May,
	year = 2018,
	hsb = RWTH-2019-08795,



This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.