An Information Theoretic View on Learning of Artificial Neural Networks
Authors
Emilio Rafael Balda, Arash Behboodi, Rudolf Mathar
Abstract
Deep learning based on Artificial Neural Networks (ANNs) has achieved great success in recent years. However, gaining insight into the fundamentals of ANNs and explaining their functionality remains an open research area of high interest. In this paper, we use an information theoretic approach to reveal typical learning patterns of ANNs. For this purpose, the training samples, the true labels, and the estimated labels are considered as random variables, and the mutual information and conditional entropy between these variables are studied. We show that the learning process of ANNs consists of essentially two phases. First, the network learns mostly about the input samples without significant improvement in accuracy; thereafter, the correct class allocation becomes more pronounced. This finding is based on investigating the conditional entropy of the estimated class label given the true one over the course of training. We next derive bounds on the conditional entropy as a function of the error probability, which provide insights into the learning behavior of ANNs. The theoretical investigations are accompanied by extensive numerical studies on an artificial data set as well as the MNIST and CIFAR benchmark data using the widely known networks LeNet-5 and DenseNet. Notably, in all cases the bounds are nearly attained in later stages of the training phase, which allows for an analytical measure of the training status of an ANN.
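As an illustration of the quantities named in the abstract, the following minimal Python sketch estimates the conditional entropy of the estimated label given the true label from a confusion matrix and compares it with the classical Fano bound h(Pe) + Pe log2(M - 1). The function names and the example matrices are hypothetical, and the classical Fano inequality is used as a stand-in; the bounds derived in the paper itself may differ.

```python
import numpy as np


def conditional_entropy(joint):
    """Empirical H(Y_hat | Y) in bits.

    `joint` is a confusion matrix (counts or probabilities) with
    rows indexed by the true label and columns by the estimated label.
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()                                  # normalise to a joint pmf
    p_true = np.broadcast_to(p.sum(axis=1, keepdims=True), p.shape)
    mask = p > 0                                     # 0 * log(0) is taken as 0
    return float(-(p[mask] * np.log2(p[mask] / p_true[mask])).sum())


def error_probability(joint):
    """Pe = P(Y_hat != Y), i.e. the off-diagonal mass of the joint pmf."""
    p = np.asarray(joint, dtype=float)
    return float(1.0 - np.trace(p) / p.sum())


def fano_bound(pe, num_classes):
    """Classical Fano bound h(pe) + pe * log2(M - 1), in bits (assumed form)."""
    if pe <= 0.0 or pe >= 1.0:
        h = 0.0                                      # binary entropy at the end points
    else:
        h = -pe * np.log2(pe) - (1.0 - pe) * np.log2(1.0 - pe)
    return float(h + pe * np.log2(num_classes - 1))


# Hypothetical 3-class confusion matrices with the same error rate.
uniform_errors = [[8, 1, 1], [1, 8, 1], [1, 1, 8]]   # errors spread evenly
skewed_errors = [[8, 2, 0], [0, 8, 2], [2, 0, 8]]    # errors concentrated

for name, cm in [("uniform", uniform_errors), ("skewed", skewed_errors)]:
    pe = error_probability(cm)
    print(name, conditional_entropy(cm), fano_bound(pe, num_classes=3))
```

In this sketch the bound is met with equality when the misclassifications are spread evenly over the wrong classes, and it is loose when they concentrate on a few classes.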
Index Terms
Neural networks, Fano’s inequality, machine learning
BibTEX Reference Entry
@inproceedings{BaBeMa18c,
  author    = {Emilio Rafael Balda and Arash Behboodi and Rudolf Mathar},
  title     = "An Information Theoretic View on Learning of Artificial Neural Networks",
  pages     = "1-8",
  booktitle = "12th International Conference on Signal Processing and Communication Systems, ICSPCS 2018",
  address   = {Cairns, Australia},
  month     = Dec,
  year      = 2018,
  hsb       = {RWTH-2018-231971},
}
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.