Guaranteeing Generalization via Measures of Information
Licentiate thesis, 2020
The discrepancy between classical theoretical explanations of generalization and the practical success of modern machine learning has spurred a flurry of research activity into new generalization guarantees. For such guarantees to apply to relevant cases such as deep neural networks, they must rely on aspects of learning other than the complexity of the function class. One avenue that is showing promise is the use of methods from information theory. Since information-theoretic quantities are concerned with properties of, and relations between, data distributions, this approach enables generalization guarantees that depend on the properties of the learning algorithm and the data distribution.
In this thesis, we first introduce a framework for deriving information-theoretic generalization guarantees. Specifically, we derive an exponential inequality that yields generalization guarantees not only in the average sense, but also in the form of tail bounds for the PAC-Bayesian and single-draw scenarios. This approach leads to novel generalization guarantees and provides a unified method for deriving several known bounds that were originally obtained through a variety of different proof techniques. Furthermore, we extend the exponential-inequality approach to the recently introduced random-subset setting, in which the training data is randomly selected from a larger set of available data samples.
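For orientation, a representative result from this family of guarantees (not the specific exponential inequality derived in the thesis) is the average-case bound of Xu and Raginsky (2017): if the loss is σ-sub-Gaussian under the data distribution, the expected generalization gap of a hypothesis W learned from n training samples S satisfies

\[ \mathbb{E}\bigl[ L_{P_Z}(W) - L_S(W) \bigr] \le \sqrt{\frac{2\sigma^2\, I(W;S)}{n}}, \]

where, in notation chosen for this sketch, L_{P_Z}(W) denotes the population loss, L_S(W) the training loss, and I(W;S) the mutual information between the hypothesis and the training data. In the random-subset setting, this unconditional mutual information is replaced by a conditional information measure between the hypothesis and the random indices that select the training data, given the larger set of available samples.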
One limitation of the proposed framework is that it can only yield generalization guarantees with a so-called slow rate with respect to the size of the training set. In light of this, we derive a second exponential inequality for the random-subset setting that yields generalization guarantees with fast rates with respect to the size of the training set. We show how to evaluate the guarantees obtained through this inequality, as well as their slow-rate counterparts, for overparameterized neural networks trained on MNIST and Fashion-MNIST. Numerical results illustrate that, in some settings, these bounds predict the true generalization performance fairly well, essentially matching the best available bounds in the literature.
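To make the rate terminology concrete: in a slow-rate bound, the information measure enters under a square root, so the guarantee decays at best as 1/\sqrt{n}; in a fast-rate bound, it enters linearly and, typically under additional conditions such as a small training loss, the guarantee decays as 1/n. Schematically, with I_n denoting the relevant (conditional) information measure,

\[ \mathrm{gen} \lesssim \sqrt{\frac{I_n}{n}} \quad \text{(slow rate)}, \qquad \mathrm{gen} \lesssim \frac{I_n}{n} \quad \text{(fast rate)}. \]

This schematic only illustrates the terminology; the precise statements, and the conditions under which the fast-rate bounds hold, are given in the appended papers.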
Keywords
PAC-Bayes
statistical learning
information theory
machine learning
neural networks
generalization
Author
Fredrik Hellström
Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks
Hellström, F., Durisi, G., Generalization Bounds via Information Density and Conditional Information Density
Hellström, F., Durisi, G., Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures
INNER: information theory of deep neural networks
Chalmers AI Research Centre (CHAIR), 2019-01-01 -- 2021-12-31.
Areas of Advance
Information and Communication Technology
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
Subject Categories
Bioinformatics (Computational Biology)
Probability Theory and Statistics
Mathematical Analysis
Publisher
Chalmers
Opponent: Benjamin Guedj, University College London, UK, and Inria, France