Guaranteeing Generalization via Measures of Information
This discrepancy between theoretical explanations and practical success has spurred a flurry of research into new generalization guarantees. To be applicable to relevant cases such as deep neural networks, such guarantees must rely on some aspect of learning other than the complexity of the function class. One promising avenue is to use methods from information theory. Since information-theoretic quantities concern properties of data distributions and the relations between them, this approach enables generalization guarantees that depend on the properties of learning algorithms and data distributions.
In this thesis, we first introduce a framework to derive information-theoretic guarantees for generalization. Specifically, we derive an exponential inequality that can be used to obtain generalization guarantees not only in the average sense, but also in the form of tail bounds for the PAC-Bayesian and single-draw scenarios. This approach leads to novel generalization guarantees and provides a unified method for deriving several known generalization bounds that were originally discovered through a number of different proof techniques. Furthermore, we extend this exponential-inequality approach to the recently introduced random-subset setting, in which the training data is randomly selected from a larger set of available data samples.
One limitation of the proposed framework is that it can only be used to derive generalization guarantees with a so-called slow rate with respect to the size of the training set. In light of this, we derive another exponential inequality for the random-subset setting which allows for the derivation of generalization guarantees with fast rates with respect to the size of the training set. We show how to evaluate the generalization guarantees obtained through this inequality, as well as their slow-rate counterparts, for overparameterized neural networks trained on MNIST and Fashion-MNIST. Numerical results illustrate that, for some settings, these bounds predict the true generalization capability fairly well, essentially matching the best available bounds in the literature.
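To make the distinction between slow and fast rates concrete, the following is a minimal sketch that compares only the scaling of the two kinds of terms with the training-set size n. It assumes a slow-rate term that decays as sqrt(I/n) and a fast-rate term that decays as I/n, where I stands for an information measure; the function names, the value of I, and the omission of constants and training-loss terms are all illustrative assumptions, not the bounds derived in the thesis.

```python
import math


def slow_rate_term(info_nats: float, n: int) -> float:
    """Illustrative slow-rate term: decays as sqrt(info / n)."""
    return math.sqrt(info_nats / n)


def fast_rate_term(info_nats: float, n: int) -> float:
    """Illustrative fast-rate term: decays as info / n."""
    return info_nats / n


if __name__ == "__main__":
    info_nats = 5.0  # hypothetical information measure, not a value from the thesis
    for n in (100, 10_000, 1_000_000):
        slow = slow_rate_term(info_nats, n)
        fast = fast_rate_term(info_nats, n)
        print(f"n={n:>9}: slow-rate term={slow:.5f}, fast-rate term={fast:.7f}")
```

Whenever the information measure is smaller than n, the fast-rate term is the smaller of the two, and the gap widens as n grows.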
Chalmers, Electrical Engineering, Communication and Antenna Systems, Communication Systems
Hellström, F., Durisi, G., Generalization Bounds via Information Density and Conditional Information Density
Hellström, F., Durisi, G., Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures
INNER: information theory of deep neural networks
Chalmers AI Research Centre (CHAIR), 2019-01-01 -- 2021-12-31.
Information and Communication Technology
C3SE (Chalmers Centre for Computational Science and Engineering)
Probability Theory and Statistics
Chalmers University of Technology
Opponent: Benjamin Guedj, University College London and Inria, UK