Guaranteeing Generalization via Measures of Information
Licentiate thesis, 2020

During the past decade, machine learning techniques have achieved impressive results in a number of domains. Many of the success stories have relied on deep neural networks, a class of functions with very high complexity. Classical results that mathematically guarantee that a learning algorithm generalizes, i.e., performs as well on unseen data as on training data, typically rely on bounding the complexity and expressiveness of the functions being used. As a consequence, they yield overly pessimistic results when applied to modern machine learning algorithms, and fail to explain why such algorithms generalize.

This discrepancy between theoretical explanations and practical success has spurred a flurry of research activity into new generalization guarantees. For such guarantees to be applicable to relevant cases such as deep neural networks, they must rely on aspects of learning other than the complexity of the function class. One promising avenue is to use methods from information theory. Since information-theoretic quantities capture properties of data distributions and the relations between them, such an approach enables generalization guarantees that depend on the properties of the learning algorithm and the data distribution.

In this thesis, we first introduce a framework to derive information-theoretic guarantees for generalization. Specifically, we derive an exponential inequality that can be used to obtain generalization guarantees not only in the average sense, but also tail bounds for the PAC-Bayesian and single-draw scenarios. This approach leads to novel generalization guarantees and provides a unified method for deriving several known generalization bounds that were originally discovered through the use of a number of different proof techniques. Furthermore, we extend this exponential-inequality approach to the recently introduced random-subset setting, in which the training data is randomly selected from a larger set of available data samples.
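For orientation, a representative average-case guarantee of this flavor is the mutual-information bound of Xu and Raginsky, which frameworks of this kind recover as a special case. The notation below is a standard convention, not taken from the abstract: $W$ denotes the hypothesis returned by the learning algorithm, $S$ the training set of $n$ samples, $L_{P}(W)$ the population loss, and $L_{S}(W)$ the training loss. For a $\sigma$-sub-Gaussian loss,

```latex
\left\lvert \mathbb{E}\!\left[ L_{P}(W) - L_{S}(W) \right] \right\rvert
\;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)} .
```

Here $I(W;S)$ is the mutual information between the hypothesis and the training data: the less the algorithm's output depends on the specific training samples, the tighter the guarantee.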

One limitation of the proposed framework is that it can only be used to derive generalization guarantees with a so-called slow rate with respect to the size of the training set. In light of this, we derive another exponential inequality for the random-subset setting which allows for the derivation of generalization guarantees with fast rates with respect to the size of the training set. We show how to evaluate the generalization guarantees obtained through this inequality, as well as their slow-rate counterparts, for overparameterized neural networks trained on MNIST and Fashion-MNIST. Numerical results illustrate that, for some settings, these bounds predict the true generalization capability fairly well, essentially matching the best available bounds in the literature.
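To illustrate the distinction between slow and fast rates, the sketch below compares the two scalings numerically. This is an illustrative caricature only, not the exact bounds derived in the thesis: the constants `sigma` and `c`, and the fixed information value, are hypothetical placeholders.

```python
import math

def slow_rate_bound(info, n, sigma=1.0):
    # Slow-rate form: decays as 1/sqrt(n) in the training-set size n.
    # `sigma` is a hypothetical sub-Gaussianity parameter of the loss.
    return math.sqrt(2 * sigma**2 * info / n)

def fast_rate_bound(info, n, c=1.0):
    # Fast-rate form: decays as 1/n. `c` is a hypothetical constant.
    return c * info / n

# With a fixed information measure, the fast-rate form shrinks much
# more quickly as the training set grows.
for n in (100, 1000, 10000):
    print(n, slow_rate_bound(5.0, n), fast_rate_bound(5.0, n))
```

For large training sets, the fast-rate form is substantially smaller, which is why fast-rate guarantees are the relevant target for the overparameterized regime studied in the thesis.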

Keywords

PAC-Bayes, statistical learning, information theory, machine learning, neural networks, generalization

Opponent: Benjamin Guedj, University College London and Inria, UK

Author

Fredrik Hellström

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Hellström, F., Durisi, G., Generalization Bounds via Information Density and Conditional Information Density

Hellström, F., Durisi, G., Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures

INNER: information theory of deep neural networks

Chalmers AI Research Centre (CHAIR), 2019-01-01 -- 2021-12-31.

Areas of Advance

Information and Communication Technology

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Bioinformatics (Computational Biology)

Probability Theory and Statistics

Mathematical Analysis

Publisher

Chalmers

Online

Latest update: 11/27/2020