Guaranteeing Generalization via Measures of Information
Licentiate thesis, 2020

During the past decade, machine learning techniques have achieved impressive results in a number of domains. Many of the success stories have made use of deep neural networks, a class of functions that boasts high complexity. Classical results that mathematically guarantee that a learning algorithm generalizes, i.e., performs as well on unseen data as on training data, typically rely on bounding the complexity and expressiveness of the functions that are used. As a consequence, these results are overly pessimistic when applied to modern machine learning algorithms, and they fail to explain why such algorithms generalize.

This discrepancy between theoretical explanations and practical success has spurred a flurry of research activity into new generalization guarantees. For such guarantees to be applicable to relevant cases such as deep neural networks, they must rely on some aspect of learning other than the complexity of the function class. One promising avenue is to use methods from information theory. Since information-theoretic quantities are concerned with properties of different data distributions and the relations between them, such an approach enables generalization guarantees that rely on the properties of learning algorithms and data distributions.

In this thesis, we first introduce a framework to derive information-theoretic guarantees for generalization. Specifically, we derive an exponential inequality that can be used to obtain generalization guarantees not only in the average sense, but also in the form of tail bounds for the PAC-Bayesian and single-draw scenarios. This approach leads to novel generalization guarantees and provides a unified method for deriving several known generalization bounds that were originally discovered through a number of different proof techniques. Furthermore, we extend this exponential-inequality approach to the recently introduced random-subset setting, in which the training data is randomly selected from a larger set of available data samples.
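The random-subset setting described above can be illustrated with a minimal sketch. It assumes one common formulation, in which the larger set consists of 2n samples arranged as n pairs and a uniformly random binary selector picks one sample from each pair for training. The function name `random_subset_split`, the pairing layout, and the Gaussian toy data are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def random_subset_split(supersample, rng):
    """Split n pairs of samples into a training half and a held-out half.

    Assumption: `supersample` has shape (n, 2), one row per pair.
    """
    n = supersample.shape[0]
    # selector[i] in {0, 1} decides which element of pair i is used for training
    selector = rng.integers(0, 2, size=n)
    rows = np.arange(n)
    train = supersample[rows, selector]
    held_out = supersample[rows, 1 - selector]
    return train, held_out, selector


rng = np.random.default_rng(0)
n = 5
supersample = rng.normal(size=(n, 2))  # toy data: n pairs of scalar samples
train, held_out, selector = random_subset_split(supersample, rng)
```

In this formulation, the learning algorithm sees only `train`, while `held_out` plays the role of unseen data drawn from the same larger set.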

One limitation of the proposed framework is that it can only be used to derive generalization guarantees with a so-called slow rate with respect to the size of the training set. In light of this, we derive another exponential inequality for the random-subset setting which allows for the derivation of generalization guarantees with fast rates with respect to the size of the training set. We show how to evaluate the generalization guarantees obtained through this inequality, as well as their slow-rate counterparts, for overparameterized neural networks trained on MNIST and Fashion-MNIST. Numerical results illustrate that, for some settings, these bounds predict the true generalization capability fairly well, essentially matching the best available bounds in the literature.
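The distinction between slow and fast rates can be illustrated schematically. The expressions below are generic forms and not the bounds derived in the thesis. The symbols are assumptions introduced for illustration: $n$ is the training-set size, $L$ the loss on unseen data, $\hat{L}$ the training loss, $I$ a generic information measure, and $c, c_1, c_2$ unspecified positive constants.

```latex
\begin{align}
  \text{slow rate:} \quad & L \le \hat{L} + \sqrt{\frac{c\, I}{n}} \\
  \text{fast rate:} \quad & L \le c_1 \hat{L} + \frac{c_2\, I}{n}
\end{align}
```

When the training loss $\hat{L}$ is close to zero, a bound of the second form decays as $1/n$ rather than $1/\sqrt{n}$, which is what makes it attractive for overparameterized networks that fit their training data.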

PAC-Bayes

statistical learning

information theory

machine learning

neural networks

generalization

Opponent: Benjamin Guedj, University College London and Inria, UK

Author

Fredrik Hellström

Chalmers, Electrical Engineering, Communication, Antennas and Optical Networks

Hellström, F., Durisi, G., Generalization Bounds via Information Density and Conditional Information Density

Hellström, F., Durisi, G., Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures

INNER: information theory of deep neural networks

Chalmers AI Research Centre (CHAIR), 2019-01-01 to 2021-12-31.

Areas of Advance

Information and Communication Technology

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Bioinformatics (Computational Biology)

Probability Theory and Statistics

Mathematical Analysis

Publisher

Chalmers

Online

More information

Latest update

2020-11-27