# Slope and generalization properties of neural networks
Preprint, 2021

From a statistical point of view, fitting a neural network may be seen as a kind of regression, where we seek a function from the input space to a space of classification probabilities that follows the "general" shape of the data but avoids overfitting by not memorizing individual data points. In statistics, this can be done by controlling the geometric complexity of the regression function. We propose to do something similar when fitting neural networks by controlling the slope of the network.
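
The abstract does not give the formal definition of the slope. For illustration only, the sketch below *assumes* that the slope of a network at an input `x` is the Frobenius norm of the Jacobian of its output logits with respect to `x`. This assumption may differ from the paper's definition.

For a ReLU network the assumed slope can be computed exactly. Away from kinks the network is linear in a neighbourhood of `x`, so the Jacobian is the product of the weight matrices with inactive units masked out. Evaluating it at many points gives an empirical slope distribution. The helper names `relu_net_slope` and `random_layers` are hypothetical. The network is untrained, so its numbers say nothing about the paper's results.

```python
import math
import random


def relu_net_slope(weights, biases, x):
    """Frobenius norm of the Jacobian d(logits)/d(x) of a ReLU MLP at x.

    ASSUMPTION (illustration only): "slope" is taken to be this norm.
    ReLU follows every layer except the last, which outputs logits.
    At a kink (pre-activation exactly 0) the ReLU derivative is taken as 0.
    """
    n = len(x)
    h = list(x)
    # Jacobian of the current activations w.r.t. x; starts as the identity.
    jac = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, (W, b) in enumerate(zip(weights, biases)):
        # Affine step z = W h + b; by the chain rule the Jacobian becomes W @ jac.
        z = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        jac = [[sum(row[m] * jac[m][j] for m in range(len(row))) for j in range(n)]
               for row in W]
        if k < len(weights) - 1:
            # ReLU step: units with z <= 0 are inactive, so their Jacobian rows vanish.
            mask = [1.0 if zi > 0 else 0.0 for zi in z]
            z = [mi * zi for mi, zi in zip(mask, z)]
            jac = [[mi * v for v in r] for mi, r in zip(mask, jac)]
        h = z
    return math.sqrt(sum(v * v for r in jac for v in r))


def random_layers(sizes, rng):
    """Hypothetical helper: He-style Gaussian weights and zero biases."""
    weights, biases = [], []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = math.sqrt(2.0 / n_in)
        weights.append([[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)])
        biases.append([0.0] * n_out)
    return weights, biases


# Empirical slope distribution of an (untrained) 2-16-16-3 network over 200 points.
rng = random.Random(0)
weights, biases = random_layers([2, 16, 16, 3], rng)
points = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(200)]
slopes = [relu_net_slope(weights, biases, p) for p in points]
mean_slope = sum(slopes) / len(slopes)
print(f"mean slope {mean_slope:.3f}, min {min(slopes):.3f}, max {max(slopes):.3f}")
```

To compare architectures in the spirit of the abstract, you could repeat the last block with different hidden widths. For a meaningful comparison, each network would first have to be trained.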

After defining the slope and discussing some of its theoretical properties, we show empirically, in examples using ReLU networks, that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network, and that the mean of the distribution depends only weakly on the model architecture in general. The slope is of similar size throughout the relevant volume and varies smoothly. It also behaves as predicted in rescaling examples. We discuss possible applications of the slope concept, such as using it as part of the loss function or as a stopping criterion during network training, or ranking data sets in terms of their complexity.
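
This sketch illustrates two points from the paragraph above. It again *assumes* that the slope is the Frobenius norm of the input-to-logit Jacobian, which is not necessarily the paper's definition.

1. **Rescaling.** Suppose the inputs are multiplied by a constant `c` and the first-layer weights are divided by `c`. By the chain rule the network then computes the same logits, and its assumed slope is divided by `c`. We do not know which rescaling experiments the paper runs; this property only follows from the assumed definition.
2. **Slope-penalized loss.** `slope_penalized_loss` is a hypothetical objective: mean cross-entropy plus `lam` times the mean slope. The code only evaluates this objective. Training with it would require automatic differentiation, because the penalty depends on the weights through the Jacobian.

```python
import math


def slope_and_logits(weights, biases, x):
    """Return (||d logits / d x||_F, logits) for a ReLU MLP at x.

    ASSUMPTION: slope = Frobenius norm of the input-output Jacobian;
    ReLU on hidden layers only, ReLU'(0) taken as 0.
    """
    n = len(x)
    h = list(x)
    jac = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k, (W, b) in enumerate(zip(weights, biases)):
        # Affine step and chain rule: jac <- W @ jac.
        z = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        jac = [[sum(row[m] * jac[m][j] for m in range(len(row))) for j in range(n)]
               for row in W]
        if k < len(weights) - 1:
            # ReLU: inactive units zero their activation and their Jacobian row.
            mask = [1.0 if zi > 0 else 0.0 for zi in z]
            z = [mi * zi for mi, zi in zip(mask, z)]
            jac = [[mi * v for v in r] for mi, r in zip(mask, jac)]
        h = z
    return math.sqrt(sum(v * v for r in jac for v in r)), h


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]


def slope_penalized_loss(weights, biases, data, lam):
    """Hypothetical objective: mean cross-entropy + lam * mean slope."""
    ce_sum, slope_sum = 0.0, 0.0
    for x, y in data:
        s, logits = slope_and_logits(weights, biases, x)
        ce_sum += cross_entropy(logits, y)
        slope_sum += s
    return ce_sum / len(data) + lam * slope_sum / len(data)


# A fixed 2-2-2 ReLU network (arbitrary illustrative weights).
W1, b1 = [[1.0, -2.0], [0.5, 1.5]], [0.1, -0.2]
W2, b2 = [[2.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
x = [0.3, 0.4]

# Rescaling: feed c * x and divide the first-layer weights by c.
# The logits are unchanged, while the Jacobian (hence the slope) shrinks by 1/c.
c = 4.0
s_orig, out_orig = slope_and_logits([W1, W2], [b1, b2], x)
W1_c = [[w / c for w in row] for row in W1]
s_resc, out_resc = slope_and_logits([W1_c, W2], [b1, b2], [c * v for v in x])
print(s_orig, s_resc)  # s_resc is s_orig / c (up to rounding)

data = [([0.3, 0.4], 0), ([-0.5, 0.2], 1)]
print(slope_penalized_loss([W1, W2], [b1, b2], data, lam=0.1))
```

The same mean-slope quantity could also be logged once per epoch to serve as the stopping signal mentioned above. The abstract does not specify the actual rule.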

## Authors

### Anton Johansson

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

### Niklas Engsner

Chalmers, Computer Science and Engineering (Chalmers), Data Science

### Claes Strannegård

Data Science and AI 1

### Petter Mostad

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics

## Subject Categories

Other Computer and Information Science

Communication Systems

Bioinformatics (Computational Biology)