Asymptotic Analysis of Machine Learning Models: Comparison Theorems and Universality
Doctoral thesis, 2025
This thesis investigates the asymptotic regime of machine learning models: a regime in which both the number of trainable parameters (model size) and the number of data points grow to infinity at a fixed ratio. Understanding model behavior in this limit yields theoretical insight into model statistics such as the training error and the generalization error, particularly in the high-dimensional settings relevant to contemporary machine learning practice.
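For reference, this regime is usually formalized as follows (standard notation, assumed here rather than quoted from the thesis): the sample size n and the model size d jointly satisfy

\[
n, d \to \infty, \qquad d/n \to \delta \in (0, \infty),
\]

and quantities such as the training and generalization error then concentrate around deterministic limits that depend on the aspect ratio \delta.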
The core methodological tools used throughout this work are Gaussian comparison theorems, with a special emphasis on the Convex Gaussian Min-max Theorem (CGMT). These theorems enable the rigorous analysis of complex learning algorithms by comparing them to surrogate problems that are simpler to analyze. By constructing such asymptotically equivalent optimization problems, we derive characterizations of the models of interest by proxy.
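For concreteness, the classical form of the CGMT (due to Thrampoulidis, Oymak, and Hassibi; the thesis develops new variants, so the hypotheses below are the standard ones rather than necessarily those of the thesis) compares the primary problem

\[
\Phi(G) = \min_{w \in S_w} \max_{u \in S_u} \; u^\top G w + \psi(w, u)
\]

with the auxiliary problem

\[
\phi(g, h) = \min_{w \in S_w} \max_{u \in S_u} \; \|w\|_2 \, g^\top u + \|u\|_2 \, h^\top w + \psi(w, u),
\]

where G has i.i.d. standard Gaussian entries and g, h are independent standard Gaussian vectors. If S_w and S_u are compact convex sets and \psi is convex-concave, then for every c \in \mathbb{R},

\[
\mathbb{P}\big(\Phi(G) < c\big) \le 2\,\mathbb{P}\big(\phi(g, h) \le c\big)
\quad \text{and} \quad
\mathbb{P}\big(\Phi(G) > c\big) \le 2\,\mathbb{P}\big(\phi(g, h) \ge c\big),
\]

so concentration of the auxiliary value implies concentration of the primary value. The auxiliary problem decouples the random matrix G into two random vectors and can typically be reduced to a low-dimensional deterministic problem, which is what makes the analysis tractable.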
A secondary but significant theme in this thesis is the concept of universality in the asymptotic regime. Universality results demonstrate that many statistical properties of machine learning models are asymptotically governed only by low-order moments (e.g., means and variances) of the data distribution, rather than by its full structure. This insight justifies replacing the data with Gaussian surrogate models that match these moments and are therefore amenable to analysis via Gaussian comparison tools.
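A minimal numerical sketch of this phenomenon (illustrative only, not code or results from the thesis; the model, ridge regression with an i.i.d. design, and all parameter values are assumptions chosen for the example): the estimation error of ridge regression is nearly identical for a Gaussian design and a Rademacher (+/-1) design with the same mean and variance, once n and d are large at a fixed ratio.

import numpy as np

rng = np.random.default_rng(0)
n, d, lam, sigma = 2000, 1000, 0.1, 0.5       # proportional regime: d/n = 0.5
w_star = rng.standard_normal(d) / np.sqrt(d)  # ground-truth parameter vector

def estimation_error(sample):
    """Ridge estimation error for a design matrix with i.i.d. entries."""
    X = sample((n, d))                         # draw the design matrix
    y = X @ w_star + sigma * rng.standard_normal(n)
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(np.sum((w_hat - w_star) ** 2))

gauss = estimation_error(lambda shape: rng.standard_normal(shape))
rade = estimation_error(lambda shape: rng.choice([-1.0, 1.0], size=shape))
print(f"Gaussian design: {gauss:.4f}   Rademacher design: {rade:.4f}")

Both entry distributions have mean 0 and variance 1, so a universality result of the kind studied in the thesis predicts that the two printed errors concentrate around the same deterministic limit as n and d grow at the fixed ratio d/n.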
Keywords
CGMT
universality
Convex Gaussian Min-Max Theorem
asymptotics
Author
David Bosch
Data Science and AI
A Novel Gaussian Min-Max Theorem and its Applications
IEEE Transactions on Information Theory, Vol. In Press (2025)
Journal article
A Novel Convex Gaussian Min Max Theorem for Repeated Features
Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022, Vol. 258 (2025), p. 3673-3681
Paper in proceeding
Random Features Model with General Convex Regularization: A Fine Grained Analysis with Precise Asymptotic Learning Curves
Proceedings of Machine Learning Research, Vol. 206 (2023), p. 11371-11414
Paper in proceeding
Precise Asymptotic Analysis of Deep Random Feature Models
Proceedings of Machine Learning Research, Vol. 195 (2023), p. 4132-4179
Paper in proceeding
Double Descent in Feature Selection: Revisiting LASSO and Basis Pursuit
Thirty-eighth International Conference on Machine Learning, ICML 2021 (2021)
Paper in proceeding
Subject Categories (SSIF 2025)
Probability Theory and Statistics
Artificial Intelligence
ISBN
978-91-8103-287-1
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5745
Publisher
Chalmers