Debloating Machine Learning Systems
Doctoral thesis, 2025

In recent decades, Machine Learning (ML) has rapidly evolved from an academic pursuit into a cornerstone of modern industries, with applications spanning manufacturing, healthcare, finance, and transportation. The emergence of large language models (LLMs) has further accelerated this growth, driving unprecedented demand for ML systems capable of supporting models of widely varying scales and data modalities. Despite the growing importance of ML systems, the problem of software bloat within them remains underexplored. Software bloat refers to unnecessary code, files, or dependencies; it degrades performance, increases resource consumption, and introduces security risks. The rise of containerization as a common deployment method for ML further exacerbates the problem: containers must package not only applications but also all associated libraries and dependencies, leading to significant overhead. This thesis investigates bloat in ML systems across multiple layers, from ML containers to ML shared libraries, and introduces novel techniques to measure, analyze, and mitigate it. The main contributions are: (A) MMLB, a framework for measuring ML bloat that quantifies container-level bloat and identifies its causes, showing that ML containers are substantially more bloated than general-purpose containers and that ML shared libraries are the major source of bloat. (B) BLAFS, a bloat-aware file system that efficiently and effectively reduces file bloat in ML containers by detecting and removing unused files. (C) RTrace, a tracer that accurately identifies the functions executed in shared libraries, improving the visibility of shared-library execution and enabling precise detection of host-code bloat. (D) MERGESHUFFLE, a tool that removes unused code from shared libraries while preserving functionality and improving performance and security. (E) Negative-ML, a holistic debloating approach that targets both host and device code, representing the first systematic investigation of device-code bloat. This work reveals that ML shared libraries differ from generic libraries: while the latter contain only host code, ML shared libraries also include device code, which targets GPUs and contributes significantly to their size. This thesis thus offers both a systematic understanding of software bloat in ML systems and practical techniques to mitigate it, contributing to more efficient, secure, and sustainable deployment of ML in real-world environments.

Software Bloat

Software Engineering

Machine Learning Systems

Software Debloating

Performance Optimization

Chalmers Johanneberg campus, EDIT 6217

Author

Huaifeng Zhang

Chalmers, Computer Science and Engineering, Computer and Network Systems

Machine learning systems are bloated and vulnerable

SIGMETRICS/PERFORMANCE 2024 - Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (2024), pp. 37-38

Paper in proceedings

RTrace: Towards Better Visibility of Shared Library Execution

33rd Network and Distributed System Security (NDSS) Symposium 2026, NDSS 2026 (2025)

Paper in proceedings

BLAFS: A Bloat-Aware Container File System

Proceedings of the 2025 ACM Symposium on Cloud Computing (2025)

Paper in proceedings

The Hidden Bloat in Machine Learning Systems

Proceedings of the 8th Conference on Machine Learning and Systems (MLSys, Best Paper Award) (2025)

Paper in proceedings

MERGESHUFFLE: Debloating Shared Libraries for Improved Performance and Security

How was the famous statue David created? When asked, Michelangelo replied, “It’s simple. I just remove what is not David.” This thesis follows a similar philosophy, but instead of marble, it works with software. 

Rather than adding new features, this thesis focuses on removing what is unnecessary. In software, this excess is called software bloat: unnecessary code and features that waste resources, increase energy consumption, and degrade performance. The process of removing this bloat is called debloating.

This thesis applies debloating to machine learning systems, the software at the heart of modern Artificial Intelligence (AI). By analyzing which parts of these systems are actually used under real workloads, it introduces methods to identify and remove unused components, ranging from large software modules down to individual code instructions. The result is leaner, faster, and more energy-efficient systems.

Subject categories (SSIF 2025)

Software Engineering

Areas of Advance

Information and Communication Technology

Driving Forces

Sustainable development

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

DOI

10.63959/chalmers.dt/5769

ISBN

978-91-8103-312-0

Doctoral theses at Chalmers University of Technology. New series: 5769

Publisher

Chalmers


Online

More information

Last updated

2025-10-20