Bifröst: Secure Networks for Distributed AI Training (II)
Research Project, 2026–
Distributed AI has emerged as the standard approach for training sophisticated models that require immense computational effort and extended training times. AI training infrastructure has evolved from standalone high-performance devices into large-scale clusters comprising thousands of interconnected units. These systems communicate through complex networks within and across data centers, and even over the Internet. This paradigm shift exposes distributed AI training to a new class of network threats that remain largely unexplored in the literature. This project envisions Bifröst: secure networks built for distributed AI training. Our objectives are to (1) identify novel network attacks against distributed AI training systems; (2) develop a measurement framework to assess the impact of such attacks on training performance, adversary costs, and model accuracy; and (3) design a generalized, cross-layer defense framework that integrates network-level mitigation and application-level optimization, offering comprehensive protection for distributed AI training while preserving competitive performance.
The original proposal outlined two strategies for enabling large-scale distributed AI training: scaling dedicated devices within interconnected data centers, and crowdsourcing computational power from commodity devices. From a networking perspective, these strategies span multiple network layers. At the smallest scale, each device contains an intra-host network linking processors, memory, and peripherals. Within a data center, thousands of such devices form the intra-data center network, while multiple data centers are connected via inter-data center networks (e.g., WANs). At the outermost layer, peer-to-peer networks link distributed commodity devices under the crowdsourcing paradigm.
In this extended description, we propose two complementary PhD projects, each focusing on a different part of this hierarchy: (I) one on intra-host and intra-data center networks, and (II) the other on inter-data center and peer-to-peer networks. Although each project addresses distinct challenges and is significant in its own right, both share the same objectives: identifying novel attacks, evaluating their impact, and developing cross-layer defenses. Inspired by Bifröst, these two efforts strengthen different segments of the “bridge”, together advancing secure and resilient networking for distributed AI training.
Participants
Muoi Tran (contact)
Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems
Romaric Duvignau
Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems
Shubham Saha
Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems
Funding
Wallenberg AI, Autonomous Systems and Software Program
Funding Chalmers participation during 2026–
Related Areas of Advance and Infrastructure
Information and Communication Technology
Areas of Advance
Chalmers e-Commons (incl. C3SE, 2020-)
Infrastructure