Performance Analysis and Enhancements of Memory Systems for Multi-Chiplet NUMA Architectures
Licentiate thesis, 2025
While prior studies have thoroughly examined the yield and cost benefits of multi-chiplet chips, their performance relative to monolithic counterparts remains unexplored. This thesis presents a comprehensive performance analysis of multi-chiplet systems, comparing them to traditional monolithic designs and evaluating their cost-performance trade-offs. Although multi-chiplet systems can cut recurring engineering costs nearly in half, our analysis reveals that they may suffer performance losses of up to one-third compared to monolithic systems due to non-uniform memory access (NUMA) overheads.
To address these performance overheads, this thesis introduces MEMPLEX, a novel memory system explicitly designed for multi-chiplet NUMA architectures. MEMPLEX combines data replication and migration strategies to optimize data placement and improve data locality within the multi-chiplet memory hierarchy. By allocating a portion of each memory node as a DRAM cache and enabling migration based on access patterns and memory traffic, MEMPLEX reduces the frequency of costly remote memory accesses, mitigates performance overheads, and delivers substantial energy savings. The evaluation on multi-programmed workloads from different benchmark suites shows that, compared to a multi-chiplet system with NUMA-aware data placement and no support for DRAM caching or migration, MEMPLEX reduces remote memory traffic by 80%, leading to a 44% reduction in dynamic memory energy consumption. MEMPLEX also delivers up to 7% speedup (5% on average) when 1/16 of each HBM is dedicated to caching in a 4-chiplet system, with performance gains increasing to up to 15% (10% on average) in 16-chiplet systems. Overall, this thesis provides insights into the design and optimization of multi-chiplet architectures, paving the way for scalable and efficient systems in the post-Moore's Law era.
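The general idea of combining replication (a per-node DRAM cache) with access-driven migration can be illustrated with a toy model. The sketch below is not MEMPLEX's actual design or policy. The class name `ToyNumaMemory`, the LRU cache, the interleaved initial placement, the read-only access model, and the counter-based `migrate_threshold` rule are all assumptions made for illustration only.

```python
from collections import OrderedDict, defaultdict


class ToyNumaMemory:
    """Toy read-only model of remote accesses with per-node caching and migration.

    Hypothetical illustration only; it does not reproduce MEMPLEX's policies.
    """

    def __init__(self, num_nodes, cache_pages, migrate_threshold):
        self.num_nodes = num_nodes
        self.cache_pages = cache_pages              # per-node DRAM cache capacity, in pages
        self.migrate_threshold = migrate_threshold  # assumed access-count lead that triggers migration
        self.home = {}                              # page -> home node
        self.caches = [OrderedDict() for _ in range(num_nodes)]   # one LRU cache per node
        self.counts = defaultdict(lambda: [0] * num_nodes)        # page -> per-node access counts
        self.remote_accesses = 0

    def access(self, node, page):
        # Assumed initial placement: pages are interleaved across nodes.
        home = self.home.setdefault(page, page % self.num_nodes)
        self.counts[page][node] += 1
        if home == node:
            return "local"

        counts = self.counts[page]
        if counts[node] - counts[home] >= self.migrate_threshold:
            # Migration: the requesting node becomes the page's new home.
            self.remote_accesses += 1               # moving the page crosses the interconnect once
            self.home[page] = node
            self.caches[node].pop(page, None)       # the cached replica is no longer needed
            self.counts[page] = [0] * self.num_nodes
            return "migrated"

        cache = self.caches[node]
        if page in cache:
            cache.move_to_end(page)                 # refresh LRU position
            return "cache_hit"

        # Miss: fetch from the remote home node and keep a replica locally.
        self.remote_accesses += 1
        if self.cache_pages > 0:
            cache[page] = True
            if len(cache) > self.cache_pages:
                cache.popitem(last=False)           # evict the least recently used replica
        return "remote"


if __name__ == "__main__":
    mem = ToyNumaMemory(num_nodes=2, cache_pages=1, migrate_threshold=3)
    print([mem.access(node=1, page=0) for _ in range(4)], mem.remote_accesses)
```

In this toy trace, node 1 repeatedly reads a page that initially lives on node 0: the first access is remote, the second hits the local replica, the third triggers migration, and the fourth is local. A configuration with no cache and an unreachable migration threshold would instead serve every one of those accesses remotely.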
Chiplets
Caching
Migration
Non-Uniform Memory Access
Author
Neethu Bal Mallya
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
Neethu Bal Mallya, Panagiotis Strikos, Bhavishya Goel, Ahsen Ejaz, and Ioannis Sourdis, “A Performance Analysis of Chiplet-Based Systems”, DATE 2025
Neethu Bal Mallya, Bhavishya Goel, and Ioannis Sourdis, “MEMPLEX: A Multi-Chiplet NUMA Architecture with Data Replication and Migration”
Principer för beräknande minnesenheter (PRIDE)
Swedish Foundation for Strategic Research (SSF) (Dnr CHI19-0048), 2021-01-01 -- 2025-12-31.
Subject Categories (SSIF 2025)
Computer Systems
Publisher
Chalmers
EDIT-EA Lecture Hall, Rännvägen 6B, Chalmers
Opponent: Cristina Silvano, Politecnico di Milano, Italy