High Performance Hybrid Memory Systems with 3D-stacked DRAM
Licentiate thesis, 2019

The bandwidth of traditional DRAM is pin-limited and therefore does not scale
with the growing demands of data-intensive workloads, which limits performance.
3D-stacked DRAM can alleviate this problem by providing substantially higher
bandwidth to a processor chip. However, the capacity of 3D-stacked DRAM is
not enough to replace the bulk of the memory, so it is used either
as a DRAM cache or as part of a flat address space with support for data
migration. The performance of both designs is limited by
their particular overheads. In this thesis we propose designs that improve
the performance of hybrid memory systems in which 3D-stacked DRAM is
used either as a cache or as part of a flat address space with data migration.

DRAM caches have shown excellent potential in capturing the spatial and
temporal data locality of applications; however, they are still far from their ideal
performance. Besides the unavoidable DRAM access to fetch the requested
data, tag access is on the critical path, adding significant latency and energy
costs. Existing approaches are not able to remove these overheads and in
some cases limit DRAM cache design options. To alleviate the tag access
overheads of DRAM caches, this thesis proposes Decoupled Fused Cache (DFC),
a DRAM cache design that fuses DRAM cache tags with the tags of the on-chip
Last Level Cache (LLC) to access the DRAM cache data directly on LLC
misses. Compared to current state-of-the-art DRAM caches, DFC improves
system performance by 6% on average and by 16-18% for large cacheline sizes.
DFC also reduces DRAM cache traffic by 18% and DRAM cache energy
consumption by 7%.

Data migration schemes have significant performance
potential, but also entail overheads that may diminish migration benefits
or even lead to performance degradation. These overheads are mainly due to
the high cost of swapping data between memories, which also makes selecting
which data to migrate critical to performance. To address these challenges,
this thesis proposes LLC guided Data Migration (LGM).
LGM uses the LLC to predict future reuse and select memory segments for
migration. Furthermore, LGM reduces data migration traffic by
not migrating the cache lines of memory segments that are present in the
LLC. LGM outperforms current state-of-the-art migration designs, improving
system performance by 12.1% and reducing memory system dynamic energy
by 13.2%.
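
The two LGM ideas described above (using LLC contents to pick segments, and skipping LLC-resident lines when moving a segment) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how reuse is predicted, so the threshold on LLC-resident lines is a placeholder heuristic, and the segment size, `Segment`, `select_for_migration`, and `lines_to_migrate` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Assumed for illustration only: 32 cache lines per memory segment.
LINES_PER_SEGMENT = 32


@dataclass(frozen=True)
class Segment:
    segment_id: int
    # Offsets (0..LINES_PER_SEGMENT-1) of this segment's lines currently in the LLC.
    llc_resident: frozenset


def select_for_migration(segments, min_resident_lines):
    """Placeholder reuse predictor: choose segments with many lines in the LLC."""
    return [s for s in segments if len(s.llc_resident) >= min_resident_lines]


def lines_to_migrate(segment):
    """Line offsets that still need to be copied; LLC-resident lines are skipped."""
    return [i for i in range(LINES_PER_SEGMENT) if i not in segment.llc_resident]


if __name__ == "__main__":
    segments = [
        Segment(0, frozenset({0, 1, 2, 3})),  # four lines already in the LLC
        Segment(1, frozenset({5})),           # one line in the LLC
    ]
    for seg in select_for_migration(segments, min_resident_lines=4):
        moved = lines_to_migrate(seg)
        print(seg.segment_id, len(moved))  # prints "0 28"
```

In this sketch, only segment 0 passes the assumed threshold, and 28 of its 32 lines would be transferred because the other four are already held in the LLC.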

Hybrid memory systems

DRAM caches

Data migration

3D-stacked DRAM

Room EA, EDIT building, Rännvägen 6, Chalmers University of Technology, Campus Johanneberg
Opponent: Prof. Yale Patt, University of Texas at Austin, U.S.A.

Author

Evangelos Vasilakis

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Decoupled fused cache: Fusing a decoupled LLC with a DRAM cache

Transactions on Architecture and Code Optimization, Vol. 15 (2019)

Journal article

Subject Categories (SSIF 2011)

Computer Engineering

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Publisher

Chalmers

More information

Latest update

5/10/2019