Scalable analysis of multicore data reuse and sharing
Paper i proceeding, 2014

The performance and energy efficiency of multicore systems are increasingly dominated by the costs of communication. As hardware parallelism grows, developers require more powerful tools to assess the data sharing and reuse properties of their algorithms. The reuse distance is an effective metric to study the temporal locality of programs and model private and shared caches. But the application of this method is challenging. First, generating memory traces is very expensive in storage and very intrusive on execution, possibly distorting the parallel schedule. And second, the algorithm is computationally very expensive, limiting the length, memory size and parallelism of analyzable programs. This paper introduces a novel coarse-grained reuse distance method, called Kernel Reuse Distance (KRD), which addresses these challenges. KRD enables a quick assessment of data locality by studying the reuse characteristics of the kernels' inputs and outputs. We analyze the performance of the initial prototype implementation and show two use cases comparing different parallel implementations. On a 24-core system, analyzing a trace from a matrix multiplication representing 24 threads, 1.37 terabytes of streamed data and 800 million distinct accesses, the parallel KRD implementation is able to compute the coherence-aware kernel reuse distance histogram for one socket (six cores) in 11.1 seconds. © 2014 ACM.

multithreaded runtime systems


reuse distance

data reuse and sharing


Miquel Pericas

Chalmers, Data- och informationsteknik, Datorteknik

K. Taura

University of Tokyo

S. Matsuoka

Tokyo Institute of Technology

Proceedings of the International Conference on Supercomputing



Data- och informationsvetenskap