Towards Accurate and Resource-Efficient Cache Coherence Prediction
Doctoral thesis, 2003

The increasing speed gap between processor microarchitectures and memory technologies can potentially slow down the historical performance growth of computer systems. Parallel applications on shared memory multiprocessors that experience cache misses due to communication are especially susceptible to this speed difference. Prediction has been proposed as a general technique in computer architecture to anticipate and speculate about events in the processor and in the memory system. Specific prediction techniques have been used successfully in different parts of the processor, e.g. branch outcome prediction, branch target prediction, and prefetching. Coherence message prediction has recently been proposed as a means to reduce the impact of long-latency memory operations in multiprocessors. However, this prediction can be very resource-intensive and ineffective. This thesis addresses the resource inefficiency as well as the inaccuracy of proposed coherence predictors. It proposes novel prediction mechanisms with improved accuracy and/or lower resource requirements. One important finding of the thesis is that specialized coherence prediction techniques for reducing write-related overhead are only moderately effective for a transaction processing workload. I show that improvements are possible through the notion of load-store sequences - a new data sharing model that I contribute in the thesis. In the area of generalized coherence message prediction, I contribute a novel caching scheme for history tables that reduces the static overhead by an order of magnitude. In-depth analyses of the dynamic memory requirements of generalized coherence predictors, as well as of their learning time effects, are also presented, with suggestions on how to reduce the dynamic overhead while increasing the accuracy of coherence predictors.

cache coherence protocols

performance evaluation

computer architecture

memory systems

shared memory multiprocessors


Jim Nilsson

Chalmers, Department of Computer Engineering


Computer and Information Science