Techniques to Cancel Execution Early to Improve Processor Efficiency
Successive microprocessor generations have adopted a variety of approaches to improve execution efficiency. Although contemporary processors offer an unprecedented level of computing capability, they still suffer from several inefficiencies. At the same time, the traditional focus on performance alone has been superseded by more critical, multi-dimensional constraints that include power consumption, scalability, security, and reliability, among others. This dissertation addresses these prevailing inefficiencies and contributes a number of techniques that enable processors to offer better performance and higher energy efficiency.
The first contribution is a novel scheme that detects trivial operations, such as multiplication by ‘0’ or ‘1’, and eliminates their execution early to improve energy efficiency. The second contribution identifies the tradeoffs and relative efficiencies of two techniques that target program inefficiency in the forms of trivial computation and instruction reuse. The most important finding is that these techniques detect sets of instructions that are almost disjoint and thus may provide cumulative benefits if combined. The next set of contributions increases the execution efficiency of memory instructions by cancelling memory accesses early. The third contribution is a novel scheme that leverages frequent value locality and establishes that a significant fraction of memory instructions read the value ‘0’ from memory. The fourth contribution is another microarchitectural technique, which takes advantage of small value locality. The scheme exploits the observation that a substantial fraction of memory instructions manipulate small values, that is, values that can typically be represented using only a few bits. The proposed schemes store small values compactly to reduce architectural inefficiency and eliminate unnecessary memory accesses to reduce program inefficiency. The penultimate scheme utilizes the observation that a notable fraction of memory requests can be satisfied by the contents of the register file, which makes the associated memory accesses unnecessary. Finally, this dissertation presents a novel unified scheme that employs a single structure to simultaneously target multiple forms of program inefficiency in memory instructions.
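As an illustration of the first contribution, the idea of detecting a trivial operation before it is executed can be sketched in software. This is only a minimal sketch under stated assumptions, not the hardware scheme proposed in the dissertation: the function names `trivial_result` and `execute` are hypothetical, and the addition-with-zero case is an assumed extra example beyond the multiplication by ‘0’ or ‘1’ named above.

```python
from typing import Optional, Tuple


def trivial_result(op: str, a: int, b: int) -> Optional[int]:
    """Return the known result if the operation is trivial, else None.

    Hypothetical helper: the cases checked here are illustrative only.
    """
    if op == "mul":
        if a == 0 or b == 0:
            return 0      # x * 0 == 0, no multiplier needed
        if a == 1:
            return b      # 1 * x == x
        if b == 1:
            return a      # x * 1 == x
    elif op == "add":     # assumption: not named in the abstract
        if a == 0:
            return b      # 0 + x == x
        if b == 0:
            return a      # x + 0 == x
    return None


def execute(op: str, a: int, b: int) -> Tuple[int, bool]:
    """Return (result, cancelled), where cancelled is True when the
    full computation was skipped because the operation was trivial."""
    shortcut = trivial_result(op, a, b)
    if shortcut is not None:
        return shortcut, True
    if op == "mul":
        return a * b, False
    if op == "add":
        return a + b, False
    raise ValueError(f"unsupported operation: {op}")
```

In this sketch the operand check stands in for the early detection step, and the `cancelled` flag marks the operations for which the functional unit would not need to be exercised.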
Experimental results show that the proposed schemes improve both the performance and the energy efficiency of processors. The proposed techniques are, in general, non-speculative. Moreover, the additional resource requirements and associated overhead of each scheme are modest. Consequently, the schemes proposed in this dissertation contribute to resource-efficient and complexity-effective processor design.
small value locality
register file cache
frequent value locality
Lecture room EC, ED&IT building, Hörsalsvägen 11, Chalmers University of Technology, Sweden
Opponent: Professor David J. Lilja, Fellow of the IEEE, Department of Electrical and Computer Engineering, The University of Minnesota, USA