Reducing the performance overhead of resilient CMPs with substitutable resources
Paper in proceedings, 2015

Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs): entire processors, pipeline stages, or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase system fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down the operating frequency. The former keeps the frequency close to that of the baseline processor but increases the number of cycles required to execute a program. The latter keeps the number of execution cycles constant but requires a slower clock. We investigate this performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We obtain post-place-and-route results for different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects to wire the SUs of a 4-core CMP introduces significant delay, increasing the critical path of the design by almost 3.5 times. On the other hand, pipelining the reconfigurable interconnects increases cycle time by 41% and, depending on the processor configuration, reduces the performance overhead to 1.4-2.9× the execution time of the baseline.
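Both options trade one factor of execution time against the other. Execution time is the number of cycles multiplied by the clock period, so the two relative slowdowns multiply. The Python sketch below illustrates this under two assumptions that the abstract does not state: that the frequency-scaled design's clock period grows in proportion to its roughly 3.5× longer critical path, and a purely hypothetical 1.5× cycle-count increase for the pipelined design. The function names are illustrative only.

```python
def relative_exec_time(cycle_ratio: float, period_ratio: float) -> float:
    """Execution time relative to the baseline.

    T = N_cycles * T_clk, so the cycle-count ratio and the clock-period
    ratio simply multiply.
    """
    return cycle_ratio * period_ratio


def break_even_cycle_ratio(freq_period_ratio: float, pipe_period_ratio: float) -> float:
    """Largest cycle-count increase for which pipelining still beats
    frequency scaling (frequency scaling keeps the cycle count at 1.0x)."""
    return freq_period_ratio / pipe_period_ratio


# Option (ii): scale down frequency. Cycle count is unchanged; ASSUMPTION:
# the clock period stretches with the ~3.5x longer critical path.
freq_scaled = relative_exec_time(cycle_ratio=1.0, period_ratio=3.5)

# Option (i): pipeline the interconnects. Cycle time grows by 41% (from the
# abstract); the cycle-count factor below is HYPOTHETICAL, for illustration.
hypothetical_cycle_ratio = 1.5
pipelined = relative_exec_time(cycle_ratio=hypothetical_cycle_ratio, period_ratio=1.41)

limit = break_even_cycle_ratio(freq_period_ratio=3.5, pipe_period_ratio=1.41)

print(f"frequency scaling: {freq_scaled:.2f}x baseline")
print(f"pipelining (assumed {hypothetical_cycle_ratio}x cycles): {pipelined:.3f}x baseline")
print(f"pipelining wins while the cycle ratio stays below {limit:.2f}")
```

Under these assumptions, pipelining remains the faster option as long as it adds fewer than about 2.48 times the baseline cycle count (3.5 / 1.41). This is consistent with the reported 1.4-2.9× overheads all falling below the roughly 3.5× slowdown that frequency scaling alone would imply.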

Authors

Alirad Malek

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Stavros Tzilis

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Danish Anis Khan

Chalmers, Computer Science and Engineering (Chalmers)

Ioannis Sourdis

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

G. Smaragdos

Erasmus University Medical Center

C. Strydis

Erasmus University Medical Center

Proceedings of the 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFTS 2015

pp. 191-196

Subject Categories

Computer Engineering

Computer and Information Science

Areas of Advance

Information and Communication Technology

DOI

10.1109/DFT.2015.7315161

ISBN

978-1-5090-0312-9

Created

10/8/2017