Reconfigurable NoC and Processors Tolerant to Permanent Faults

Alirad Malek

Licentiate thesis, 2015

Advances in the semiconductor industry have reduced transistor dimensions and increased device density, but they have inevitably compromised the reliability of modern computing systems. In this thesis, we address the reliability problem by exploiting hardware reconfiguration to tolerate permanent faults. Processing components in a system-on-chip are divided into smaller Substitutable Units (SUs), and reconfigurable interconnects are used to isolate defective SUs and connect spare units to create a fault-free component. A further reconfiguration option considered is to instantiate a functionally equivalent unit in fine-grain logic. Based on these approaches, the first part of this thesis presents a probabilistic analysis of reconfigurable designs that calculates the average number of constructable components at different fault densities. Considering the area overheads of reconfigurability, we evaluate the resilience of various reconfigurable designs with different granularities (SU sizes). In short, the results reveal that the combination of fine- and coarse-grain reconfiguration offers up to 3× more fault tolerance than component redundancy. A design-space exploration to find the most efficient granularity mix shows that different fault densities require different granularities of substitutable units to maximize fault tolerance. Moreover, we explore the performance effects of pipelining the reconfigurable interconnects in adaptive processors and observe that the operating frequency and execution time of the pipelined design are roughly 2.5× and 2× better, respectively, than those of the design with non-pipelined interconnects. In the second part of this thesis, we describe RQNoC, a service-oriented Network-on-Chip (NoC) resilient to permanent faults. We characterize the network resources based on the particular service they support and, when they are faulty, bypass them, allowing the respective traffic class to be redirected.
We propose service merging (SMerge) and service detouring (SDetour) as the two service redirection schemes. Different RQNoC configurations are implemented and evaluated in terms of performance, area, power consumption and fault tolerance. In summary, the evaluation results show that, compared to the baseline network, SMerge requires 51% more area and 27% more power and has a 9% slower clock, but maintains at least 90% of the network connectivity even in the presence of 32 permanent network faults.
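
To illustrate the kind of calculation the abstract's first part refers to, the following is a minimal sketch of how the average number of constructable components could be estimated. It is not the thesis's model. It assumes a Poisson fault model, n identical components each split into equally sized SUs, free substitution between SUs of the same type, and it ignores the area overhead and possible faults of the reconfigurable interconnects, which the thesis does take into account. All function and parameter names are hypothetical.

```python
from math import comb, exp


def su_survival(fault_density, component_area, num_sus):
    """Probability that one SU is fault-free (assumed Poisson fault model)."""
    return exp(-fault_density * component_area / num_sus)


def expected_components_redundancy(n, fault_density, component_area):
    """Expected fault-free components when whole components are the spare unit."""
    return n * exp(-fault_density * component_area)


def expected_components_substitutable(n, num_sus, fault_density, component_area):
    """Expected constructable components when any working SU of a given type
    can be wired into any component (interconnect overhead ignored)."""
    q = su_survival(fault_density, component_area, num_sus)

    def at_least(m):
        # P(at least m of the n SUs of one type are fault-free)
        return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(m, n + 1))

    # The number of constructable components is the minimum working count over
    # all SU types; E[min] = sum over m of P(min >= m), and types are independent.
    return sum(at_least(m) ** num_sus for m in range(1, n + 1))


if __name__ == "__main__":
    for density in (0.1, 0.5, 1.0, 2.0):
        coarse = expected_components_redundancy(4, density, 1.0)
        fine = expected_components_substitutable(4, 8, density, 1.0)
        print(f"density={density}: redundancy={coarse:.3f}, substitutable={fine:.3f}")
```

Under these simplifying assumptions, finer granularity can only help, because overheads are not charged; once the area of the reconfigurable interconnects is included, the best granularity depends on the fault density, as the abstract notes.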

Networks-on-Chip

Fault Tolerance

Permanent Faults

Reconfigurable Hardware

Adaptive processors

Room EB, EDIT Building, Johanneberg campus, Chalmers university

Opponent: Prof. Dr. Axel Jantsch

Author

Alirad Malek

Chalmers, Computer Science and Engineering (Chalmers)

Subject Categories (SSIF 2011)

Computer Systems

More information

Created

10/8/2017
