Reconfigurable NoC and Processors Tolerant to Permanent Faults
Licentiate thesis, 2015
Advances in the semiconductor industry have led to reduced transistor dimensions and
increased device density, but inevitably they have compromised the reliability of modern
computing systems. In this thesis, we address the reliability problem by exploiting
hardware reconfiguration for tolerating permanent faults. Processing components in a
system-on-chip are divided into smaller Substitutable Units (SUs) and reconfigurable
interconnects are used to isolate defective SUs and connect spare units to create a
fault-free component. Furthermore, we consider employing fine-grain logic to instantiate a
functionally equivalent unit as another reconfiguration option. Based on these
approaches, the first part of this thesis presents a probabilistic analysis of reconfigurable
designs for calculating the average number of constructable components at different
fault densities. Considering the area overheads of reconfigurability, we evaluate the
resilience of various reconfigurable designs with different granularities (SU sizes). In short,
the results reveal that the combination of fine- and coarse-grain reconfiguration
offers up to 3× more fault tolerance than component redundancy. Performing
a design-space exploration to find the most efficient granularity mix shows that
different fault densities require different granularities of substitutable units to maximize
fault-tolerance. Moreover, we explored the performance effects of pipelining the
reconfigurable interconnects in adaptive processors and observed that the operating
frequency and execution time of the pipelined design are roughly 2.5× and 2× better than those of the
design with non-pipelined interconnects, respectively. In the second part of this thesis,
we describe RQNoC, a service-oriented Network-on-Chip (NoC) resilient to permanent
faults. We characterize the network resources based on the particular service they support
and, when faulty, bypass them, allowing the respective traffic class to be redirected.
We propose service merging (SMerge) and service detouring (SDetour) as the two service
redirection schemes. Different RQNoC configurations are implemented and evaluated
in terms of performance, area, power consumption and fault tolerance. In summary, the
evaluation results show that compared to the baseline network, SMerge requires 51%
more area and 27% more power and has a 9% slower clock but maintains at least 90% of
the network connectivity even in the presence of 32 permanent network faults.
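
The idea behind the probabilistic analysis in the first part can be illustrated with a minimal sketch. It is not the model used in the thesis: it assumes that every Substitutable Unit fails independently with the same probability, that any fault-free SU of a given type can replace any other SU of that type, and that reconfiguration adds no area overhead. The function names and parameters (`n_components`, `n_units`, `p_fault`) are hypothetical.

```python
from math import comb


def expected_redundancy(n_components, n_units, p_fault):
    """Component redundancy: a component is usable only if all of its
    own n_units SUs are fault-free (assumed independent faults)."""
    return n_components * (1.0 - p_fault) ** n_units


def expected_substitutable(n_components, n_units, p_fault):
    """SU-level substitution (idealized assumption): the number of
    constructable components is the minimum, over the n_units SU types,
    of the number of fault-free SUs of that type.

    Uses E[min] = sum over m >= 1 of P(min >= m), where each type has a
    Binomial(n_components, 1 - p_fault) count of fault-free SUs.
    """
    q = 1.0 - p_fault

    def tail(m):
        # P(at least m of the n_components SUs of one type are fault-free)
        return sum(
            comb(n_components, i) * q**i * p_fault ** (n_components - i)
            for i in range(m, n_components + 1)
        )

    return sum(tail(m) ** n_units for m in range(1, n_components + 1))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.20):
        r = expected_redundancy(4, 3, p)
        s = expected_substitutable(4, 3, p)
        print(f"p_fault={p:.2f}  redundancy={r:.3f}  substitutable={s:.3f}")
```

Under these assumptions, substitution at SU granularity never yields fewer constructable components than component redundancy. Once the area overhead of the reconfigurable interconnects is taken into account, as the thesis does, the comparison depends on the granularity and on the fault density.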
Networks-on-Chip
Fault Tolerance
Permanent Faults
Reconfigurable Hardware
Adaptive Processors