Software modification aided transient error tolerance for embedded systems
Paper in proceedings, 2013
Commercial off-the-shelf (COTS) components are increasingly being employed in embedded systems due to their high performance at low cost. With emerging reliability requirements, design of these components using traditional hardware redundancy incur large overheads, time-demanding re-design and validation. To reduce the design time with shorter time-to-market requirements, software-only reliable design techniques can provide with an effective and low-cost alternative. This paper presents a novel, architecture-independent software modification tool, SMART (Software Modification Aided transient eRror Tolerance) for effective error detection and tolerance. To detect transient errors in processor data path, control flow and memory at reasonable system overheads, the tool incorporates selective and non-intrusive data duplication and dynamic signature comparison. Also, to mitigate the impact of the detected errors, it facilitates further software modification implementing software-based check-pointing. Due to automatic software based source-to-source modification tailored to a given reliability requirement, the tool requires no re-design effort, hardware- or compiler-level intervention. We evaluate the effectiveness of the tool using a Xentium processor based system as a case study of COTS based systems. Using various benchmark applications with single-event upset (SEUs) based error model, we show that up to 91% of the errors can be detected or masked with reasonable performance, energy and memory footprint overheads. © 2013 IEEE.