A Process Health Status Service for Safety Related Systems Using TT/ET Communication Scheduling
Paper in proceedings, 2008

This paper describes a health status protocol for distributed real-time systems that use TTCAN, FlexRay, or other networks that support both time-triggered and event-triggered communication. The protocol allows a group of co-operating processes to establish a consistent view of each other’s health status over time. It extends the instantaneous view of each process’s operational status that a process group membership protocol provides. The health status and membership protocols are intended for systems where processes (not nodes) are considered the smallest unit of failure, and where process failures can be detected and recovered locally by the host node. Such systems require a decision function that determines whether a process failure is temporary (the process is being recovered by the host node) or permanent (local recovery is not possible or was unsuccessful). Our protocol ensures that such decisions are made consistently among correct nodes despite symmetrical and asymmetrical omission failures.
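
As an illustration only, the kind of decision function the abstract mentions can be sketched as follows. This is not the protocol from the paper: the names `Health`, `classify`, and `max_recovery_rounds`, and the rule of counting consecutive missed rounds, are assumptions made for this sketch. The one property it relies on is that a deterministic function applied to the same agreed membership history gives the same result on every correct node.

```python
from enum import Enum


class Health(Enum):
    OPERATIONAL = "operational"
    RECOVERING = "recovering"  # temporary failure, local recovery assumed in progress
    FAILED = "failed"          # permanent failure


def classify(history, max_recovery_rounds):
    """Classify one process from its agreed membership history.

    history: list of bools, oldest round first; True means the process was
             in the agreed membership view for that round (hypothetical input).
    max_recovery_rounds: hypothetical bound on how many consecutive rounds
             a process may be absent while still counted as recovering.
    """
    # Count consecutive absences, starting from the most recent round.
    missed = 0
    for present in reversed(history):
        if present:
            break
        missed += 1

    if missed == 0:
        return Health.OPERATIONAL
    if missed <= max_recovery_rounds:
        return Health.RECOVERING
    return Health.FAILED
```

For example, with `max_recovery_rounds = 2`, a process absent for the last two rounds is classified as recovering, and one absent for the last three rounds is classified as permanently failed.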

fault tolerance

redundancy management

diagnosis

membership protocols

Authors

Carl Bergenhem

Johan Karlsson

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Proc. IEEE 14th Pacific Rim International Symposium on Dependable Computing (PRDC 2008)

122-131

Subject Categories

Computer Engineering

Software Engineering

ISBN

978-0-7695-3448-0

More information

Created

10/6/2017