Key points are not available for this paper at this time.
An analysis of the failure statistics of a commercially available fault-tolerant system shows that administration and software are the major contributors to failure. Various approaches to software fault tolerance are then discussed notably process-pairs, transactions and reliable storage. It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process-pairs provides fault-tolerant execution – the key to software fault-tolerance.
Jim Gray (Mon,) studied this question.