What makes a distributed system reliable? A study of failures in the US public switched telephone network (PSTN) shows that human intervention is one key to this large system's reliability. Software is not the weak link in the PSTN's dependability. Built-in self-test and recovery mechanisms, used extensively in the major system components (switches), contributed to software dependability and are significant design features of the PSTN. The network's high dependability indicates that the trade-off between the dependability gained and the complexity introduced by built-in self-test and recovery mechanisms can be positive. Likewise, the trade-off between complex interactions and loose coupling of system components has been positive: it permits quick human intervention in most system failures and results in an extremely reliable system.
D. Richard Kuhn studied this question.