In this paper we implement several basic operating system primitives by using a "replace-add" operation, which can supersede the standard "test and set" and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential processors. We also present a hardware implementation of replace-add that permits multiple replace-adds to be processed nearly as efficiently as loads and stores. Moreover, the crucial special case of concurrent replace-adds updating the same variable is handled particularly well: If every processing element simultaneously addresses a replace-add at the same variable, all these requests are satisfied in the time required to process just one request.
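The behavior described above can be illustrated with a minimal software sketch. It assumes that replace-add atomically replaces a variable V by V + e and returns the new value; the abstract does not spell out the return convention, so treat that as an assumption. The names `SharedCell`, `combined_replace_add`, and `claim_slots` are hypothetical, and a lock stands in for the hardware's atomicity. The function `combined_replace_add` shows one way two requests aimed at the same variable could be satisfied by a single memory access, which is the idea behind the "time required to process just one request" claim; it is not the paper's hardware design.

```python
import threading


class SharedCell:
    """A shared integer with an atomic replace-add (software stand-in)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def replace_add(self, increment):
        # Assumed semantics: replace V by V + increment, return the NEW value.
        with self._lock:
            self._value += increment
            return self._value

    def read(self):
        with self._lock:
            return self._value


def combined_replace_add(cell, e1, e2):
    """Serve two requests (increments e1, then e2) with one memory access.

    Hypothetical combining step: forward a single replace-add of e1 + e2,
    then derive each requester's answer from the one returned total, as if
    the requests had been served one after the other.
    """
    total = cell.replace_add(e1 + e2)
    return total - e2, total


def claim_slots(num_workers):
    """Each worker claims a distinct slot number without a test-and-set lock."""
    counter = SharedCell(0)
    claimed = [None] * num_workers

    def worker(i):
        claimed[i] = counter.replace_add(1)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(claimed), counter.read()
```

In this sketch every worker receives a distinct value, so a single shared counter is enough to hand out unique indices (for example, queue slots) with no critical section in the calling code.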
Gottlieb et al. studied this question.