State machine replication (SMR) solutions underpin many modern distributed database systems. They provide redundancy through replication and ensure these systems remain available and consistent even in the face of failures. This Foundations and Trends article explores various SMR solutions built upon foundational consensus protocols, such as Viewstamped Replication, Paxos, and Raft, and examines how they solve the core challenges of operation distribution and sequencing. It categorizes SMR designs into diverse architectural styles (including single-leader, sequential, leaderless, and multi-leader approaches) and details the latency and throughput trade-offs of each. The article also covers specific protocol optimizations, such as flexible quorums, lease-based local reads, and hardware acceleration using RDMA or in-network programmable smart switches. Beyond theoretical concepts like the CAP theorem and the FLP impossibility result, the article connects these algorithmic designs directly to their real-world deployments, including control-plane systems like ZooKeeper, NoSQL data stores like MongoDB, and NewSQL distributed databases like Google Spanner and CockroachDB. This comprehensive review provides system designers with a detailed taxonomy of SMR traits to navigate the landscape of fault-tolerant distributed replication.
Charapko et al. studied this question.