Review of "In Search of an Understandable Consensus Algorithm"

27 Oct 2015

Review of "In Search of an Understandable Consensus Algorithm"

Consensus algorithms allow a collection of machines to work as a coherent group that can survive failures of some members, and provides horizontal scalability. Paxos has dominated the discussion of consensus algorithms for the last decade. But Paxos is also known for its complexity and inability to be used as a good foundation for building practical implementations, thus, this paper rethink-ed the consensus algorithm and came up with a much simpler algorithm called Raft.

Normally, consensus algorithms ensures correctness under all non-Byzantine conditions, ensures availability as long as majority of the servers are functioning, are independent from timing and latency and ensures response as long as a majority of the cluster responded.

Three main novel features of Raft are 1) Strong leader, for example, log entries can only flow from the leader to other nodes. Which simplifies the management of replications and makes it easy to understand. 2) Randomized timers based leader election, which only adds a small amount of mechanism to heartbeats while resolving conflicts simply and rapidly. 3) Membership changes, joint consensus approach is used for changing the set of servers in the cluster where the majorities of two different configurations overlap during transitions. Raft makes sure the following properties hold at all times. Election safety: at most one leader can be elected in a given term; Leader never overwrites or deletes entries in its log, it's append only; Log matching: if two logs contain an entry with the same index and term, the all the logs are identical in all entries up through the given index; Leader completeness: if a log entry is committed in a given term, then the entry will be present in the logs of the leaders for all higher-numbered terms; State machine safety: no two servers will apply different log entries for the same index.

Single leader makes Raft much easier to implement and understand, but I'm wondering will this be enough for systems who wants to scale-out writes? In which case, information must flow from followers to followers?

Will this paper be influential in 10 years? Maybe, it does bridge the gap between understandability of consensus algorithm and the increasing need of them by a lot. But I'm not sure about whether the restriction on information flow will become a bottle neck for low latency systems.