Review of "ZooKeeper: Wait-free coordination for Internet-scale systems"

28 Oct 2015

Review of "ZooKeeper: Wait-free coordination for Internet-scale systems"

ZooKeeper is a service for coordinating processes of distributed applications. It provides a simple and high performance kernel for building complex coordination primitives at the client. It incorporates group messaging, shared registers, and distributed locks in a replicated, centralized service.

ZooKeeper provides to the the client an abstraction of a set of data nodes (znodes), organized in a similar way to unix file systems. It can be used to store meta-data of client applications used for coordination purpose.

The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems. It also guarantees a per client FIFO execution of requests and linearizability for all requests that change its state. As an example of how these guarantees work for applications, let's see how a leader transition works. With ZooKeeper, the new leader cna designate a path as the ready znode; other processes will only use the configuration when the znode exists. The new leader starts changing the configuration by deleting this ready node, then updates configurations and then recreate the ready node. The wait-free guarantees makes the configuration change parallel, which is much faster than single thread. The FIFO guarantees that even though these operations requests are issued asynchronously, worker nodes will not see ready znode before all the configuration changes.

Very interestingly, ZooKeeper uses a fuzzy snapshot scheme, in which it doesn't lock the data structure, instead just do a traversal and take whatever state is on that structure. The reason why it can do this is state changes are idempotent, so they can be applied twice and the final state will still be the same. This snapshot is used to speeding up message reply when recovering from a crash.

Will this paper be influential in 10 years? I think so. It solves a tricky issue of distributed systems – how does processes coordinate with each other. And it has gained a lot of adoption in the industry.