Review of "CRDTs: Consistency without concurrency control"

25 Oct 2015

Review of "CRDTs: Consistency without concurrency control"

Sharing mutable data at a large scale has long been a difficult problem. Two approaches dominate in practice, one ensures scalability by giving up consistency guarantees, for instance, last writer wins; the other guarantees consistency by serializing all updates, which sacrifices scalability. But in some limited cases, a radical simplification is possible, that is if concurrent updates to some data commute, then no matter what order the replicas execute all updates, they will eventually converge. This paper call this a Commutative Replicated Data Type (CRDT). It ensures that there are no conflicts, hence, no need for consensus-based concurrency control. One thing to note that is, it's not a universal solution, but the paper claims that it's applicable to a surprisingly large amount of applications.

A simple example of CRDT is a set with a single add-element operation. A delete-element operation can be emulated by adding it to the set that contains all deleted element. Because now adding and deleting element operation is commutative, and idempotent. This data structure can easily scale out to many replicas.

As a more full fledged example. this paper presents TreeDoc, another example that allows consistency without concurrency control.

Although it's possible to design the underlying data structure in a way that we can do concurrent updates without sacrificing consistency. But the designing of these data structures usually requires much more effort than those more intuitive data structures. So I'm not sure about how much people will be willing to adopt this approach. But it certainly points out a new possibility for certain types of applications.