Monday, March 14, 2011

The CAP Theorem Event Horizon

The CAP Theorem has become a convenient excuse for throwing data consistency under the bus. It is automatically assumed that every distributed system falls prey to CAP and therefore must sacrifice one of the three objectives, with consistency being the consistent fall guy. This automatic assumption is simply false. I am not debating the validity of the CAP Theorem, but instead positing that the onset of CAP limitations—what I call the CAP event horizon—does not start as soon as you move to a second master database node. Certain approaches can, in fact, extend the CAP event horizon.

Physics tells us that different properties apply at different scales. For example, quantum physics displays properties that do not apply at larger scales. We see similar nuances in scaling databases. If you are running a master-slave database, using synchronous replication with a single slave is no problem. Add nine more slaves and replication starts to slow the system. Add another ninety slaves and you have a real problem with synchronous replication. In other words, consistency at small scale is no problem, but at large scale it becomes impossible, because your latency goes parabolic.
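To see why latency blows up, note that a synchronous commit cannot be acknowledged until the slowest slave responds, so commit latency is the maximum over all slave round-trips. A minimal simulation of that effect, assuming a hypothetical exponential model for per-slave network latency (the numbers are illustrative, not measurements):

```python
import random

def commit_latency(n_slaves, rng, base_ms=1.0):
    # Synchronous replication: the master cannot acknowledge a commit
    # until every slave confirms the write, so the commit latency is
    # the maximum of the individual slave round-trips.
    return max(rng.expovariate(1.0 / base_ms) for _ in range(n_slaves))

def median_latency(n_slaves, trials=2000, seed=42):
    rng = random.Random(seed)
    samples = sorted(commit_latency(n_slaves, rng) for _ in range(trials))
    return samples[trials // 2]

for n in (1, 10, 100):
    print(f"{n:3d} slaves: median commit ~ {median_latency(n):.1f} ms")
```

Under this model the median commit time grows roughly with the logarithm of the slave count, and the tail latencies grow even faster, which is the "handful of slaves is fine, a hundred is not" behavior described above.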
If you break the database into master (read/write) and slave (read-only) functions, you can operate a handful of slaves using synchronous replication without crossing the CAP event horizon, at least among the slaves. However, the master does present a SPOF (single point of failure), undermining availability.
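The read/write split above amounts to a simple routing rule: writes go to the one master, reads fan out across the slaves. A sketch of such a router (the class name and host strings are hypothetical, not from any particular product):

```python
import itertools

class ReadWriteRouter:
    """Route writes to the single master and reads across read-only slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._cycle = itertools.cycle(self.slaves)

    def route(self, query):
        # Writes must go to the master; if the master is down, writes
        # are unavailable -- the SPOF described in the post.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.master
        return next(self._cycle)  # reads fan out round-robin

router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
print(router.route("SELECT * FROM t"))    # slave1:3306
print(router.route("INSERT INTO t ..."))  # master:3306
```

Reads scale by adding slaves, but every write still funnels through one node, which is why this design postpones rather than solves the availability problem.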

Using a shared-nothing architecture, as soon as you introduce more than a single master, you hit the CAP event horizon. However, shared-disk / shared-cache systems like Oracle RAC and ScaleDB extend the CAP event horizon. They don't invalidate the CAP Theorem; they merely push the event horizon out, by addressing the CAP issues while maintaining low latency.

Shared-disk databases enable multiple nodes to share the same data. Internal processes ensure that the data remains consistent across all nodes, while providing availability and partition tolerance. These processes entail a certain overhead from inter-nodal messaging. There are techniques that can be applied to dramatically reduce this inter-nodal messaging, resulting in a system that delivers the advantages of shared-disk, while delivering a performance profile rivaling shared-nothing, but that will have to wait for a later post.
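One family of such techniques is to let a node keep a lock on a page until some other node actually asks for it, so repeated local accesses cost no messages. The sketch below is a hypothetical illustration of that idea (it is not ScaleDB's or Oracle RAC's actual protocol), comparing a naive one-message-per-access scheme against lazy lock retention:

```python
def naive_messages(accesses):
    # Naive scheme: one lock round-trip per page access.
    return len(accesses)

def cached_lock_messages(accesses):
    # Hypothetical "lazy release" scheme: a node keeps a page lock
    # until another node requests that page, so repeated local
    # accesses are message-free. Each access is a (node, page) pair.
    held = {}  # page -> node currently holding the lock
    messages = 0
    for node, page in accesses:
        if held.get(page) != node:
            messages += 1  # lock transfer needs inter-nodal messaging
            held[page] = node
    return messages

# Node 1 hammers page A while node 2 works on page B: mostly local hits.
workload = [(1, "A")] * 50 + [(2, "B")] * 50 + [(1, "A"), (2, "A")]
print(naive_messages(workload), cached_lock_messages(workload))  # 102 3
```

When the workload has locality, as most OLTP workloads do, nearly all of the coherence traffic disappears, which is how a shared-disk system can approach shared-nothing performance.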

While shared-nothing databases cross the CAP event horizon as soon as you add a second master, shared-disk databases extend this event horizon well into a handful of database nodes. Optimizations can push the horizon further, to dozens of database nodes (all masters). As you move to web-scale applications, you will certainly cross the CAP event horizon, but most OLTP-type applications can operate quite effectively on ten or fewer database servers, and in that case there is no need to throw consistency under the bus, solely in the name of the CAP Theorem.
