Simplified database consistency, Part 1: Obligatory CAP Theorem musings

Brewer’s Conjecture was formalized as the CAP theorem when this paper was published. It’s a good read. In this blog entry, I will talk about my philosophical take on the CAP theorem.

Let’s break down the three letters of the CAP Theorem.

  • Consistency – This represents the “correctness” of a database.
  • Availability – This represents the ability of a database to serve requests in a performant manner.
  • Partition Tolerance – This represents the ability of a database to operate with node or net failure.

The CAP theorem is commonly stated as “pick any two, you can’t have the third one.” In reality, there’s a tension between all points of the CAP triangle, and each edge in the triangle is a nebulous continuum. First, let’s break down what “consistency” really means.

The defining standard of a consistent database is ACID. I will restate it slightly differently for programmers who may not be database architects:

A database which follows ACID constructs will have the same state as a database that has a global read-write lock on the entire system, with the property that any unexpected write failure results in rollback of all changes before the write lock is released.

(This isn’t quite accurate, but is good enough for this blog.)

Obviously, we can’t have a global read-write lock in most scenarios: we’d be sacrificing too much availability. So, how do we achieve the appearance of a single global read-write lock while serving all requests fast enough? My contention is: you can’t. If it walks like a duck, if it quacks like a duck — then it’s a duck. The only way to achieve highest consistency levels is to have a global read-write lock or something just as inefficient. Therein lies the most obvious A-C tradeoff.

From there, the tension between A-C is a continuum which I would state as: “the more the system has to synchronize on, the more availability that is traded off.” NoSQL databases often describe themselves as fully “AP” or fully “CP”; in practice, “CP” and “AP” are meaningless descriptions, and it is intractable for a database to achieve either full availability or full consistency.

To simplistically demonstrate this continuum, consider these following assertions:

  • If a transaction relies on record-level consistency, then record locks are needed.
  • If a transaction relies on a range of rows being consistent, then multiple record locks are needed.
  • If a transaction relies on a table being consistent, then table locks are needed.
  • If a transaction relies on multiple tables being consistent, then multi-table locks are needed.

In addition to the above, if any of these applies over a distributed system, then there has to be some sort of global lock/synchronization/consensus dance which is hard to execute efficiently. Even the Paxos algorithm relies on the existence of some global monotonically increasing sequence to be perfect, which brings its own set of tradeoffs.

There are ways to push back on having actual write locks, usually at the expense of space: for instance, it’s possible to avoid non-repeatable reads without acquiring a table lock in a NSM format database (that is, append-only records; every update is a tombstone followed by an append of a new record) simply by tagging records with a timestamp, then note the current timestamp at the start of a read, and not reading any records with a timestamp above the current timestamp. But these pushbacks are effective only to a point; we don’t want to be creating a copy of a table to achieve table-level consistency, for instance.

What about the “P” component, partition tolerance? If one thinks of “P” as not just network partitions, but as network latency, out of order messages, and so on, then it becomes apparent that P cannot be code-sacrificed in a distributed system, simply because partitions are always ongoing to some degree or another in a distributed system. The only databases which can be honestly described as anywhere near “AC” are traditional RDBMSes.

One might be tempted to think of NoSQL databases such as Cassandra as pushing the end of the AP triangle. But there’s more to the rabbit hole. Column-oriented stores are the true extreme end of the AP triangle, breaking down record-level consistency to element-level consistency. More on this in Part II.

This entry was posted in Databases and tagged , . Bookmark the permalink. Follow any comments here with the RSS feed for this post. Post a comment or leave a trackback.

Leave a Reply

Your email address will not be published. Required fields are marked *

Your email address will never be published.