Simplified database consistency, Part 0: Not-so-simplified reference tables

A proper Database Theory 101 class would begin with the CAP Theorem – that it is impossible to have consistency, availability, and partition tolerance in the same database. This is a first in a blog series explaining the CAP theorem and all its nuances, and how it applies to distributed databases.

Part 0 starts out with reference material with minimal explanation, because I spent a lot of time putting together definitions from various sources to one place, and I’d like it in the cloud instead of my aging stack of filled notebooks.

The real Part I will go back to the basics: the CAP theorem and how it has been proven.

Database anomalies (sometimes called “phenomena”), which I sourced from various points:

Anomaly Name Description
Dirty Read Transaction T1 can read uncommitted data from a different transaction T2.
Non-Repeatable Read / Fuzzy Read Two reads within the same transaction T1 are not guaranteed to contain the same data within rows due to transaction T2 having been executed a write to these rows in between.
Phantom Read Two queries within the same transaction T1 return different rows due to transaction T2 having inserted or deleted rows in between.
Read Skew T1 reads a row x. T2 then modifies rows x and y, and commits. T1 then reads modified y, which may have been dependent on a row x that had not been committed at a time. (Classic example: calculating bank balances)
Dirty Write Two transactions change the same record at the same time.
Lost Updates Two transactions update the same row. The transaction that did the update first is now lost and was never seen by the other transaction.
Write Skew Two transactions concurrently read an overlapping data set, make disjoint updates, and then commit without seeing updates. For example: A bank has a rule that the combined balance of an user’s accounts must be 0 or greater. An user with 2 accounts has a balance of $100 each for a total of $200. With write skew, the user can initiate simultaneous withdrawals of $200 from each account, withdrawing $400 total, which is not detected because each individual transaction thinks the net balance is $200.
Real-Time Violation Transaction T1 sees transaction t2, but not necessarily all of the transactions that have occurred before t2. Applicable to a replicated distributed database.
Forward Freshness Applicable to replicated distributed databases, T1 is started before T2. T2 is considered committed by the database, and T1 is allowed to read data from T2, meaning that T1 does not actually read off a “snapshot” at the time it started. Opacity is lost.

Table of isolation levels defined by ANSI/ISO SQL:

Isolation Level Locks Acquired Disallowed Phenomenons
READ UNCOMMITTED Write lock on record only for duration of operation N/A
READ COMMITTED Write lock on record for duration of transaction Dirty reads
REPEATABLE READ Both read and write locks acquired on affected rows for duration of transaction Non-repeatable reads
SERIALIZABLE Range locks for duration of transaction Phantom reads

 

The next two tables are mostly from this paper which was quite influential in my old company. Extended consistency levels in distributed databases:

Acronym Name Description Comments
SSER Strict Serializability Transactions must be be equal to one particular serial execution Effectively a global lock that allows only one transaction at a time.
1CS One-Copy Serializability Transactions in a replicated database must execute in the same order Antonym to fresh-forwardness?
SER Serializability A set of transactions must be equal to some serial execution Equivalent to SERIALIZABLE SQL
EUS Extended Update Serializability 1CS applied to transactions Not sure how different than 1CS?
US Update Serializability 1CS applied only to committed update transactions
SI Snapshot isolation All reads made in a transaction will see a consistent snapshot of the database at the time of the read
PSI Parallel snapshot isolation Allows non-monotonic snapshots to be read by replicas Has what is called “base freshness” — transaction T2 committed after transaction T1 has started cannot be seen by T1.
NMSI Non-monotonic snapshot isolation PSI, but with fresh-forwardness allowed This is the alleged maximal transactional level that is fully scalable (although others can be scalable except in rare-ish scenarios)
RC Read committed What most (all production?) NoSQL databases aspire to achieve.

Putting the anomalies and serialization levels together:

SSER 1CS EUS SER US SI PSI NMSI RC
Dirty Read X X X X X X X X X
Dirty Writes X X X X X X X X X
Non-Repeatable Read X X X X X X X X
Read Skew X X X X X X X X
Lost Updates X X X X X X X X
Write Skew X X X X X
Real Time Violation X X X X
Fresh Forwardness X X X X X
This entry was posted in Databases and tagged , . Bookmark the permalink. Follow any comments here with the RSS feed for this post. Post a comment or leave a trackback.

Leave a Reply

Your email address will not be published. Required fields are marked *

Your email address will never be published.