Home > Software design >  How are serializable isolation violations detected?
How are serializable isolation violations detected?

Time:05-31

Does anyone know how SQL databases detect serializable isolation violations (SIV's)? It seems like simply brute forcing every permutation of transaction executions to find a match for the concurrent execution results to verify serializability wouldn't scale.

According to this paper from a third party researcher: https://amazonredshiftresearchproject.org/white_papers/downloads/multi_version_concurrency_control_and_serialization_isolation_failure.pdf

SIV's occur when two transactions are occurring at the same time and the more recent one commits some deleted rows that the less recent transaction later tries to delete as well. This is a situation that MVCC is unable to deal with and thus has to abort with SIV.

This makes sense for detecting SIV's involving queries that delete rows in MVCC, but I don't understand how SIV's are detected when only select and insert queries are used. For example, this example in AWS docs: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-serializable-isolation/

Does anyone have any idea?

CodePudding user response:

PostgreSQL detects serialization violations using a heuristics. Reading data causes predicate locks (SIReadLock) to be taken, and it checks for dangerous structures, which necessarily occur in every serialization violation. That means that you can get false positive serialization errors, but never false negatives.

This is all described in the documentation and in the scientific paper referenced there, and we can hope that Amazon didn't hack up PostgreSQL too badly in that area.

CodePudding user response:

Let me simplify things down as a lot of what is going on is complicated and it is easy to miss the forest for the trees.

  1. 2 transaction are in flight (BEGIN) and both are using their own database state that matches the database state at the time the BEGIN occurred.
  2. Each transaction modifies a table that is part of the other transaction's initial state.

That's it. Redshift doesn't "know" that the changes that the other transaction is material to the results this transaction is making. Just that it COULD be material. Since it COULD be material then the serialization hazard exists and one transaction is aborted to prevent the possibility of indeterminant results.

There's a lot of complexity and nuance to this topic that only is important if you are trying to understand why certain cases, timings, and SQL worked and others didn't. This gets into predicate locking which is how Redshift "knows" if some change being made somewhere else is effecting a part of the initial state that is material to this transaction. I.E. a bunch of bookkeeping. This is why the "select * from tab1" matters in the linked knowledge-center article - it creates the "predicate lock" for this transaction.

  • Related