I am trying to understand how latches work in databases. I am trying to build a concurrent btree with lock crabbing / coupling techniques. Lock coupling guarantees isolation of single latch operations (insert, deletes, and scans). But each SQL command may require multiple latches to be acquired. In between two latch operations for the same command, how is it guaranteed that there isn't another latch operation that is performed in between the the two btree operations from the first command?
CodePudding user response:
For the sake of this answer, I'm going to assume that you're using a lock-based concurrency control (LCC) protocol. This is a common protocol in databases, and it's the one that I'm most familiar with.
Lock Types
In LCC, there are two types of locks: shared locks and exclusive locks. A shared lock allows concurrent access to a resource, while an exclusive lock requires exclusive access to a resource. For example, if you have a shared lock on a database table, you can perform a SELECT
query on that table, but you can't perform an INSERT
query on that table. If you have an exclusive lock on a database table, you can perform an INSERT
query on that table, but you can't perform a SELECT
query on that table.
Transaction Types
In LCC, there are two types of transactions: read-only and read-write. A read-only transaction is one that only performs SELECT
queries. A read-write transaction is one that performs SELECT
queries and INSERT
, UPDATE
, or DELETE
queries. In LCC, read-only transactions are allowed to acquire shared locks on resources, while read-write transactions are allowed to acquire both shared and exclusive locks on resources.
Lock Acquisition
In LCC, a transaction is allowed to acquire a lock on a resource if and only if the transaction doesn't already have a lock on that resource. So, for example, if you have a read-only transaction that has a shared lock on a database table, you can't acquire a second shared lock on that table. Similarly, if you have a read-write transaction that has an exclusive lock on a database table, you can't acquire a second shared lock on that table.
Lock Release
In LCC, a transaction is allowed to release a lock on a resource if and only if the transaction doesn't have any other locks on that resource. So, for example, if you have a read-only transaction that has a shared lock on a database table, you can't release that shared lock on that table if you still have a shared lock on some other database table. Similarly, if you have a read-write transaction that has an exclusive lock on a database table, you can't release that exclusive lock on that table if you still have an exclusive lock on some other database table.
I hope that this helps.
CodePudding user response:
It sounds like you're mixing the roles of locks and latches. Locks are a "high level" database feature which are associated with transactions, and latches are a "low level" feature that only needs to provide thread-local mutual exclusion. What a database calls a latch, most concurrency APIs call a lock.
A concurrent b-tree performs latch coupling, not lock coupling. An SQL command might acquire multiple locks, and the locks are held for the duration of the transaction. Latches are released as soon as possible, and they don't need to be tracked by the transaction.
Locks guard logical records, like rows, tables, etc. Latches only guard b-tree nodes. When thinking about b-tree concurrency, just think in terms of latches and don't think about locks or transactions.
When an SQL command performs an operation like an insert, it might first acquire a lock to prevent concurrent modifications to the row. Exactly how it does it, and what kind of lock it acquires varies by database. Note that lock acquisition doesn't need to interact with the b-tree at all. Locks can be managed by a separate hashtable.
When the insert writes into the b-tree, it latches the root node, finds the child node, and then latches the child node. Next, the child node is latched, and the parent node latch is released. This is latch coupling. Once the target child node is found, the record is inserted into it, and the latch is released. If the node needs to split, then this creates special problems because latch coupling cannot go in reverse (child to parent) without risking deadlocks. The b-link tree design solves this problem.
As long as all b-tree operations (including reads) follow the proper root-to-child latch coupling strategy, multiple threads can safely interact with the b-tree. For improved concurrency, a mix of shared/exclusive latches is required. Read-only operations can use shared latches, but write operations generally require exclusive latches. Shared latches can be used for write operations too, but only when searching through the parent nodes. Splits and merges require special attention. Again, b-link trees solve this problem.