I have the following concurrency use-case: An endpoint can be called at any time and an operation is supposed to happen. The operation goes like this in pseudocode (current isolation level is READ COMMITTED
):
SELECT * FROM TABLE_A WHERE IS_LATEST=true FOR UPDATE
// DO SOME APP LOGIC TO TEST VALIDITY
// ALL GOES WELL => INSERT OR UPDATE NEW ROW WITH IS_LATEST=TRUE => COMMIT
// OTHERWISE => ROLLBACK (all good not interesting)
Now this approach with SELECT FOR UPDATE
is fine if two of these operations start at the same time in the respects of update. Because both transactions see the same number of rows, one will update the rows and the second transaction will wait its turn before being able to SELECT FOR UPDATE
and the state is valid.
The issue I have is when I have an insert in the first transaction. What happens is that for example when the first transaction makes that lock SELECT FOR UPDATE
there are two rows, then the transaction continues, in the middle of the transaction, the second transaction comes in wanting to SELECT FOR UPDATE
(latest) and waits for first transaction to finish.. The first transaction finished and there is a new third item realistically in the db, but the second transaction picks up only two rows while it was waiting for the row locks to be released. (This is because at the time of calling the SELECT FOR UPDATE
the snapshot was different had only two rows that matched IS_LATEST=true
).
Is there a way to make this transaction such that the SELECT
lock picks up the latest snapshot after waiting?
CodePudding user response:
With your method, the query in the second transaction will return an empty result after the lock is gone, because it sees is_latest = FALSE
on the row in question, and the new row is not yet visible. So you would have to retry the transaction in that case.
I suggest that you use REPEATABLE READ
isolation level and optimistic locking instead:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM table_a WHERE is_latest; -- no locks!
/* perform your application ruminations */
UPDATE table_a SET is_latest = FALSE WHERE id = <id you found above>;
INSERT INTO table_a (is_latest, ...) VALUES (TRUE, ...);
COMMIT;
Then three things may happen:
Your query finds a row, and the transaction succeeds.
Your query finds no row, then you could insert the first row.
The query finds a row, but the update of that row causes a serialization error.
In that case you know that a concurrent transaction interfered, and you repeat the complete transaction in response.
CodePudding user response:
The issue is that each command only sees rows that have been committed before the query started. There are various possible solutions ...
Stricter isolation level
You can solve this with a stricter isolation level, but that's relatively expensive.
Laurenz already provided a solution for this.
Just start a new command
Keep the (cheap) default isolation level READ COMMITTED
, and just start a new command.
Only few rows to lock
While only locking a hand full of rows, the dead simple solution is to repeat the same SELECT ... FOR UPDATE
. The second iteration sees newly committed rows and locks them additionally.
There is a theoretical race condition with additional transactions that might lock new rows before the waiting transaction does. That would result in a deadlock. Highly unlikely, but to be absolutely sure, lock rows in consistent order:
BEGIN; -- default READ COMMITTED
SELECT FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- consistent order
SELECT * FROM table_a WHERE is_latest ORDER BY id FOR UPDATE; -- just repeat !!
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
UPDATE table_a SET is_latest = true WHERE ...;
INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
ELSE
ROLLBACK;
END;
A partial index on (id) WHERE is_latest
would be ideal.
More rows to lock
For more than a hand full of rows, I would instead create a dedicated one-row token table. A bullet-proof implementation could look like this, run as admin or superuser:
CREATE TABLE public.single_task_x (just_me bool CHECK (just_me) PRIMARY KEY DEFAULT true);
INSERT INTO public.single_task_x VALUES (true);
REVOKE ALL ON public.single_task_x FROM public;
GRANT SELECT, UPDATE ON public.single_task_x TO public; -- or just to those who need it
See:
Then:
BEGIN; -- default READ COMMITTED
SELECT FROM public.single_task_x FOR UPDATE;
SELECT * FROM table_a WHERE is_latest; -- FOR UPDATE? ①
-- DO SOME APP LOGIC TO TEST VALIDITY
-- pseudo-code
IF all_good
ROLLBACK;
ELSE
UPDATE table_a SET is_latest = true WHERE ...;
INSERT table_a (IS_LATEST, ...) VALUES (true, ...);
COMMIT;
END;
A single lock is cheaper.
① You may or may not want to lock additionally, to defend against other writes, possibly with a weaker lock ....
Either way, all locks are released at the end of the transaction automatically.
Advisory lock
Or use an advisory lock. But then you have to unlock manually after you are done. Like:
BEGIN; -- default READ COMMITTED
SELECT pg_advisory_lock(123);
SELECT * FROM table_a WHERE is_latest;
...
SELECT pg_advisory_unlock(123); -- in any case, even if there was an error!!!
Just make sure to use a unique token for your particular task. 123
in my example.