I have a financial system where users have tokens and can add transactions. The system has to calculate the balance and mean acquisition price of each token. Data integrity is of utmost importance in the system and it should be impossible to have incorrect balances or mean prices in the system.
To comply with these requirements I've come up with the following tables:
- token (to hold each token)
- transaction (to hold each transaction of a token)
- balance (to hold the token balances without having to calculate each time using all transactions)
The token and transaction tables are straight forward. The balance table is a table that is automatically updated using a PostgreSQL trigger to hold each change of balance in a token. This table exists so every time we need to know something like "What was the balance/mean price of token A in 2023-01-05?" we don't need to sum all transactions and calculate from scratch.
Trigger
Enough of explanation, this is the trigger I've come up with. It fires AFTER every INSERT in the transaction table.
DECLARE
old_balance NUMERIC(17, 8);
old_mean_price NUMERIC(17, 8);
old_local_mean_price NUMERIC(17, 8);
new_balance NUMERIC(17, 8);
new_mean_price NUMERIC(17, 8);
new_local_mean_price NUMERIC(17, 8);
BEGIN
-- Prevent the creation of retroactive transaction since it would mess up the balance table
IF EXISTS (
SELECT * FROM transaction
WHERE
token_id = NEW.token_id
AND date > NEW.date
) THEN
RAISE EXCEPTION 'There is already a newer transaction for token %', NEW.token_id;
END IF;
-- Fetch the latest balance of this token
SELECT
amount,
mean_price,
local_mean_price
INTO
old_balance, old_mean_price, old_local_mean_price
FROM balance
WHERE
token_id = NEW.token_id
AND date <= NEW.date
ORDER BY date DESC
LIMIT 1;
-- If there's no balance in the table then set everything to zero
old_balance := COALESCE(old_balance, 0);
old_mean_price := COALESCE(old_mean_price, 0);
old_local_mean_price := COALESCE(old_local_mean_price, 0);
-- Calculate the new values
IF NEW.side = 'buy' THEN
new_balance := old_balance NEW.quantity;
new_mean_price := (old_balance * old_mean_price NEW.quantity * NEW.unit_price) / new_balance;
new_local_mean_price := (old_balance * old_local_mean_price NEW.quantity * NEW.local_unit_price) / new_balance;
ELSIF NEW.side = 'sell' THEN
new_balance := old_balance - NEW.quantity;
new_mean_price := old_mean_price;
new_local_mean_price := old_local_mean_price;
ELSE
RAISE EXCEPTION 'Side is invalid %', NEW.side;
END IF;
-- Update the balance table
IF NOT EXISTS (
SELECT * FROM balance
WHERE
date = NEW.date
AND token_id = NEW.token_id
) THEN
-- Create a row in the balance table
INSERT INTO balance
(date, token_id, amount, mean_price, local_mean_price)
VALUES
(
NEW.date,
NEW.token_id,
new_balance,
new_mean_price,
new_local_mean_price
);
ELSE
-- There's already a row for this token and date in the balance table. We should update it.
UPDATE balance
SET
amount = new_balance,
mean_price = new_mean_price,
local_mean_price = new_local_mean_price
WHERE
date = NEW.date
AND token_id = NEW.token_id;
END IF;
RETURN NULL;
END;
This trigger does some things:
- Prevents the insertion of retroactive transactions, since this means we would have to update all the following balances
- Add a new row in the balance table with the updated balance and mean prices of the token
- Or, update the row in the balance if one already exists with the same datetime
Race condition
This works fine, but it has a race condition when executing 2 concurrent transactions. Imagine the following scenario:
- Start T1 using BEGIN
- Start T2 using BEGIN
- T1 inserts a row in the transaction table
- The trigger is fired inside T1 and it inserts a row in balance
- T2 inserts a row in the transaction table
- The trigger is fired inside T2 but it cannot see the changes made from the T1 trigger since it has not commited yet
- The balance created by T2 is incorrect because it used stale data
Imperfect solution 1
Maybe I could change the SELECT statement in the trigger (the one that selects the previous balance) to use a SELECT FOR UPDATE. This way the trigger is locked until a concurrent trigger is commited. This doesn't work because of three things:
- If it's the first transaction then the table balance doesn't have a row for that particular token (this could be solved by locking the token table)
- Even if we lock and wait the concurrent transaction to commit, because of the way transaction works in PostgreSQL we would still fetch stale data since inside a transaction we only have visibility of the data that was there when the transaction started.
- Even if we managed to get the most up to date information, there's still the issue that T1 can rollback and this means that the balance generated in T2 would still be incorrect
Imperfect solution 2
Another solution would be to scrap the FOR UPDATE and just defer the trigger execution to the transaction commit. This solves the race condition since the trigger is executed after the end of the transaction and has visibility of the most recent changed. The only issue is that it leaves me unable to use the balance table inside the transaction (since it will only be updated after the transaction commits)
Question
I have two questions regarding this:
- Does the Imperfect solution 2 really solves all the race condition problems or am I missing something?
- Is there a way to solve this problem and also update the balance table ASAP?
CodePudding user response:
Your solution 2 only narrows the race condition, but does not fix it. Both transactions could commit at the same time.
There are only two ways to prevent such a race condition:
use the
SERIALIZABLE
transaction isolation level (you can set that as default value with the parameterdefault_transaction_isolation
)lock whatever is necessary to prevent concurrent operations (for example, the corresponding row in
balance
)
Besides, your code can be improved: You should check for the existence of a balance
only once, and you could use INSERT ... ON CONFLICT
.
You could read my article for a more detailed analysis.
CodePudding user response:
You could also consider either creating an extra table containing the running transactions and trowing an error if an insert into that is not possible or simply locking the relevant balance rows and forcing the transactions to run fully sequentially that way. Either way you force conflicting statements to run one at a time thus resolving the race condition.