Home > Enterprise >  Is there a way to ensure that inserts do not happen until after all data is collected for that row?
Is there a way to ensure that inserts do not happen until after all data is collected for that row?

Time:09-12

Brief

I'm getting inserts into a transactions log table out of desired (logical) order - where a deposit to a an accountinstance iteration occurs before that iteration exists, logically.

Technology

We're using PostgreSQL (13.x).

Background

I'm writing code that must be atomic (it is depositing or withdrawing funds from a logical account).

In addition to ensuring that the deposit or withdraw on a given row is atomic, I need to record the action(s) in a transactions table - one row per logical action.

My system is slightly complicated by the fact that some atomic actions produce two entries in the transactions table - a withdraw a deposit (a reset event). So money is both taken out and put in, and two rows need to be added to the transactions table - one to represent the money taken out, and a second to represent the money put in.

This system has plenty of ordinary withdraws or deposits that of course need to be atomic relative to the current balance on those accounts, and which themselves generate a single transaction row to record their action.

First: Even with intensive testing, it appears that the sql queries get the atomicity of the account correct. That is: all actions are correctly deposited or withdrawn in sequence and I have no cases of a dirty-read or overwrite of data. (yay).

However, the transactions table ends up with action-rows that refers to an iteration and balance that doesn't exist yet, according to the transaction table's log of events.

So I SHOULD get:

  • reset part 1 (withdraw)
  • reset part 2 (deposit)
  • withdraw 1 from iteration of account that has been reset

...and so on - with the reset-actions being always back-to back, and prior to any withdraws or deposits that refer to that new iteration.

Yet, what I get in actual practice sometimes looks like:

  • withdraw 1 from iteration of account that has been reset
  • reset part 1 (withdraw)
  • reset part 2 (deposit)

The account balances recorded in "withdraw 1" are correct for it being after the full reset event. So it records balances that could only happen after the "reset part 2" finished executing (and records the iteration = what that reset set it to). But this query generates the above inserts into transactions (in timestamp order) and I am more than a little flummoxed as to how to avoid this goofy interleaved result in my transactions table?

A deposit looks like this:

    WITH
        accounts AS (
            SELECT id, iteration, balance_tenthoucents, subbalance_tenthoucents
            FROM accountinstances
            WHERE id = ANY($1)
            FOR UPDATE
        ),
        deposits AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, subwagerid, amount_tenthoucents, subbalance_tenthoucents, balance_tenthoucents, subbalance_balance_tenthoucents) VALUES
                ($2, (SELECT iteration FROM accounts WHERE id = $2), $3, $4, $5, (SELECT balance_tenthoucents $4 FROM accounts WHERE id = $2), (SELECT subbalance_tenthoucents $5 FROM accounts WHERE id = $2))
            RETURNING accountinstanceid, balance_tenthoucents, subbalance_balance_tenthoucents
        )
    UPDATE accountinstances
    SET
        balance_tenthoucents = deposits.balance_tenthoucents,
        subbalance_tenthoucents = deposits.subbalance_balance_tenthoucents
    FROM deposits
    WHERE accountinstances.id = deposits.accountinstanceid
    RETURNING accountinstances.accountid, accountinstances.balance_tenthoucents/10000

And a reset is executed as:

    WITH
        locked AS (
            SELECT
                accountinstances.id, accountinstances.iteration, accountinstances.accountid, accountinstances.balance_tenthoucents, accountinstances.subbalance_tenthoucents,
                ((accountinstances.balance_tenthoucents/10000)*10000) AS win_tenthoucents,
                (accounts.reset_balance * 10000)   COALESCE(accountinstances.subbalance_tenthoucents, 0) AS reset_balance,
                (CASE WHEN accounts.subbalance_pertenthou > 0 THEN -accountinstances.subbalance_tenthoucents ELSE NULL END) AS subbalance
            FROM accountinstances
            JOIN accounts ON accountinstances.accountid = accounts.id
            WHERE accountinstances.id = ANY($1)
            FOR UPDATE OF accountinstances
        ),
        wins AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, amount_tenthoucents, payoutid, balance_tenthoucents, subbalance_balance_tenthoucents)
                SELECT locked.id, locked.iteration, -locked.win_tenthoucents, $2, locked.balance_tenthoucents-locked.win_tenthoucents, locked.subbalance_tenthoucents
                FROM locked
            RETURNING id AS txid, accountinstanceid AS id, -amount_tenthoucents/10000 AS won, created
        ),
        resets AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, amount_tenthoucents, subbalance_tenthoucents, balance_tenthoucents, subbalance_balance_tenthoucents)
                SELECT wins.id, locked.iteration   1, locked.reset_balance, locked.subbalance, locked.balance_tenthoucents-locked.win_tenthoucents locked.reset_balance, (CASE WHEN locked.subbalance IS NOT NULL THEN 0 ELSE NULL END)
                FROM locked
                JOIN wins ON wins.id = locked.id
            RETURNING accounttransactions.accountinstanceid AS id, accounttransactions.iteration, accounttransactions.balance_tenthoucents, accounttransactions.subbalance_balance_tenthoucents, accounttransactions.balance_tenthoucents/10000 AS balance
        ),
        updates AS (
            UPDATE accountinstances
            SET
                iteration = resets.iteration,
                balance_tenthoucents = resets.balance_tenthoucents,
                subbalance_tenthoucents = resets.subbalance_balance_tenthoucents
            FROM resets
            WHERE accountinstances.id = resets.id
            RETURNING accountinstances.id AS id, accountinstances.accountid AS accountid, accountinstances.balance_tenthoucents/1000 AS balance
        )
    SELECT wins.txid, updates.accountid, wins.won, updates.balance, wins.created FROM updates JOIN wins ON updates.id = wins.id

Some Example Data

8   2022-09-09 19:26:45.564463 00   4000000000  4000000000  -- tail end of previous reset - where the account was given initial funds
9   2022-09-09 19:26:45.570226 00   100000      4000100000  -- a deposit
8   2022-09-09 19:26:45.574191 00   -4000000000 0           -- part 1 of reset from 8->9
9   2022-09-09 19:26:45.574191 00   4000000000  4000000000  -- part 2 of reset from 8->9

To understand this -

  • column 1 is "iteration" -- it is an index that defaults to 1, and increments as part of the atomic reset event. So we go from iteration 8->9 ONLY during a reset event, when the account is both emptied and refilled.
  • column 2 is "created" - a simple timestamptz default = now()
  • column 3 is "delta" -- positive is a deposit, negative is a withdraw
  • column 4 is "balance" -- current value within the account instance

So the crazy thing to me is that, as of iteration 8, the account has a bal of 4000000000.

Then, according to my transactions log, a deposit of 100000 is made, bringing us to a balance of 4000100000.

OK - you're thinking - that makes perfect sense. But NO - that deposit is to iteration 9!! That means that the next reset event, taking us from iteration 8 to 9 HAS ALREADY HAPPENED!!!

And then we see that reset event - which is recorded as the next two items at 19:26:45.574191 00.

crazy

So I am forced to wrestle with:

  • Did pg initialize the INSERT for the deposit on row 2 at 19:26:45.570226 00 but then got stuck waiting for the reset event on rows 3,4 to finish before being able to finish the insertion, but somehow because it started sooner, it "reserved" that row (and created time) in the transactions log table and appears before the reset's two entries (otherwise, it's impossible for it to have seen iteration 9, and not 8, which is where it logically is in the time-stream)?!!
  • Another way to look a it - somehow the effects of the reset happens first (on the acccountinstances row) - but then the deposit action happens second but gets its transaction log in first, and then the transaction logs from the resets are inserted after.

Update

Okay - so I did some more testing - and if I do NOT order my resulting transaction rows - but allow them in the natural order that Postgres has internally (which I have to believe is the order that things actually happened within the table) - I get the order I want!

But - the timestamp on those errant deposits or withdraws are sometimes predating the accountinstance iteration that they belong to.

I think this has to mean that postgres is capturing the value of now() when it first queues the query - not when it actually commits the query. So it captures now() for created as if it were before the reset iteration that it will ultimately refer to.

I think it creates the rows like:

  • reset p1 - win (time=2)
  • reset p2 - seed deposit (time=2)
  • deposit to this iteration with a timestamp predating 1 (time=1)

I would need to default that column not to now() but to some postgres function that is lazy-evaluated - AT THE TIME OF ACTUAL COMMIT?!!

Hmm...

I wonder if I can do something dumb like use: created=max(now(),accounts.updated) (or whatever the correct syntax is) for the deposits / withdraws query? To force it to never put it "before the thing being modified" - and then obviously maintain a modified column in the accountinstances table...

Unless Posgres has a "timestamp of commit" function - or similarly lazily evaluated now_lazy() :P

Follow Up And Solution

So the issue was indeed that Postgres caches NOW() at the start of transaction - which the transaction can collide with another in the engine such that it has to wait for the resources it needs to unlock before proceeding - so it gets a NOW() that predates the existence of the stuff it sees when it actually is able to execute.

This is what I was seeing.

The solution which I've tested and appears to work nicely is clock_timestamp() instead of NOW().

I've implemented it so that it's explicitly being set in the queries, rather than changing the column default - not sure if setting the column default would work just as well?

Regardless, I now get my transaction rows to be in logical, chronological order, where the increment referred to always exists in the transaction stream before being referred to (and we never refer back before the current iteration either - everything is A, B, C... order).

Woo Hoo!

CodePudding user response:

Follow Up And Solution

So the issue was indeed that Postgres caches NOW() at the start of transaction - which the transaction can collide with another in the engine such that it has to wait for the resources it needs to unlock before proceeding - so it gets a NOW() that predates the existence of the stuff it sees when it actually is able to execute.

This is what I was seeing.

The solution which I've tested and appears to work nicely is clock_timestamp() instead of NOW().

I've implemented it so that it's explicitly being set in the queries, rather than changing the column default - not sure if setting the column default would work just as well?

Regardless, I now get my transaction rows to be in logical, chronological order, where the increment referred to always exists in the transaction stream before being referred to (and we never refer back before the current iteration either - everything is A, B, C... order).

Fixed deposit with explicit created using clock time:

    WITH
        accounts AS (
            SELECT id, iteration, balance_tenthoucents, subbalance_tenthoucents
            FROM accountinstances
            WHERE id = ANY($1)
            FOR UPDATE
        ),
        deposits AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, created, subwagerid, amount_tenthoucents, subbalance_tenthoucents, balance_tenthoucents, subbalance_balance_tenthoucents) VALUES
                ($2, (SELECT iteration FROM accounts WHERE id = $2), clock_timestamp(), $3, $4, $5, (SELECT balance_tenthoucents $4 FROM accounts WHERE id = $2), (SELECT subbalance_tenthoucents $5 FROM accounts WHERE id = $2))
            RETURNING accountinstanceid, balance_tenthoucents, subbalance_balance_tenthoucents
        )
    UPDATE accountinstances
    SET
        balance_tenthoucents = deposits.balance_tenthoucents,
        subbalance_tenthoucents = deposits.subbalance_balance_tenthoucents
    FROM deposits
    WHERE accountinstances.id = deposits.accountinstanceid
    RETURNING accountinstances.accountid, accountinstances.balance_tenthoucents/10000

And a reset is executed as:

    WITH
        locked AS (
            SELECT
                accountinstances.id, accountinstances.iteration, accountinstances.accountid, accountinstances.balance_tenthoucents, accountinstances.subbalance_tenthoucents,
                ((accountinstances.balance_tenthoucents/10000)*10000) AS win_tenthoucents,
                (accounts.reset_balance * 10000)   COALESCE(accountinstances.subbalance_tenthoucents, 0) AS reset_balance,
                (CASE WHEN accounts.subbalance_pertenthou > 0 THEN -accountinstances.subbalance_tenthoucents ELSE NULL END) AS subbalance
            FROM accountinstances
            JOIN accounts ON accountinstances.accountid = accounts.id
            WHERE accountinstances.id = ANY($1)
            FOR UPDATE OF accountinstances
        ),
        wins AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, created, amount_tenthoucents, payoutid, balance_tenthoucents, subbalance_balance_tenthoucents)
                SELECT locked.id, locked.iteration, clock_timestamp(), -locked.win_tenthoucents, $2, locked.balance_tenthoucents-locked.win_tenthoucents, locked.subbalance_tenthoucents
                FROM locked
            RETURNING id AS txid, accountinstanceid AS id, -amount_tenthoucents/10000 AS won, created
        ),
        resets AS (
            INSERT INTO accounttransactions (accountinstanceid, iteration, created, amount_tenthoucents, subbalance_tenthoucents, balance_tenthoucents, subbalance_balance_tenthoucents)
                SELECT wins.id, locked.iteration   1, clock_timestamp(), locked.reset_balance, locked.subbalance, locked.balance_tenthoucents-locked.win_tenthoucents locked.reset_balance, (CASE WHEN locked.subbalance IS NOT NULL THEN 0 ELSE NULL END)
                FROM locked
                JOIN wins ON wins.id = locked.id
            RETURNING accounttransactions.accountinstanceid AS id, accounttransactions.iteration, accounttransactions.balance_tenthoucents, accounttransactions.subbalance_balance_tenthoucents, accounttransactions.balance_tenthoucents/10000 AS balance
        ),
        updates AS (
            UPDATE accountinstances
            SET
                iteration = resets.iteration,
                balance_tenthoucents = resets.balance_tenthoucents,
                subbalance_tenthoucents = resets.subbalance_balance_tenthoucents
            FROM resets
            WHERE accountinstances.id = resets.id
            RETURNING accountinstances.id AS id, accountinstances.accountid AS accountid, accountinstances.balance_tenthoucents/1000 AS balance
        )
    SELECT wins.txid, updates.accountid, wins.won, updates.balance, wins.created FROM updates JOIN wins ON updates.id = wins.id

Thank You!

Each question & response to my question pushed me to dig further - until the answer became possible.

MUCH Appreciated!

CodePudding user response:

I think you're making an assumption about the ordering of rows in a table. Tables are sets of rows, and have no inherent order. A table is just a bag of rows. The fact that they sometimes seem to have an inherent order can be very confusing.

If you want to process rows in the order in which they were inserted into the table, you need to put a SERIAL or BIGSERIAL column into the table. Then, when processing rows you need to use an ORDER BY clause in your queries.

  • Related