I have a million-row table in Postgres 13 that needs a one-time update of each row: the (golang) script will read the current column value for each row, transform it, then update the row with the new value, for example:
DECLARE c1 CURSOR FOR SELECT v FROM users;
FETCH c1;
-- read and transform v
UPDATE users SET v = ? WHERE CURRENT OF c1;
-- transaction committed
FETCH c1;
...
I'm familiar with cursors for reading, but have a few requirements for writing that I'm struggling to find the right settings for:
I don't want it all to run in a single huge transaction, which is the default with cursors, since the change set will be large and it will take a while. I'd rather each update be its own transaction, and I can re-run the idempotent script again if it fails for any reason. I'm aware of
DECLARE WITH HOLD
to have the cursor span transactions, but...By default the data read by the cursor is "insensitive" (a snapshot from when the cursor was first created), but I would like the latest data for each row with FETCH in case there has been a subsequent update. The solution to that is to use
FOR UPDATE
in the cursor query to make it "sensitive," but that is not allowed together withWITH HOLD
. I would prefer the row lock you get withFOR UPDATE
to prevent the read-then-write race condition between FETCH and UPDATE, but it's not mandatory
How can I iterate all rows and update them one at a time without having to read everything into memory first?
CodePudding user response:
Make the cursor be WITH HOLD, but select the pk rather than v. Then in the loop, select the now-current v from the table based the pk (rather than current of main
), and update it using the pk.