Batch upsert multiple rows with psycopg2-CodePudding

I need to upsert (INSERT ... ON CONFLICT DO UPDATE) multiple rows at once into a postgreSQL database using psycopg2. Essentially, I have a list of tuples representing "rows", and I need to insert them into the database, or update the database if there is a conflict. I need (possibly) every column to be updated (if not inserted), along with every row.

I've tried two main approaches, using psycopg2's cursor.execute() function and execute_many() function. First, I did the following:

upsert_statement = 'INSERT INTO table (col1, col2, col3) VALUES %s ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = ROW (excluded.*) WHERE table IS DISTINCT FROM excluded'

psycopg2.extras.execute_values(cursor, upsert_statement, values)

I create an SQL statement that inserts the values using execute_many() (where values passed to it is a list of tuples), and on a conflict the column values should be updated to excluded. However, I get the error SyntaxError: number of columns does not match number of values sometimes, even though I know for a fact that the number of columns and values are the same.

So, I tried using only execute():

upsert_statement = f'INSERT INTO table (col1, col2, col3) VALUES (value1, value2, value3), (value4, value5, value6)... ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...'

cursor.execute(upsert_statement)

Here, I do the batch upsert as part of the SQL, and so don't have to use execute_values(). However, I get a SyntaxError after the DO UPDATE SET, because I don't think it's valid to have (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)....

What am I doing wrong? How can I bulk upsert multiple rows using psycopg2?

(I should note that in reality, (col1, col2, col3) and (value1, value2, value3) are dynamic, and change frequently)

CodePudding user response：

You need to use table EXCLUDED instead of value literals in your ON CONFLICT statement. It's a special table holding values proposed for insert. You also don't need to re-set the conflicting values, only the rest.

INSERT INTO table (col1, col2, col3) 
VALUES 
    (value1, value2, value3), 
    (value4, value5, value6)
ON CONFLICT (col1) DO UPDATE 
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3);

For readability, you can format your in-line SQLs if you triple-quote your f-strings. I'm not sure if and which IDEs can detect it's an in-line SQL in Python and switch syntax highlighting, but I find indentation helpful enough.

upsert_statement = f"""
    INSERT INTO table (col1, col2, col3) 
    VALUES 
        ({value1}, {value2}, {value3}), 
        ({value4}, {value5}, {value6})
    ON CONFLICT (col1) DO UPDATE 
    SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""

Here's a simple test:

drop table if exists test_70066823 cascade;
create table test_70066823 (
    id integer primary key, 
    text_column_1 text, 
    text_column_2 text);
insert into test_70066823 select 1,'first','first';
insert into test_70066823 select 2,'second','second';
select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
--  1 | first         | first
--  2 | second        | second
--(2 rows)


insert into test_70066823
values
        (1, 'third','first'),
        (3, 'fourth','third'),
        (4, 'fifth','fourth'),
        (2, 'sixth','second')
on conflict (id) do update 
set text_column_1=EXCLUDED.text_column_1,
    text_column_2=EXCLUDED.text_column_2;

select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
--  1 | third         | first
--  3 | fourth        | third
--  4 | fifth         | fourth
--  2 | sixth         | second
--(4 rows)

You can refer to this for improved insert performance. Inserts with a simple string-based execute or execute_many are the top 2 slowest approaches mentioned there.