Home > Blockchain >  Batch upsert multiple rows with psycopg2
Batch upsert multiple rows with psycopg2

Time:11-23

I need to upsert (INSERT ... ON CONFLICT DO UPDATE) multiple rows at once into a postgreSQL database using psycopg2. Essentially, I have a list of tuples representing "rows", and I need to insert them into the database, or update the database if there is a conflict. I need (possibly) every column to be updated (if not inserted), along with every row.

I've tried two main approaches, using psycopg2's cursor.execute() function and execute_many() function. First, I did the following:

upsert_statement = 'INSERT INTO table (col1, col2, col3) VALUES %s ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = ROW (excluded.*) WHERE table IS DISTINCT FROM excluded'

psycopg2.extras.execute_values(cursor, upsert_statement, values)

I create an SQL statement that inserts the values using execute_many() (where values passed to it is a list of tuples), and on a conflict the column values should be updated to excluded. However, I get the error SyntaxError: number of columns does not match number of values sometimes, even though I know for a fact that the number of columns and values are the same.

So, I tried using only execute():

upsert_statement = f'INSERT INTO table (col1, col2, col3) VALUES (value1, value2, value3), (value4, value5, value6)... ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...'

cursor.execute(upsert_statement)

Here, I do the batch upsert as part of the SQL, and so don't have to use execute_values(). However, I get a SyntaxError after the DO UPDATE SET, because I don't think it's valid to have (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)....

What am I doing wrong? How can I bulk upsert multiple rows using psycopg2?

(I should note that in reality, (col1, col2, col3) and (value1, value2, value3) are dynamic, and change frequently)

CodePudding user response:

You need to use table EXCLUDED instead of value literals in your ON CONFLICT statement. It's a special table holding values proposed for insert. You also don't need to re-set the conflicting values, only the rest.

INSERT INTO table (col1, col2, col3) 
VALUES 
    (value1, value2, value3), 
    (value4, value5, value6)
ON CONFLICT (col1) DO UPDATE 
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3);

For readability, you can format your in-line SQLs if you triple-quote your f-strings. I'm not sure if and which IDEs can detect it's an in-line SQL in Python and switch syntax highlighting, but I find indentation helpful enough.

upsert_statement = f"""
    INSERT INTO table (col1, col2, col3) 
    VALUES 
        ({value1}, {value2}, {value3}), 
        ({value4}, {value5}, {value6})
    ON CONFLICT (col1) DO UPDATE 
    SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""

Here's a simple test:

drop table if exists test_70066823 cascade;
create table test_70066823 (
    id integer primary key, 
    text_column_1 text, 
    text_column_2 text);
insert into test_70066823 select 1,'first','first';
insert into test_70066823 select 2,'second','second';
select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
--  1 | first         | first
--  2 | second        | second
--(2 rows)


insert into test_70066823
values
        (1, 'third','first'),
        (3, 'fourth','third'),
        (4, 'fifth','fourth'),
        (2, 'sixth','second')
on conflict (id) do update 
set text_column_1=EXCLUDED.text_column_1,
    text_column_2=EXCLUDED.text_column_2;

select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
--  1 | third         | first
--  3 | fourth        | third
--  4 | fifth         | fourth
--  2 | sixth         | second
--(4 rows)

You can refer to this for improved insert performance. Inserts with a simple string-based execute or execute_many are the top 2 slowest approaches mentioned there.

  • Related