I need to upsert (INSERT ... ON CONFLICT DO UPDATE
) multiple rows at once into a postgreSQL database using psycopg2. Essentially, I have a list of tuples representing "rows", and I need to insert them into the database, or update the database if there is a conflict. I need (possibly) every column to be updated (if not inserted), along with every row.
I've tried two main approaches, using psycopg2's cursor.execute()
function and execute_many()
function. First, I did the following:
upsert_statement = 'INSERT INTO table (col1, col2, col3) VALUES %s ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = ROW (excluded.*) WHERE table IS DISTINCT FROM excluded'
psycopg2.extras.execute_values(cursor, upsert_statement, values)
I create an SQL statement that inserts the values using execute_many()
(where values
passed to it is a list of tuples), and on a conflict the column values should be updated to excluded. However, I get the error SyntaxError: number of columns does not match number of values
sometimes, even though I know for a fact that the number of columns and values are the same.
So, I tried using only execute()
:
upsert_statement = f'INSERT INTO table (col1, col2, col3) VALUES (value1, value2, value3), (value4, value5, value6)... ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...'
cursor.execute(upsert_statement)
Here, I do the batch upsert as part of the SQL, and so don't have to use execute_values()
. However, I get a SyntaxError
after the DO UPDATE SET
, because I don't think it's valid to have (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...
.
What am I doing wrong? How can I bulk upsert multiple rows using psycopg2?
(I should note that in reality, (col1, col2, col3)
and (value1, value2, value3)
are dynamic, and change frequently)
CodePudding user response:
You need to use table EXCLUDED
instead of value literals in your ON CONFLICT
statement. It's a special table holding values proposed for insert. You also don't need to re-set the conflicting values, only the rest.
INSERT INTO table (col1, col2, col3)
VALUES
(value1, value2, value3),
(value4, value5, value6)
ON CONFLICT (col1) DO UPDATE
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3);
For readability, you can format your in-line SQLs if you triple-quote your f-strings. I'm not sure if and which IDEs can detect it's an in-line SQL in Python and switch syntax highlighting, but I find indentation helpful enough.
upsert_statement = f"""
INSERT INTO table (col1, col2, col3)
VALUES
({value1}, {value2}, {value3}),
({value4}, {value5}, {value6})
ON CONFLICT (col1) DO UPDATE
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""
Here's a simple test:
drop table if exists test_70066823 cascade;
create table test_70066823 (
id integer primary key,
text_column_1 text,
text_column_2 text);
insert into test_70066823 select 1,'first','first';
insert into test_70066823 select 2,'second','second';
select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
-- 1 | first | first
-- 2 | second | second
--(2 rows)
insert into test_70066823
values
(1, 'third','first'),
(3, 'fourth','third'),
(4, 'fifth','fourth'),
(2, 'sixth','second')
on conflict (id) do update
set text_column_1=EXCLUDED.text_column_1,
text_column_2=EXCLUDED.text_column_2;
select * from test_70066823;
-- id | text_column_1 | text_column_2
------ --------------- ---------------
-- 1 | third | first
-- 3 | fourth | third
-- 4 | fifth | fourth
-- 2 | sixth | second
--(4 rows)
You can refer to this for improved insert performance. Inserts with a simple string-based execute
or execute_many
are the top 2 slowest approaches mentioned there.