Home > Back-end >  Use psycopg2 executemany to improve the query
Use psycopg2 executemany to improve the query

Time:10-12

I have the below query using execute to add data using execute, what would be a executemany version of this code?

import psycopg2
import os
from dotenv import load_dotenv
load_dotenv()


urlstocopy = return_list_urls2(1300000)



def main():
# fake values

    conn_string = "host='db-etherator-do-user--0.b.db.oean.com' port='60' dbname='etherr2' user='in' password='sadf'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    # convert them into list of tuples
    x = 1
    for link in urlstocopy:
        values = (x, '', '', '', '', link, '2022-10-10', '2022-10-10')
        sql = f'''insert into accounts_website(id, business_name, industry, trade, acc_notes,website,created_on,updated) values{str(values)}'''
        cursor.execute(sql)
        conn.commit()
        print(x)
        x = x 1
    conn.close()
if __name__ == "__main__":
    main()

CodePudding user response:

Using the Fast execution helpers

create table so_test(id integer, varchar_fld varchar, bool_fld boolean);

import psycopg2
from psycopg2.extras import execute_batch, execute_values

con = psycopg2.connect("dbname=test user=postgres host=localhost port=5432")
cur = con.cursor()


rows = [[1, 'test', 't'], [2, 'dog', 'f'], [3, 'cat', 't']]
execute_batch(cur, 'insert into so_test(id, varchar_fld, bool_fld)  values (%s, %s, %s)', rows)

rows = [[4, 'test', 't'], [5, 'dog', 'f'], [6, 'cat', 't']]
execute_values(cur, 'insert into so_test(id, varchar_fld, bool_fld)  values %s', rows)

con.commit()

select * from so_test ;
 id | varchar_fld | bool_fld 
---- ------------- ----------
  1 | test        | t
  2 | dog         | f
  3 | cat         | t
  4 | test        | t
  5 | dog         | f
  6 | cat         | t

From the fast execution helpers link:

The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().

Where "These functions ..." are execute_batch and execute_values. If you are going to deal with datasets of any size they are the way to go.

CodePudding user response:

Your code already has a problem, because you're not handling quoting. If there are quote characters in your URLs, your query will explode. You need this:

    rows = []
    for x,link in enumerate(urlstocopy):
        rows.append( (x 1, '', '', '', '', link, '2022-10-10', '2022-10-10') )

    sql = 'insert into accounts_website(id, business_name, industry, trade, acc_notes,website,created_on,updated) values (%s,%s,%s,%s,%s,%s,%s,%s);'
    cursor.executemany(sql, rows)
    conn.commit()
    conn.close()
  • Related