Home > Net >  Read from database in chunks with retries
Read from database in chunks with retries

Time:04-01

I am reading from a Microsoft SQL server instance. I need to read all data from a table, which is quite big (~4 million records). So I like to do that in chunks, so I can limit the memory usage of my Python program.

This works fine normally, but now I need to move where this runs, which forces it go over a not super stable connection (I believe VPN is sometimes throttling the connection). So occasionally I get a connection error in one of the chunks:

sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (
SQLGetData)')

The code I run comes down to this:

import pandas as pd
from sqlalchemy import create_engine

connection_string = 'mssql pyodbc://DB_USER:DB_PASSWORD@DB_HOST/DB_NAME?trusted_connection=no&driver=ODBC Driver 17 for SQL Server'
db = create_engine(connection_string, pool_pre_ping=True)

query = 'SELECT * FROM table'

for chunk in pd.read_sql_query(query, db, chunksize=500_000):
    # do stuff with chunk

What I would like to know: is it possible to add a retry mechanism that can continue with the correct chunk if the connection fails? I've tried a few options, but none of them seem to be able to recover and continue at the same chunk.

CodePudding user response:

query = 'SELECT * FROM table' is a bad practice always filter by the fields you need and process in chunks of 500 records https://www.w3schools.com/sql/sql_top.asp SELECT TOP number|percent column_name(s) FROM table_name WHERE condition;

CodePudding user response:

I feel your pain. My VPN is the same. I'm not sure if this is a viable solution for you, but you can try this technique.

retry_flag = True
retry_count = 0
cursor = cnxn.cursor()
while retry_flag and retry_count < 5:
    try:
        cursor.execute('SELECT too_id FROM [TTMM].[dbo].[Machines] WHERE MachineID = {}'.format (machineid,))
        too_id = cursor.fetchone()[0]
        cursor.execute('INSERT INTO [TTMM].[dbo].[{}](counter, effectively, too_id) VALUES ({},{},{})'.format (machineid, counter, effectively, too_id,))
        retry_flag = False
        print("Printed To DB - Counter = ", counter, ", Effectively = ", effectively, ", too_id = ", too_id,)

    except Exception as e:
        print (e)
        print ("Retry after 5 sec")
        retry_count = retry_count   1
        cursor.close()
        cnxn.close()
        time.sleep(5)
        cnxn = pyodbc.connect('DRIVER=FreeTDS;SERVER=*;PORT=*;DATABASE=*;UID=*;PWD=*;TDS_Version=8.7;', autocommit=True)
        cursor = cnxn.cursor()

cursor.close()

How to retry after sql connection failed in python?

  • Related