I am reading from a Microsoft SQL server instance. I need to read all data from a table, which is quite big (~4 million records). So I like to do that in chunks, so I can limit the memory usage of my Python program.
This works fine normally, but now I need to move where this runs, which forces it go over a not super stable connection (I believe VPN is sometimes throttling the connection). So occasionally I get a connection error in one of the chunks:
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (
SQLGetData)')
The code I run comes down to this:
import pandas as pd
from sqlalchemy import create_engine
connection_string = 'mssql pyodbc://DB_USER:DB_PASSWORD@DB_HOST/DB_NAME?trusted_connection=no&driver=ODBC Driver 17 for SQL Server'
db = create_engine(connection_string, pool_pre_ping=True)
query = 'SELECT * FROM table'
for chunk in pd.read_sql_query(query, db, chunksize=500_000):
# do stuff with chunk
What I would like to know: is it possible to add a retry mechanism that can continue with the correct chunk if the connection fails? I've tried a few options, but none of them seem to be able to recover and continue at the same chunk.
CodePudding user response:
query = 'SELECT * FROM table' is a bad practice always filter by the fields you need and process in chunks of 500 records https://www.w3schools.com/sql/sql_top.asp SELECT TOP number|percent column_name(s) FROM table_name WHERE condition;
CodePudding user response:
I feel your pain. My VPN is the same. I'm not sure if this is a viable solution for you, but you can try this technique.
retry_flag = True
retry_count = 0
cursor = cnxn.cursor()
while retry_flag and retry_count < 5:
try:
cursor.execute('SELECT too_id FROM [TTMM].[dbo].[Machines] WHERE MachineID = {}'.format (machineid,))
too_id = cursor.fetchone()[0]
cursor.execute('INSERT INTO [TTMM].[dbo].[{}](counter, effectively, too_id) VALUES ({},{},{})'.format (machineid, counter, effectively, too_id,))
retry_flag = False
print("Printed To DB - Counter = ", counter, ", Effectively = ", effectively, ", too_id = ", too_id,)
except Exception as e:
print (e)
print ("Retry after 5 sec")
retry_count = retry_count 1
cursor.close()
cnxn.close()
time.sleep(5)
cnxn = pyodbc.connect('DRIVER=FreeTDS;SERVER=*;PORT=*;DATABASE=*;UID=*;PWD=*;TDS_Version=8.7;', autocommit=True)
cursor = cnxn.cursor()
cursor.close()