I have seen a few posts about this already but have not been able to find a straightforward answer.
I have a fairly basic loop. It runs some SQL for each table name in a list and sends the output to a CSV file. With a few thousand tables in the database, there are a few that are just massive, and the query takes forever. In the interest of getting on with life (and since this data isn't super important), I would like my loop to skip an iteration if the query takes longer than a minute.
Here is my loop:
for t in tablelist:
    df = pd.read_sql(sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn)
    df.to_csv(path, index=None)
CodePudding user response:
You should run pd.read_sql in another thread. You can use these utility functions:
import time
from threading import Thread

class ThreadWithReturnValue(Thread):
    """A Thread subclass that captures the target's return value."""
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None):
        Thread.__init__(self, group, target, name, args, kwargs or {})
        self._return = None
    def run(self):
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)
    def join(self, *args):
        Thread.join(self, *args)
        return self._return

def call(f, *args, timeout=5, **kwargs):
    """Run f(*args, **kwargs) in a daemon thread and return its result,
    or None if it has not finished within `timeout` seconds."""
    i = 0
    t = ThreadWithReturnValue(target=f, args=args, kwargs=kwargs)
    t.daemon = True  # let the program exit even if the call is still running
    t.start()
    while True:
        if not t.is_alive():
            break
        if i == timeout:
            print("timeout")
            return None
        time.sleep(1)
        i += 1  # count elapsed seconds
    return t.join()
# A stand-in for pd.read_sql that just sleeps for ten seconds,
# used here to demonstrate how `call` behaves with a slow query.
def read_sql(a, b, c, sql="", con=""):
    print(a, b, c, sql, con)
    t = 10
    while t > 0:
        # print("t=", t)
        time.sleep(1)
        t -= 1
    return "read_sql return value"

conn = "conn"
t = "t"
print(call(read_sql, "a", "b", "c", timeout=10, sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn))
I got help from this answer.
Using those functions, your loop becomes:
for t in tablelist:
    df = call(pd.read_sql, timeout=yourTimeOutInSeconds, sql=f''' select * from [DB].[SCHEMA].[{t}] ''', con=conn)
    if df is not None:  # `call` returns None on timeout; a plain `if df:` raises ValueError for a DataFrame
        df.to_csv(path, index=None)
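Note that a timed-out query is not actually cancelled: the daemon thread (and the query on the database side) keeps running in the background, which can keep the connection busy for the next table. If you prefer the standard library's higher-level API, here is a minimal sketch of the same timeout pattern using concurrent.futures; it is an alternative to the helpers above, not what they use, and it assumes the same tablelist, conn and path from your loop (read_sql_with_timeout is a made-up helper name):

from concurrent.futures import ThreadPoolExecutor, TimeoutError
import pandas as pd

def read_sql_with_timeout(sql, con, timeout=60):
    # Submit pd.read_sql to a single-worker pool and wait at most `timeout` seconds.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(pd.read_sql, sql=sql, con=con)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        print("timeout")
        return None
    finally:
        # Don't block on a stuck query; the worker thread keeps running in the background.
        executor.shutdown(wait=False)

for t in tablelist:
    df = read_sql_with_timeout(f''' select * from [DB].[SCHEMA].[{t}] ''', conn, timeout=60)
    if df is not None:
        df.to_csv(path, index=None)

One difference from the daemon-thread version: the pool's worker threads are joined when the interpreter exits, so a query that never returns can delay shutdown of the script.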