for remote_path in list_of_stfp_paths:
with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
sftp.get(remote_path, str(local_path))
#checks distinct count of a column for the csv downloaded, deletes it later
df = pd.read_csv(str(local_path))
print(df['taken_time'].value_counts())
os.remove(str(local_path))
The code I use is above. It's just run in a for loop with multiple remote paths. Sometimes, it gets completed. Sometimes, I get an error saying
Exception: Authentication failed.
CodePudding user response:
Do not reconnect for each file. Loop the downloads only, not the connection:
with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
for remote_path in list_of_stfp_paths:
sftp.get(remote_path, str(local_path))
#checks distinct count of a column for the csv downloaded, deletes it later
df = pd.read_csv(str(local_path))
print(df['taken_time'].value_counts())
os.remove(str(local_path))
Though note that you do not even have to download the files to a local disk, just read them straight from the SFTP server:
with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
for remote_path in list_of_stfp_paths:
with sftp.open(remote_path) as f:
f.prefetch()
#checks distinct count of a column for the csv
df = pd.read_csv(f)
print(df['taken_time'].value_counts())
It might even be faster as it allows the download and parsing happen in parallel, not in sequence. See Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back
CodePudding user response:
you could simply try a try/except block
try:
with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
sftp.get(remote_path, str(local_path))
except Exception as e:
print(e)
this will catch the error and it will also print it on the console without shutting the process down