How to avoid - Download fails when we try to download 50 files from sftp serially using pysftp in p

Time:09-30

for remote_path in list_of_stfp_paths:
    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        sftp.get(remote_path, str(local_path))

    # count occurrences of each distinct value in a column of the downloaded csv, then delete the file
    df = pd.read_csv(str(local_path))
    print(df['taken_time'].value_counts())
    os.remove(str(local_path))

The code I use is above. It runs in a for loop over multiple remote paths. Sometimes it completes; sometimes I get an error saying Exception: Authentication failed.

CodePudding user response:

Do not reconnect for each file. Loop the downloads only, not the connection:

import os

import pandas as pd
import pysftp

# open the connection once and reuse it for every file
with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
    for remote_path in list_of_stfp_paths:
        sftp.get(remote_path, str(local_path))

        # count occurrences of each distinct value in a column of the downloaded csv, then delete the file
        df = pd.read_csv(str(local_path))
        print(df['taken_time'].value_counts())
        os.remove(str(local_path))

Though note that you do not even have to download the files to a local disk, just read them straight from the SFTP server:

with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
    for remote_path in list_of_stfp_paths:
        with sftp.open(remote_path) as f:
            f.prefetch()
            # count occurrences of each distinct value in a column of the csv
            df = pd.read_csv(f)
            print(df['taken_time'].value_counts())

It might even be faster, as it allows the download and the parsing to happen in parallel rather than in sequence. See Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back.
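
To illustrate the idea of parsing from a file-like object instead of a file on disk, here is a minimal sketch that runs without a server. It is a stand-in, not the code above: io.BytesIO plays the role of the remote file, the standard library csv module replaces pandas, and count_column_values is a hypothetical helper name. Whether the object returned by sftp.open can be wrapped in io.TextIOWrapper the same way is not verified here.

```python
import csv
import io
from collections import Counter


def count_column_values(binary_file, column, encoding="utf-8"):
    """Count how often each value occurs in one column of a CSV stream.

    binary_file is any object that behaves like a file opened in binary
    mode. Nothing is written to the local disk.
    """
    # Decode bytes to text on the fly; csv expects newline="" on its input.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    reader = csv.DictReader(text)
    return Counter(row[column] for row in reader)


# Stand-in for a remote file (assumption: real data would come from the server).
fake_remote_file = io.BytesIO(b"id,taken_time\n1,09:00\n2,09:00\n3,10:30\n")
counts = count_column_values(fake_remote_file, "taken_time")
print(counts)
```

With pandas the wrapper is unnecessary, because pd.read_csv accepts the open file object directly, as in the answer above.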

CodePudding user response:

You could simply wrap the download in a try/except block:

try:
    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        sftp.get(remote_path, str(local_path))

except Exception as e:
    print(e)

This catches the error and prints it to the console without shutting the process down.
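
If the failure is intermittent, a try/except that only prints the error leaves that file undownloaded. One possible extension is to retry a few times with a growing pause. This is a sketch only: with_retries and its parameters (attempts, base_delay, sleep) are hypothetical names, not part of pysftp, and the broad except Exception should probably be narrowed to the errors you actually see.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # assumption: narrow this in real code
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

It could be used as with_retries(lambda: sftp.get(remote_path, str(local_path))), or around a function that opens the connection if the connection itself is what fails. Retrying does not remove the underlying cause, so reusing a single connection as in the first answer is still worth doing first.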
