Download a chunk of a large file using pysftp in Python

Time:12-22

I have a use case in which I want to read only the top 5 rows of a large CSV file stored on one of my SFTP servers, and I don't want to download the complete file just to read those 5 rows. I am using pysftp in Python 3 to interact with my SFTP server. Is there a way in pysftp to download only a chunk of the file instead of the complete file?

If there are any other Python libraries or techniques I can use, please guide me. Thanks.

CodePudding user response:

Yes, it is possible to read only a portion of a file from an SFTP server using pysftp. The getfo method is not the right tool here, because it copies the whole remote file into a file-like object. Instead, use the open method, which returns a file-like object for the remote file, so data is fetched only as you read it. You can then copy the lines you need into an in-memory buffer such as the io module's StringIO class.

Here is an example of how you might use open to read the first 5 lines of a CSV file from an SFTP server:

import io
import pysftp

# Connect to the SFTP server
cnopts = pysftp.CnOpts()
# WARNING: this disables host key verification; use it only for testing
cnopts.hostkeys = None
with pysftp.Connection('sftp.example.com', username='user', password='pass', cnopts=cnopts) as sftp:
    # Open the CSV file on the SFTP server
    with sftp.open('path/to/file.csv', 'r') as f:
        # Create a file-like object in memory
        output = io.StringIO()
        # Read the first 5 lines of the file and write them to the file-like object
        for i in range(5):
            line = f.readline()
            output.write(line)
        # Reset the file pointer to the beginning of the file-like object
        output.seek(0)
        # Read the contents of the file-like object
        print(output.read())

This example reads the first 5 lines of the file and writes them to a file-like object in memory. You can then read the contents of that object using the read method, or process the lines in any other way you like.
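If you need parsed rows rather than raw lines, one option is to feed the open remote file straight to csv.reader. The following is a minimal sketch, not a definitive implementation: first_csv_rows is a hypothetical helper name, UTF-8 is an assumed encoding, and an in-memory BytesIO stands in for the remote file so that it runs without a server.

```python
import csv
import io
from itertools import islice


def first_csv_rows(remote_file, n=5, encoding="utf-8"):
    """Return the first n parsed CSV rows from an open file-like object.

    Lines are pulled one at a time, so iteration stops once n rows are parsed.
    """
    def decoded_lines():
        for line in remote_file:
            # Binary-mode files yield bytes; text-mode files yield str.
            yield line.decode(encoding) if isinstance(line, bytes) else line

    # csv.reader pulls extra lines only when a quoted field spans several lines.
    return list(islice(csv.reader(decoded_lines()), n))


# Local stand-in for the remote file (assumption: the real file is UTF-8 CSV).
sample = io.BytesIO(b"id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n")
rows = first_csv_rows(sample)
print(rows)
```

With a real connection, you would call first_csv_rows(f) inside the with sftp.open('path/to/file.csv', 'r') as f: block. Letting csv.reader drive the iteration means a quoted field containing a line break still counts as one row.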

CodePudding user response:

First, do not use pysftp. It is a dead, unmaintained project. Use Paramiko instead. See pysftp vs. Paramiko.

If you want to read data from a specific point in the file, you can open a file-like object representing the remote file using the Paramiko SFTPClient.open method (or the equivalent pysftp Connection.open) and then use it as if you were accessing data from any local file:

  • Use .seek to set the read pointer to the desired offset.
  • Use .read to read the data.

with sftp.open("/remote/path/file", "r", bufsize=32768) as f:
    f.seek(offset)
    data = f.read(count)

For the purpose of bufsize, see:
Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow
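For completeness, here is a sketch of the same seek/read approach wrapped around a full Paramiko connection. The function names read_range and read_remote_range and their parameters are hypothetical, password authentication is assumed, and the host key is assumed to be in the system known_hosts file. Paramiko is imported inside the function so that the small helper can be tried on a local file-like object without a server.

```python
import io


def read_range(fileobj, offset, count):
    """Return up to `count` bytes starting at byte `offset` of an open file."""
    fileobj.seek(offset)
    return fileobj.read(count)


def read_remote_range(host, username, password, path, offset, count):
    """Read a byte range from a remote file over SFTP (hypothetical wrapper)."""
    import paramiko  # third-party dependency, needed only for the remote call

    with paramiko.SSHClient() as ssh:
        # Verify the server against the user's known_hosts entries.
        ssh.load_system_host_keys()
        ssh.connect(host, username=username, password=password)
        with ssh.open_sftp() as sftp:
            with sftp.open(path, "rb", bufsize=32768) as f:
                return read_range(f, offset, count)


# Local demonstration with an in-memory stand-in for the remote file.
demo = io.BytesIO(b"0123456789abcdef")
print(read_range(demo, 4, 6))
```

Note that read returns bytes, so decode the result if you need text.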
