Home > Blockchain >  Why read_excel() can't read xlsx from sftp server?
Why read_excel() can't read xlsx from sftp server?

Time:08-12

When I try to read CSV from sftp server with read_csv() there is no problem at all. But when I try to read xlsx with read_excel() it never finishes reading file. I tried downloading from sftp and reading xlsx. It works well. But I didn't understand why I can't read from sftp server. Is there any solutions?

t = paramiko.Transport(('ip', port))
t.connect(username='user', password='pass')
sftp = paramiko.SFTPClient.from_transport(t)
fileattr = sftp.lstat(remote_file)
try:
    if stat.S_ISREG(fileattr.st_mode):
        if '.xlsx' in remote_file:
            with sftp.open(remote_file) as f:
                f.prefetch()
                df = pd.read_excel(f,dtype=str,engine='openpyxl',sheet_name=sheet)
                t.close()
                print(df)
except Exception as e:
    t.close()
    print(e)

CodePudding user response:

Use reas_excel(f.read()) instead of just read_excel(f). read_excel() only reads objects with a read() method or file paths.

For the problem of more columns than expected there are 3 options:

  1. Modify your excel file so that you do not have absolutely nothing in the blanck cells, not even any kind format. And even though, this method is unpredictable.
  2. Read the excel into a dataframe and then filter the columns that you need. Or drop the empty columns like df.dropna(axis='columns', inplace=True)
  3. Use the usecols parameter from read_excel like read_excel(f.read(), usecols="A,C,E:F", ...). See documentation

I would recommend option 3.

  • Related