When I try to read CSV from sftp
server with read_csv()
there is no problem at all. But when I try to read xlsx
with read_excel()
it never finishes reading file. I tried downloading from sftp and reading xlsx
. It works well. But I didn't understand why I can't read from sftp
server. Is there any solutions?
t = paramiko.Transport(('ip', port))
t.connect(username='user', password='pass')
sftp = paramiko.SFTPClient.from_transport(t)
fileattr = sftp.lstat(remote_file)
try:
if stat.S_ISREG(fileattr.st_mode):
if '.xlsx' in remote_file:
with sftp.open(remote_file) as f:
f.prefetch()
df = pd.read_excel(f,dtype=str,engine='openpyxl',sheet_name=sheet)
t.close()
print(df)
except Exception as e:
t.close()
print(e)
CodePudding user response:
Use reas_excel(f.read())
instead of just read_excel(f)
. read_excel()
only reads objects with a read()
method or file paths.
For the problem of more columns than expected there are 3 options:
- Modify your excel file so that you do not have absolutely nothing in the blanck cells, not even any kind format. And even though, this method is unpredictable.
- Read the excel into a dataframe and then filter the columns that you need. Or drop the empty columns like
df.dropna(axis='columns', inplace=True)
- Use the
usecols
parameter fromread_excel
likeread_excel(f.read(), usecols="A,C,E:F", ...)
. See documentation
I would recommend option 3.