I need to convert a dataframe to an excel file purely using memory, i.e. there cannot be any reading / writing of files locally. Because of this, I am unable to do something like df.to_excel('filename.xls')
, or use with open('filename.xls', 'rb') as f:
This is because the script is to be run in a place where files cannot be read/written locally. Once I have the excel file in memory, I will send it as an email attachment using sendgrid.
Currently, my idea would be to convert the dataframe into binary with the help of python's byteIO, and then convert this binary file into excel. With the help of some other stackoverflow resources, I have managed to come up with the following ideas:
#In this case, I convert the df to binary.
memfile = io.BytesIO(df.to_json().encode())
memfile.seek(0)
# Now, memfile is a binary file. But from here, I am not sure how to convert this memfile to excel.
#Another solution I found with SOF, which uses df.to_excel along with the file in memory.
in_memory_fp = io.BytesIO()
df.to_excel(in_memory_fp)
in_memory_fp.seek(0,0)
# Write the file out to disk to demonstrate that it worked.
print(in_memory_fp.read())
# In this case, the print message that comes out looks nothing like an excel file, but just many alphabets/letters. I am not sure if this is an excel format.
I appreciate and comments and guidance to point me in the right direction. References: How to create in-memory file object Convert bytes to a file object in python Convert file into BytesIO object using python
CodePudding user response:
the print message that comes out looks nothing like an excel file
xlsx
is a zipped format, so it's a binary file and there's no point printing it.
For testing, you can just check a round-trip:
import io
import pandas as pd
df = pd.DataFrame([[1,2], [3,4]])
in_memory_fp = io.BytesIO()
df.to_excel(in_memory_fp, index=False)
in_memory_fp.seek(0,0)
pd.testing.assert_frame_equal(df, pd.read_excel(in_memory_fp))