I have dataframe like as below
import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
'customer': rng.choice(list('ACD'),size=(5)),
'region': rng.choice(list('PQRS'),size=(5)),
'dumeel': rng.choice(list('QWER'),size=(5)),
'dumma': rng.choice((1234),size=(5)),
'target': rng.choice([0,1],size=(5))
})
I would like to do the below
a) filter the data based on customer
field
b) store each subset in a seperate .xlsx
with each file carrying the customer name as file name (there should be 3 excel files because we have 3 unique customers which are A, C and D)
I tried the below
for cust, v in cdf.groupby(['customer']):
writer = pd.ExcelWriter(f"{cust}.xlsx", engine='xlsxwriter')
v.to_excel(writer, sheet_name=f'{cust}')
writer.close()
Though files are generated, I get the below error message
Excel cannot open the file A.xlsx because the file format or extension is not valid. Verify that file has been corrupted and that file extension matches the format of the file
How can I avoid the error and store data for each customer in seperate excel file with both file and sheet name having customer name
CodePudding user response:
Your code works fine but you can try 2 other solutions: change the engine using engine='openpyxl'
or use cdf.to_excel
:
for cust, v in cdf.groupby(['customer']):
v.to_excel(f"{cust}.xlsx", sheet_name=f"{cust}")