Home > front end >  Pandas filter data to multiple excel files
Pandas filter data to multiple excel files

Time:06-30

I have dataframe like as below

import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
                   'customer': rng.choice(list('ACD'),size=(5)),
                   'region': rng.choice(list('PQRS'),size=(5)),
                   'dumeel': rng.choice(list('QWER'),size=(5)),
                   'dumma': rng.choice((1234),size=(5)),
                   'target': rng.choice([0,1],size=(5))
})

I would like to do the below

a) filter the data based on customer field

b) store each subset in a seperate .xlsx with each file carrying the customer name as file name (there should be 3 excel files because we have 3 unique customers which are A, C and D)

I tried the below

for cust, v in cdf.groupby(['customer']):
    writer = pd.ExcelWriter(f"{cust}.xlsx", engine='xlsxwriter')
    v.to_excel(writer, sheet_name=f'{cust}')
    writer.close()

Though files are generated, I get the below error message

Excel cannot open the file A.xlsx because the file format or extension is not valid. Verify that file has been corrupted and that file extension matches the format of the file

How can I avoid the error and store data for each customer in seperate excel file with both file and sheet name having customer name

CodePudding user response:

Your code works fine but you can try 2 other solutions: change the engine using engine='openpyxl' or use cdf.to_excel:

for cust, v in cdf.groupby(['customer']):
    v.to_excel(f"{cust}.xlsx", sheet_name=f"{cust}")
  • Related