Home > database >  How to split the dataframe and store it in multiple sheets of a excel file
How to split the dataframe and store it in multiple sheets of a excel file

Time:02-22

I have a dataframe like as shown below

import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
                   'customer': rng.choice(list('ACD'),size=(5)),
                   'region': rng.choice(list('PQRS'),size=(5)),
                   'dumeel': rng.choice(list('QWER'),size=(5)),
                   'dumma': rng.choice((1234),size=(5)),
                   'target': rng.choice([0,1],size=(5))
})

I would like to do the below

a) extract the data for unique combination of region and customer. Meaning groupby.

b) store them in each sheet of one excel file (based on number of groups)

I was trying something like below but there should be some neat pythonic way to do this

df_list = []
grouped = cdf.groupby(['customer','region'])
for k,v in grouped:
    for i in range(len(k)):
        df = cdf[(cdf['customer']==k[i] & cdf['region']==k[i 1])]
        df_list.append(df)

I expect my output to be like below (showing in multiple screenshots).

As my real data has 200 columns and million rows, any efficient and elegant approach would really be helpful

enter image description here

enter image description here

enter image description here

CodePudding user response:

Use this solution in loop:

writer = pd.ExcelWriter('out.xlsx', engine='xlsxwriter')
    
for (cust, reg), v in cdf.groupby(['customer','region']):
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}")
        
    # Close the Pandas Excel writer and output the Excel file.
writer.save()
  • Related