Use data frame to create multiple excels/csv-CodePudding

I am new to python programming, and I am trying to think how to create multiple excels from a data frame. I have a Pandas data frame as shown below,

Invoice No.	Voucher ID	Claimed
MHI000000038710	100039	Yes
MHI000000038715	100039	No
MHI000000038711	100043	Yes
MHI000000038712	100043	No

For every Voucher ID, I am trying to create excels for every Invoice No. from data frame where Claimed = Yes. But every excel should contain all the rows for same Voucher ID irrespective of Claimed is Yes/No. For the above example, the output would be 2 excels named after each Invoice No. where Claimed = Yes(i.e., MHI000000038710and MHI000000038711).

And each excel should contains rows grouped by Voucher ID

Excel 1(MHI000000038710.xlsx): -

Invoice No.	Voucher ID	Claimed
MHI000000038710	100039	Yes
MHI000000038715	100039	No

Excel 2(MHI000000038711.xlsx): -

Invoice No.	Voucher ID	Claimed
MHI000000038711	100043	Yes
MHI000000038712	100043	No

The script that I have currently is creating excel for all the invoices. I am not able to figure out how not create invoices where Claimed = No but still have all the rows in the invoices created for Claimed = Yes

for myid in df['Voucher ID'].unique():
    df_singleID = df[df['Voucher ID']==myid]
    for myinvoice in df_singleID['Invoice No'].unique():
        df_singleID.to_excel(output "\\" str(myinvoice) '.xlsx',index = False)

CodePudding user response：

What about using pandas.DataFrame.groupby ?

for _, g in df.groupby("Voucher ID"):
    filename = g.loc[g["Claimed"].eq("Yes"), "Invoice No."].iloc[0]
    g.to_excel(f"{filename}.xlsx", index=False)

NB: The filename is calculated with boolean indexing.

CodePudding user response：

You wont get the index error using .query() and list comps:

from itertools import zip_longest

import pandas as pd


path = "path/to/file"
invoices = df.query("Claimed.eq('Yes')")["Invoice No."].tolist()
dfs = [x.reset_index(drop=True) for _, x in df.groupby("Voucher ID")]

[df.to_excel(f"{path}/{file_name}.xlsx", index=False) for file_name, df in zip_longest(invoices, dfs)]