Home > Net >  Total sum by Unique ID using df.groupby
Total sum by Unique ID using df.groupby

Time:10-31

In the data set I'm working on, each unique ID has a Total Pay. I've summed this Total pay up per ID by using

df['Total Courier Pay'] = df.groupby(['ID'])['Total Pay'].transform(sum)
df

The output in the excel sheet looks like this:

enter image description here

I'm happy to keep all the columns shown but is there any way to clean it up so it looks more like this instead of having duplicate rows for each ID?:

ID      Total Courier Pay
1          5.1615
2          59.1075
3          7.4925

Is it possible to create a new sheet in the same workbook where it only displays the ID and Total Courier Pay?

Thanks in advance!

CodePudding user response:

df['Total Courier Pay'] = df.groupby(['ID'])['Total Pay'].transform(sum)
new_df = df.drop_duplicates(subset=['ID'])[['ID','Total Courier Pay']]

CodePudding user response:

After some help from @Pedro Maia and reading some other forums, this seemed to work for me:

df.drop_duplicates('ID', inplace=True)

Thanks for the help :) - This will change the current excel sheet rather than create a new one.

CodePudding user response:

According to your example, it seems that you want to create a separate DataFrame with only the 'ID' and 'Total Courier Pay'. In that case, you can do

total_pay_by_ID = (
    df.groupby('ID', as_index=False)['Total Pay']
      .sum()
      .rename(columns={'Total Pay': 'Total Courier Pay'})
)

Is it possible to create a new sheet in the same workbook where it only displays the ID and Total Courier Pay?

To save it to a new sheet 'sheet_name' of an existing excel file 'path/to/excel/file.xlsx' you can do

with pd.ExcelWriter('path/to/excel/file.xlsx', mode='a') as writer:  
    total_pay_by_ID.to_excel(writer, sheet_name='sheet_name')
  • Related