Home > Enterprise >  Sum one column based on matching values in second column and put total into correct record in anothe
Sum one column based on matching values in second column and put total into correct record in anothe

Time:06-24

I would like to ask for some guidance with a project. The below dataframe scripts are the before and after for your review.


What is listed below is a hotel list of guests, Rates, & Status.

The ID column is for tracking each person staying with the hotel. The Rate column list the rate per person.

The Status column is "P" for primary and "S" for shared.

So the overview is the guests with matching ID numbers are staying in a room together, both rates should be summed together, but listed under the "P"(primary) guest record in the Total column. In the Total column you should have for the "P"'s a total of the two guests staying together and the "S"(shared) should be zero.

I have tried the pandas groupby & sum but this snippet is removing some of the matching ID's records. The Sum is working for my totals, but I still need to figure out how to put the total under the primary guest record. I am still reviewing stackoverflow for likeness solutions that could help. df = df.groupby(["ID"]).Rate.sum().reset_index()

I would appreciate any help that you can give. Thank You

import pandas as pd

print('Before')
df=pd.DataFrame({'ID':[1182,2554,1182,2658,5489,2658],
'Fname':['Tom','Harry','Trisha','Ben','Susan','Brenda'],
'Rate':[125.00,89.00,135.00,25.00,145.00,19.00],
'Status':['P','S','P','P','P','S'],
'Total':[0,0,0,0,0,0,]})


df1=pd.DataFrame({'ID':[1182,1182,2658,2658,2554,5489,],
'Fname':['Tom','Trisha','Ben','Brenda','Harry','Susan'],
'Rate':[125.00,135.00,245.00,19.00,89.00,25.00],
'Status':['P','S','P','S','P','P'],
'Total':[260.00,0,264,0,89.00,25.00]})

print(df)
print()
print('After')
print(df1)

CodePudding user response:

Assuming you have a unique P per group, you can use a GroupBy.transform with a mask:

df['Total'] = df.groupby('ID')['Rate'].transform('sum').where(df['Status'].eq('P'), 0)

output (using the data from df1):

     ID   Fname   Rate Status  Total
0  1182     Tom  125.0      P  260.0
1  1182  Trisha  135.0      S    0.0
2  2658     Ben  245.0      P  264.0
3  2658  Brenda   19.0      S    0.0
4  2554   Harry   89.0      P   89.0
5  5489   Susan   25.0      P   25.0
  • Related