Home > Net >  Sum of only unique Date rows
Sum of only unique Date rows

Time:09-20

I have a data frame like that

week co_week Revenue Country
19/09/2021 01/10/2021 120 US
19/09/2021 03/10/2021 120 US
19/09/2021 03/10/2021 120 FR
20/09/2021 05/10/2021 15 US
20/09/2021 06/10/2021 34 FR

Now I'd like to get the sum of Revenue for each unique week date , so the output should look like that

week co_week Revenue Total_revenue Country
19/09/2021 01/10/2021 120 135 US
19/09/2021 03/10/2021 120 135 US
19/09/2021 03/10/2021 120 154 FR
20/09/2021 05/10/2021 15 135 US
20/09/2021 06/10/2021 34 154 FR

I tried using the following code , but I ended with a lot of Null values in the new column due to the dropping of most indexes!

df['Total_revenue'] = df.drop_duplicates(['Revenue']).groupby(df['Country'])['Total_spend'].transform('sum')

CodePudding user response:

IIUC, you might want to drop_duplicates in the groupby:

df['Total_revenue'] = (df.groupby('Country')['Revenue']
                         .transform(lambda x: x.drop_duplicates().sum())
                       )

But this might be biased if you have several times the same revenue on different weeks by chance.

Better use a mask to hide the duplicated values, then groupby.transform('sum'):

df['Total_revenue'] = (df['Revenue']
                       .mask(df.duplicated(['week', 'Country']))
                       .groupby(df['Country']).transform('sum')
                       )

output:

         week     co_week  Revenue Country  Total_revenue
0  19/09/2021  01/10/2021      120      US            135
1  19/09/2021  03/10/2021      120      US            135
2  19/09/2021  03/10/2021      120      FR            154
3  20/09/2021  05/10/2021       15      US            135
4  20/09/2021  06/10/2021       34      FR            154
  • Related