I have a data frame like this:
| week | co_week | Revenue | Country |
|---|---|---|---|
| 19/09/2021 | 01/10/2021 | 120 | US |
| 19/09/2021 | 03/10/2021 | 120 | US |
| 19/09/2021 | 03/10/2021 | 120 | FR |
| 20/09/2021 | 05/10/2021 | 15 | US |
| 20/09/2021 | 06/10/2021 | 34 | FR |
Now I'd like to get the sum of Revenue per country, counting each unique week date only once, so the output should look like this:
| week | co_week | Revenue | Total_revenue | Country |
|---|---|---|---|---|
| 19/09/2021 | 01/10/2021 | 120 | 135 | US |
| 19/09/2021 | 03/10/2021 | 120 | 135 | US |
| 19/09/2021 | 03/10/2021 | 120 | 154 | FR |
| 20/09/2021 | 05/10/2021 | 15 | 135 | US |
| 20/09/2021 | 06/10/2021 | 34 | 154 | FR |
I tried the following code, but I ended up with a lot of null values in the new column, because `drop_duplicates` drops most of the indexes!
df['Total_revenue'] = df.drop_duplicates(['Revenue']).groupby(df['Country'])['Total_spend'].transform('sum')
CodePudding user response:
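First, a minimal sketch that rebuilds the sample frame from the question (dates kept as plain strings), so the snippets below can be run as-is:

import pandas as pd

# Sample data copied from the question; dates are left as strings.
df = pd.DataFrame({
    'week':    ['19/09/2021', '19/09/2021', '19/09/2021', '20/09/2021', '20/09/2021'],
    'co_week': ['01/10/2021', '03/10/2021', '03/10/2021', '05/10/2021', '06/10/2021'],
    'Revenue': [120, 120, 120, 15, 34],
    'Country': ['US', 'US', 'FR', 'US', 'FR'],
})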
IIUC, you might want to `drop_duplicates` in the `groupby`:
# Within each country, drop duplicate Revenue values before summing.
df['Total_revenue'] = (df.groupby('Country')['Revenue']
                         .transform(lambda x: x.drop_duplicates().sum())
                       )
But this might be biased if the same country happens to have the same revenue in several different weeks.
Better to use `mask` to hide the duplicated values, then `groupby.transform('sum')`:
# Hide Revenue on rows that repeat a (week, Country) pair, then sum per country.
df['Total_revenue'] = (df['Revenue']
                         .mask(df.duplicated(['week', 'Country']))
                         .groupby(df['Country'])
                         .transform('sum')
                       )
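This works because `mask` turns Revenue into NaN on the duplicated rows, and `transform('sum')` skips NaN by default, so each (week, Country) pair is counted exactly once while every row still receives its group's total.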
output:
week co_week Revenue Country Total_revenue
0 19/09/2021 01/10/2021 120 US 135
1 19/09/2021 03/10/2021 120 US 135
2 19/09/2021 03/10/2021 120 FR 154
3 20/09/2021 05/10/2021 15 US 135
4 20/09/2021 06/10/2021 34 FR 154
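For completeness, an equivalent alternative (a sketch, not part of the original answer) is to deduplicate the (week, Country) pairs first, sum per country, and map the totals back onto the frame:

# Deduplicate (week, Country) pairs, sum Revenue per country,
# then map each row's country to its total.
totals = (df.drop_duplicates(['week', 'Country'])
            .groupby('Country')['Revenue']
            .sum())
df['Total_revenue'] = df['Country'].map(totals)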