I have the below df
typeItem store region amountItems
0 Drink Toncat Colorado 100
1 Food Toncat Colorado 100
2 Toy Toncat Colorado 100
3 Cloth Toncat Colorado 100
4 Car Bent Mount 10
5 Moto Bent Mount 10
6 Bike Timo Mount 5
7 Fork Logo Iggo 50
1 region has many stores, and the amountItems value is already agregatted for each store,I'm trying to group and sum the amount of items per region considering only store
the output should be this:
typeItem store region amountItems amountItemsRegion
0 Drink Toncat Colorado 100 100
1 Food Toncat Colorado 100 100
2 Toy Toncat Colorado 100 100
3 Cloth Toncat Colorado 100 100
4 Car Bent Mount 10 15
5 Moto Bent Mount 10 15
6 Bike Timo Mount 5 15
7 Fork Logo Iggo 50 50
As can be seen above, the Region: "Colorado" has many items(rows in the df), but only one store, so, there is nothing to sum, only group Different than the Region:"Mount", which has 2 stores(but one appears twice), so I need to group them and sum(10 5)
Could you guys help me?
CodePudding user response:
You can do drop_duplicates
then groupby
assign it back
df['new'] = df.drop_duplicates(['region','store']).groupby('region')['amountItems'].sum().reindex(df['region']).values
df
Out[1046]:
typeItem store region amountItems new
0 Drink Toncat Colorado 100 100
1 Food Toncat Colorado 100 100
2 Toy Toncat Colorado 100 100
3 Cloth Toncat Colorado 100 100
4 Car Bent Mount 10 15
5 Moto Bent Mount 10 15
6 Bike Timo Mount 5 15
7 Fork Logo Iggo 50 50