Say we have a dataframe like the one below:
channel store units
Offline Bournemouth 62
Offline Kettering 90
Offline Manchester 145
Online Bournemouth 220
Online Kettering 212
Online Manchester 272
My purpose would be to add two more columns containing the full amount of units sold for each channel and the share each store represents within each one. In short, the output I desire to reach should look as follows:
channel store units units_per_channel store_share
Offline Bournemouth 62 297 0.21
Offline Kettering 90 297 0.30
Offline Manchester 145 297 0.49
Online Bournemouth 220 704 0.31
Online Kettering 212 704 0.30
Online Manchester 272 704 0.39
Is there any simple and elegant way to get this?
CodePudding user response:
Do a .grouby()
on the channel
, and get the sum of the units
. Then simply divide the units
by units_per_channel
import pandas as pd
df = pd.DataFrame([['Offline', 'Bournemouth', 62],
['Offline' , 'Kettering' , 90],
['Offline' , 'Manchester' , 145],
['Online' , 'Bournemouth', 220],
['Online' , 'Kettering', 212],
['Online' , 'Manchester', 272]],
columns=['channel','store','units'],)
df['units_per_channel'] = df.groupby('channel')['units'].transform('sum')
df['store_share'] = df['units'] / df['units_per_channel']
Output:
print(df)
channel store units units_per_channel store_share
0 Offline Bournemouth 62 297 0.208754
1 Offline Kettering 90 297 0.303030
2 Offline Manchester 145 297 0.488215
3 Online Bournemouth 220 704 0.312500
4 Online Kettering 212 704 0.301136
5 Online Manchester 272 704 0.386364