I have a dataframe with sales quantity for a list of products. Each product is assigned a design/range name. Within each design, there may be multiple products. How can I perform calculations within only a certain design to find the sales split? I want to find out what percentage of a given range come from a certain product. I have so far only been able to take the entire list of products and calculate the percentage each contributes to the overall sales quantity.
Original datafram:
id Product Range Quantity
1 Prod1 A 6
2 Prod2 A 4
3 Prod3 B 2
4 Prod4 B 8
Dataframe after calculation:
id Product Range Quantity % of range
1 Prod1 A 6 60%
2 Prod2 A 4 40%
3 Prod3 B 2 20%
4 Prod4 B 8 80%
CodePudding user response:
You need a simple groupby.transform('sum')
to get the total per group, then perform classical vector arithmetic.
I provided an example as float and one as string:
total = df.groupby('Range')['Quantity'].transform('sum')
# as float
df['% of range'] = df['Quantity'].div(total)
# as string
df['% of range (str)'] = df['Quantity'].div(total).mul(100).astype(int).astype(str) ' %'
output:
id Product Range Quantity % of range % of range (str)
0 1 Prod1 A 6 0.6 60 %
1 2 Prod2 A 4 0.4 40 %
2 3 Prod3 B 2 0.2 20 %
3 4 Prod4 B 8 0.8 80 %
CodePudding user response:
Edit: You should see mozway's solution, because mine is basically doing the same thing in more steps ; I didn't know about .transform
which does in a single, straight-forward line what I'm doing in 2.
To select only lines wich satisfy a particular condition (e.g. have a value of Product
egal to Prod1
) :
df[df.Product == "Prod1]
Thus to get the sum of the quantity for prod1 regardless of the Range, you would do :
df[df.Product == "Prod1"]["Quantity"].sum()
If you want to do the same for every value, use groupby :
sum_per_range = df[["Range", "Quantity"]].groupby("Range").sum()
Gives you the total quantity per Range, now we need to create a new column that will use these values :
df["%"] = df.apply(lambda x: x.Quantity/sum_per_range[x.Range],
axis=1