How to perform calculation on certain rows in pandas?

I have a dataframe with the sales quantity for a list of products. Each product is assigned a design/range name, and a design may contain multiple products. How can I perform calculations within a single design to find the sales split? I want to find out what percentage of a given range's sales comes from each product. So far I have only been able to take the entire list of products and calculate the percentage each one contributes to the overall sales quantity.

Original dataframe:

id  Product  Range  Quantity
1   Prod1      A    6        
2   Prod2      A    4         
3   Prod3      B    2         
4   Prod4      B    8

Dataframe after calculation:

id  Product  Range  Quantity  % of range
1   Prod1      A    6             60%
2   Prod2      A    4             40%
3   Prod3      B    2             20%
4   Prod4      B    8             80%

CodePudding user response：

You need a simple groupby.transform('sum') to get the total per group, then perform classical vector arithmetic.

I provided an example as float and one as string:

total = df.groupby('Range')['Quantity'].transform('sum')

# as float
df['% of range'] = df['Quantity'].div(total)

# as string
df['% of range (str)'] = df['Quantity'].div(total).mul(100).astype(int).astype(str) + ' %'

output:

   id Product Range  Quantity  % of range % of range (str)
0   1   Prod1     A         6         0.6             60 %
1   2   Prod2     A         4         0.4             40 %
2   3   Prod3     B         2         0.2             20 %
3   4   Prod4     B         8         0.8             80 %
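
For anyone who wants to reproduce this, here is a self-contained sketch of the same approach. The dataframe is rebuilt from the sample in the question; the `round()` call before `astype(int)` is my own addition (not part of the snippet above), since `astype(int)` truncates and a float such as 59.999... would otherwise become 59.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Product": ["Prod1", "Prod2", "Prod3", "Prod4"],
    "Range": ["A", "A", "B", "B"],
    "Quantity": [6, 4, 2, 8],
})

# Total quantity of each Range, repeated on every row of that Range
total = df.groupby("Range")["Quantity"].transform("sum")

# Share of the range as a float between 0 and 1
df["% of range"] = df["Quantity"].div(total)

# Same share as text; round() is an added safeguard against truncation
df["% of range (str)"] = (
    df["Quantity"].div(total).mul(100).round().astype(int).astype(str) + " %"
)

print(df)
```

Because `transform('sum')` returns a Series aligned with the original index, the division is done row by row without any merge or loop.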

CodePudding user response：

Edit: You should look at mozway's solution, because mine basically does the same thing in more steps. I didn't know about .transform, which does in a single, straightforward line what I do in two.

To select only the rows which satisfy a particular condition (e.g. have a value of Product equal to Prod1):

df[df.Product == "Prod1"]

Thus, to get the sum of the quantity for Prod1 regardless of the Range, you would do:

df[df.Product == "Prod1"]["Quantity"].sum()
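
As a minimal sketch of the filtering step on the question's sample data: the mask is stored in a variable (the names `mask`, `prod1_rows` and `prod1_total` are just illustrative), and `.loc` is used to pick the rows and the column in one call instead of chaining two `[]` lookups.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Product": ["Prod1", "Prod2", "Prod3", "Prod4"],
    "Range": ["A", "A", "B", "B"],
    "Quantity": [6, 4, 2, 8],
})

# Boolean Series: True on the rows where Product is "Prod1"
mask = df["Product"] == "Prod1"

# Keep only the matching rows
prod1_rows = df[mask]

# Sum Quantity over the matching rows, selecting rows and column together
prod1_total = df.loc[mask, "Quantity"].sum()

print(prod1_rows)
print(prod1_total)
```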

If you want to do the same for every value, use groupby:

sum_per_range = df[["Range", "Quantity"]].groupby("Range").sum()

This gives you the total quantity per Range. Now we need to create a new column that divides each row's Quantity by the total of its Range:

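# sum_per_range is a DataFrame indexed by Range, so look up the row label and the Quantity column
df["%"] = df.apply(lambda x: x.Quantity / sum_per_range.loc[x.Range, "Quantity"],
                   axis=1)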
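
If you want to keep the two-step idea but avoid the row-wise apply, one possible variant (a sketch, not the only way) is to keep the totals as a Series and look them up with `Series.map`. Here `sum_per_range` is built as a Series rather than the DataFrame used above, which is an assumption of this example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Product": ["Prod1", "Prod2", "Prod3", "Prod4"],
    "Range": ["A", "A", "B", "B"],
    "Quantity": [6, 4, 2, 8],
})

# Step 1: total quantity per Range, as a Series indexed by Range
sum_per_range = df.groupby("Range")["Quantity"].sum()

# Step 2: map each row's Range to its total, then divide column-wise
df["%"] = df["Quantity"] / df["Range"].map(sum_per_range)

print(df)
```

This is vectorised, so it should scale better than apply with axis=1 on large frames, and it ends up equivalent to the transform('sum') approach.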