I have a table of products and their sold amounts in dollars. I have the total sell and want to know which products have made 80 percent of the whole sell and mark them 1 in the label column. Please make sure to label 1 from largest number to the smallest one. Below the total sell is 32 which 80 percent of it is 25.6. So from the largest number in sold$ column to the smallest one if we add rows 2,4,5, and 7 it will be 26 which makes the 80 percent of the total sell, 32, and label them 1 and the others 0. I want to do it with python and pandas. Thank you in advance. Best regards
CodePudding user response:
Calculate the fraction of sales for each product, sort them by fraction, calculate their cumulative sum and use this to get the top 80%
cumsum = (df["sold"]/df["sold"].sum()).sort_values().cumsum()
df["label"] = pd.Series(0, index=cumsum.index).where(cumsum <= 0.2, 1)
CodePudding user response:
You can do with:
import pandas as pd
import numpy as np
data = {'productID':[1,2,3,4,5,6,7],'sold$':[2,4,3,8,5,1,9]}
df=pd.DataFrame(data)
df.sort_values('sold$',inplace=True)
df['Label']=np.where(df['sold$'].cumsum()<=df['sold$'].sum() * 0.2,0,1)
df.sort_index(inplace=True)
print (df)
result:
productID sold$ Label
0 1 2 0
1 2 4 1
2 3 3 0
3 4 8 1
4 5 5 1
5 6 1 0
6 7 9 1