Home > Blockchain >  Python, Pandas: mark top down rows making 80 percent of total sell
Python, Pandas: mark top down rows making 80 percent of total sell

Time:10-24

I have a table of products and their sold amounts in dollars. I have the total sell and want to know which products have made 80 percent of the whole sell and mark them 1 in the label column. Please make sure to label 1 from largest number to the smallest one. Below the total sell is 32 which 80 percent of it is 25.6. So from the largest number in sold$ column to the smallest one if we add rows 2,4,5, and 7 it will be 26 which makes the 80 percent of the total sell, 32, and label them 1 and the others 0. I want to do it with python and pandas. Thank you in advance. Best regards enter image description here

CodePudding user response:

Calculate the fraction of sales for each product, sort them by fraction, calculate their cumulative sum and use this to get the top 80%

cumsum = (df["sold"]/df["sold"].sum()).sort_values().cumsum()
df["label"] = pd.Series(0, index=cumsum.index).where(cumsum <= 0.2, 1)

CodePudding user response:

You can do with:

import pandas as pd
import numpy as np

data = {'productID':[1,2,3,4,5,6,7],'sold$':[2,4,3,8,5,1,9]}

df=pd.DataFrame(data)
df.sort_values('sold$',inplace=True)

df['Label']=np.where(df['sold$'].cumsum()<=df['sold$'].sum() * 0.2,0,1)
df.sort_index(inplace=True)


print (df)

result:

   productID  sold$  Label
0          1      2      0
1          2      4      1
2          3      3      0
3          4      8      1
4          5      5      1
5          6      1      0
6          7      9      1
  • Related