I am having a hard time to get this done:
What I have: pandas dataframe:
product seller price
A Yo 10
A Ka 5
A Poy 7.5
A Nyu 2.5
A Poh 1.25
B Poh 11.25
What I want:
given a df like the one above product, seller, price
I wan to calculate 4 quartiles based on price's column for that particulary product and classify the price of each seller of that product into these quartiles.
When all prices are the same, the 4 quartiles has the same value and the price is classified as 1st quartile
Expected Outuput:
product seller price Quartile 1Q 2Q 3Q 4Q
A Yo 10 4 2.5 5 7.5 10
A Ka 5 2 2.5 5 7.5 10
A Poy 7.5 3 2.5 5 7.5 10
A Nyu 2.5 1 2.5 5 7.5 10
A Poh 1.25 1 2.5 5 7.5 10
B Poh 11.25 1 11.25 11.25 11.25 11.25
What I did so far:
if I use: df['Price'].quantile([0.25,0.5,0.75,1])
it will claculate 4 quartiles of all prices without filter by product, so its wrong.
I am lost because I dont know how to do this in python.
Can anyone give me some light here?
Thanks
CodePudding user response:
Try:
dfQuantile = df.groupby("product")['Price'].quantile([0.25,0.5,0.75,1]).unstack().reset_index().rename(columns={0.25:"1Q",0.5:"2Q",0.75:"3Q",1:"4Q"})
out = pd.merge(df,dfQuantile,on="product",how="left")
out["Quantile"] = df.groupby(['product'])['Price'].transform(
lambda x: pd.qcut(x, 4, labels=False, duplicates="drop")).fillna(0).add(1)
print(out)
product seller Price Quantile 1Q 2Q 3Q 4Q
0 A Yo 10.00 4 2.50 5.00 7.50 10.00
1 A Ka 5.00 2 2.50 5.00 7.50 10.00
2 A Poy 7.50 3 2.50 5.00 7.50 10.00
3 A Nyu 2.50 1 2.50 5.00 7.50 10.00
4 A Poh 1.25 1 2.50 5.00 7.50 10.00
5 B Poh 11.25 1 11.25 11.25 11.25 11.25