I am trying to calculate the lower rate per transaction of items. Here, I would like to find the lower rate corresponding to the given value.
sample code
import numpy as np
import pandas as pd
df = pd.DataFrame({
'id': [0,0,1,1,1,2,2],
'min':[100,500,200,300,1000,2000,3000],
'rate':[0,0.5,1.5,2,3, 1.4,2.4],
'value':[50,450,200,250,1100,500,2500],
'answer': [0,0.0,1.5,1.5,3.0,0,1.4],
'reason': ['50<100', '450<500','200 has rate 1.5','250<300',
'1100>1000','500<3000','2500<3000']
})
df
id min rate value answer reason
0 0 100 0.00 50 0.00 for id0, value is 50, but min rate below 100 is 0.0 so answer is 0.0
1 0 500 0.50 450 0.00 for id0, value is 450, if value is less than 500 and greater than 100 rate is still 0.0 so answer is 0.0
2 1 200 1.50 200 1.50 for id1, value is 200, for which rate is 1.5
3 1 300 2.00 250 1.50 for id1, value 250 is less than 300 and so rate is 1.5
4 1 1000 3.00 1100 3.00 for id1, value 1100 > 1000 so rate is 3.0
5 2 2000 1.40 500 0.00 for id2, value 500<2000 so rate is 0.0
6 2 3000 2.40 2500 1.40 for id2, value 2000<2500<3000 so rate is 1.4
Question: how to get the answer column?
My attempt
df.groupby('id').apply(lambda dfx: dfx['min'].min())
How to find the column "answer" in above dataframe. I am thinking of pandas.cut and am not been able to get the logic correct.
CodePudding user response:
IIUC, you can use pandas.cut
per group:
cut = lambda g: pd.cut(g['value'], bins=list(g['min']) [float('inf')],
labels=g['rate'], right=False).astype(float).fillna(0)
df['answer'] = df.groupby('id', group_keys=False).apply(cut)
output (new column as answer2 for clarity):
id min rate value answer reason answer2
0 0 100 0.0 50 0.0 50<100 0.0
1 0 500 0.5 450 0.0 450<500 0.0
2 1 200 1.5 200 1.5 200 has rate 1.5 1.5
3 1 300 2.0 250 1.5 250<300 1.5
4 1 1000 3.0 1100 3.0 1100>1000 3.0
5 2 2000 1.4 500 0.0 500<3000 0.0
6 2 3000 2.4 2500 1.4 2500<3000 1.4