Home > Software engineering >  How to find the lower value of groupby per group in pandas?
How to find the lower value of groupby per group in pandas?

Time:03-24

I am trying to calculate the lower rate per transaction of items. Here, I would like to find the lower rate corresponding to the given value.

sample code

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'id': [0,0,1,1,1,2,2],
    'min':[100,500,200,300,1000,2000,3000],
    'rate':[0,0.5,1.5,2,3, 1.4,2.4],
    'value':[50,450,200,250,1100,500,2500],
    'answer': [0,0.0,1.5,1.5,3.0,0,1.4],
    'reason': ['50<100', '450<500','200 has rate 1.5','250<300',
              '1100>1000','500<3000','2500<3000']
})

df
    id  min  rate   value   answer  reason
0   0   100  0.00   50      0.00    for id0, value is 50, but min rate below 100 is 0.0 so answer is 0.0
1   0   500  0.50   450     0.00    for id0, value is 450, if value is less than 500 and greater than 100 rate is still 0.0 so answer is 0.0
2   1   200  1.50   200     1.50    for id1, value is 200, for which rate is 1.5
3   1   300  2.00   250     1.50    for id1, value 250 is less than 300 and so rate is 1.5
4   1   1000  3.00  1100    3.00    for id1, value 1100 > 1000 so rate is 3.0
5   2   2000 1.40   500     0.00    for id2, value 500<2000 so rate is 0.0
6   2   3000 2.40   2500    1.40    for id2, value 2000<2500<3000 so rate is 1.4

Question: how to get the answer column?

My attempt

df.groupby('id').apply(lambda dfx: dfx['min'].min())

How to find the column "answer" in above dataframe. I am thinking of pandas.cut and am not been able to get the logic correct.

CodePudding user response:

IIUC, you can use pandas.cut per group:

cut = lambda g: pd.cut(g['value'], bins=list(g['min']) [float('inf')],
                       labels=g['rate'], right=False).astype(float).fillna(0)

df['answer'] = df.groupby('id', group_keys=False).apply(cut)

output (new column as answer2 for clarity):

   id   min  rate  value  answer            reason  answer2
0   0   100   0.0     50     0.0            50<100      0.0
1   0   500   0.5    450     0.0           450<500      0.0
2   1   200   1.5    200     1.5  200 has rate 1.5      1.5
3   1   300   2.0    250     1.5           250<300      1.5
4   1  1000   3.0   1100     3.0         1100>1000      3.0
5   2  2000   1.4    500     0.0          500<3000      0.0
6   2  3000   2.4   2500     1.4         2500<3000      1.4
  • Related