Find index of max() after groupby

Time:08-11

Is it possible to find the index of a value after a groupby? In other words, I use a groupby to find the max values, but I would also like to find the corresponding index in the original data set and add it as a separate column.

import pandas as pd

data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
    [pd.Timestamp('2022-08-05 10:11:05'), 400, 155],
    [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
    [pd.Timestamp('2022-08-05 10:15:07'), 100, 155],
    [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
    [pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
    [pd.Timestamp('2022-09-07 10:18:06'), 430, 156],
    [pd.Timestamp('2022-09-07 10:19:07'), 130, 155],
    [pd.Timestamp('2022-09-07 10:19:08'), 130, 160],
    [pd.Timestamp('2022-09-07 10:19:09'), 430, 130],
    [pd.Timestamp('2022-09-07 10:20:10'), 430, 131],
    [pd.Timestamp('2022-09-07 10:20:11'), 130, 170]],
    columns=['timestamp', 'power', 'heart rate'])

So I want to know the max heart rate per date, together with the corresponding index in the data DataFrame.

CodePudding user response:

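Use GroupBy.transform with idxmax to get, for every row, the index of the row that holds the maximum heart rate within its group (here, its date). If the maximum occurs more than once in a group, idxmax returns the index of the first occurrence: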

data['new'] = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('idxmax')
print(data)
             timestamp  power  heart rate  new
0  2022-08-05 10:11:04    140         120    2
1  2022-08-05 10:11:05    400         155    2
2  2022-08-05 10:13:06    400         160    2
3  2022-08-05 10:15:07    100         155    2
4  2022-08-05 10:15:08    430         160    2
5  2022-09-05 10:17:09    430         130    5
6  2022-09-07 10:17:10    430         131    7
7  2022-09-07 10:17:11    430         170    7
8  2022-09-07 10:18:06    430         156    7
9  2022-09-07 10:19:07    130         155    7
10 2022-09-07 10:19:08    130         160    7
11 2022-09-07 10:19:09    430         130    7
12 2022-09-07 10:20:10    430         131    7
13 2022-09-07 10:20:11    130         170    7
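
As a follow-up to the output above, the stored index can be used to pull other values from the row that holds the maximum, for example the time at which it occurred. This is only a sketch on a trimmed copy of the question's data; the column names max_idx and max_time are made up for illustration.

```python
import pandas as pd

# Trimmed sample of the question's data (same columns, fewer rows).
data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
     [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
     [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
     [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
     [pd.Timestamp('2022-09-07 10:17:11'), 430, 170]],
    columns=['timestamp', 'power', 'heart rate'])

# Group key: the calendar date of each timestamp.
day = data['timestamp'].dt.date

# Index label of the (first) maximum heart rate per date, repeated on every row.
# 'max_idx' is a hypothetical column name.
data['max_idx'] = data.groupby(day)['heart rate'].transform('idxmax')

# Look up the timestamp of that row; to_numpy() drops the repeated index labels
# so the values are assigned by position. 'max_time' is a hypothetical name.
data['max_time'] = data.loc[data['max_idx'], 'timestamp'].to_numpy()

print(data)
```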

If you need only the row with the maximum value per group:

df = data.loc[data.groupby(data['timestamp'].dt.date)['heart rate'].idxmax()]
print(df)
            timestamp  power  heart rate
2 2022-08-05 10:13:06    400         160
5 2022-09-05 10:17:09    430         130
7 2022-09-07 10:17:11    430         170
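
Note that idxmax keeps only the first row when the maximum is tied within a date (in the question's data, 160 appears twice on 2022-08-05). If all tied rows are wanted, one possible approach is to compare each value with its group maximum. This is a sketch on a trimmed copy of the data, and the variable names are illustrative only.

```python
import pandas as pd

# Trimmed sample of the question's data (same columns, fewer rows).
data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
     [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
     [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
     [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
     [pd.Timestamp('2022-09-07 10:17:11'), 430, 170]],
    columns=['timestamp', 'power', 'heart rate'])

day = data['timestamp'].dt.date

# One row per date: idxmax returns the first index at which the maximum occurs.
first_only = data.loc[data.groupby(day)['heart rate'].idxmax()]

# All rows that reach the maximum of their date, ties included.
group_max = data.groupby(day)['heart rate'].transform('max')
all_ties = data[data['heart rate'] == group_max]

print(first_only)
print(all_ties)
```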

CodePudding user response:

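Alternatively, aggregate with both max and idxmax to get the maximum value and its index per date in one step:

data.groupby(data['timestamp'].dt.date).agg({'heart rate': ['max', 'idxmax']})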

Output:

           heart rate       
                  max idxmax
timestamp                   
2022-08-05        160      2
2022-09-05        130      5
2022-09-07        170      7
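
The result above has two-level column labels. If flat column names are easier to work with, named aggregation might be an option. The sketch below uses a trimmed copy of the question's data, and the output names max_heart_rate and max_index are assumptions chosen for illustration.

```python
import pandas as pd

# Trimmed sample of the question's data (same columns, fewer rows).
data = pd.DataFrame(
    [[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
     [pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
     [pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
     [pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
     [pd.Timestamp('2022-09-07 10:17:11'), 430, 170]],
    columns=['timestamp', 'power', 'heart rate'])

# Named aggregation: each keyword is an output column (hypothetical names),
# each tuple is (source column, aggregation function).
summary = data.groupby(data['timestamp'].dt.date).agg(
    max_heart_rate=('heart rate', 'max'),
    max_index=('heart rate', 'idxmax'),
)

print(summary)
```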