Is there a way to find the index after a groupby? In other words, I use a groupby to find the max values, but I would like to find the corresponding index in the original data set and add it to a separate column.
import pandas as pd
data = pd.DataFrame(
[[pd.Timestamp('2022-08-05 10:11:04'), 140, 120],
[pd.Timestamp('2022-08-05 10:11:05'), 400, 155],
[pd.Timestamp('2022-08-05 10:13:06'), 400, 160],
[pd.Timestamp('2022-08-05 10:15:07'), 100, 155],
[pd.Timestamp('2022-08-05 10:15:08'), 430, 160],
[pd.Timestamp('2022-09-05 10:17:09'), 430, 130],
[pd.Timestamp('2022-09-07 10:17:10'), 430, 131],
[pd.Timestamp('2022-09-07 10:17:11'), 430, 170],
[pd.Timestamp('2022-09-07 10:18:06'), 430, 156],
[pd.Timestamp('2022-09-07 10:19:07'), 130, 155],
[pd.Timestamp('2022-09-07 10:19:08'), 130, 160],
[pd.Timestamp('2022-09-07 10:19:09'), 430, 130],
[pd.Timestamp('2022-09-07 10:20:10'), 430, 131],
[pd.Timestamp('2022-09-07 10:20:11'), 130, 170]],
columns=['timestamp', 'power', 'heart rate'])
So I want to know the max heart rate per date, along with the corresponding index in the data DataFrame.
CodePudding user response:
Use GroupBy.transform with DataFrameGroupBy.idxmax to get the index of the maximal value per group in the heart rate column:
data['new'] = data.groupby(data['timestamp'].dt.date)['heart rate'].transform('idxmax')
print(data)
             timestamp  power  heart rate  new
0  2022-08-05 10:11:04    140         120    2
1  2022-08-05 10:11:05    400         155    2
2  2022-08-05 10:13:06    400         160    2
3  2022-08-05 10:15:07    100         155    2
4  2022-08-05 10:15:08    430         160    2
5  2022-09-05 10:17:09    430         130    5
6  2022-09-07 10:17:10    430         131    7
7  2022-09-07 10:17:11    430         170    7
8  2022-09-07 10:18:06    430         156    7
9  2022-09-07 10:19:07    130         155    7
10 2022-09-07 10:19:08    130         160    7
11 2022-09-07 10:19:09    430         130    7
12 2022-09-07 10:20:10    430         131    7
13 2022-09-07 10:20:11    130         170    7
If you need only the row with the maximal value per group:
df = data.loc[data.groupby(data['timestamp'].dt.date)['heart rate'].idxmax()]
print(df)
            timestamp  power  heart rate
2 2022-08-05 10:13:06    400         160
5 2022-09-05 10:17:09    430         130
7 2022-09-07 10:17:11    430         170
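A minimal sketch of why 2022-08-05 maps to index 2 and not 4 in the output above: idxmax returns the label of the first occurrence of the maximum, so ties go to the earlier row. The shortened frame and the column names max_hr and max_idx are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened version of the question's data (illustration only).
data = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2022-08-05 10:11:04', '2022-08-05 10:13:06', '2022-08-05 10:15:08',
        '2022-09-07 10:17:10', '2022-09-07 10:17:11',
    ]),
    'heart rate': [120, 160, 160, 131, 170],
})

# Group the heart rate column by calendar date.
grouped = data.groupby(data['timestamp'].dt.date)['heart rate']

# Broadcast the per-date maximum and its original index label to every row.
data['max_hr'] = grouped.transform('max')
data['max_idx'] = grouped.transform('idxmax')

print(data)
```

Rows 1 and 2 tie at 160 on 2022-08-05, and max_idx holds 1 for that whole date because the first matching label wins.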
CodePudding user response:
Try:
data.groupby(data['timestamp'].dt.date).agg({'heart rate': ['max', 'idxmax']})
Output:
           heart rate
                  max idxmax
timestamp
2022-08-05        160      2
2022-09-05        130      5
2022-09-07        170      7
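If the two-level column header is inconvenient, named aggregation is one possible alternative that yields flat columns. This is a sketch rather than part of the original answer: the names max_hr, max_idx, date and summary are assumptions, and the power column is left out for brevity.

```python
import pandas as pd

# The question's timestamps and heart rates (power column omitted).
data = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2022-08-05 10:11:04', '2022-08-05 10:11:05', '2022-08-05 10:13:06',
        '2022-08-05 10:15:07', '2022-08-05 10:15:08', '2022-09-05 10:17:09',
        '2022-09-07 10:17:10', '2022-09-07 10:17:11', '2022-09-07 10:18:06',
        '2022-09-07 10:19:07', '2022-09-07 10:19:08', '2022-09-07 10:19:09',
        '2022-09-07 10:20:10', '2022-09-07 10:20:11',
    ]),
    'heart rate': [120, 155, 160, 155, 160, 130, 131, 170, 156, 155, 160, 130, 131, 170],
})

# Named aggregation: each keyword becomes a flat output column.
summary = (
    data.groupby(data['timestamp'].dt.date)['heart rate']
        .agg(max_hr='max', max_idx='idxmax')
        .rename_axis('date')   # the group key would otherwise be called 'timestamp'
        .reset_index()
)

print(summary)
```

The resulting frame has one row per date with the columns date, max_hr and max_idx, so it can be merged or exported without flattening a MultiIndex first.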