I am trying to find the maximum value of precip for each station and each hour of the valid column. I have no working code to show: I have been at this for a long time and have not found anything close to a working solution.
Here is the dataframe I am working with:
station | valid | precip |
---|---|---|
1V4 | 2022-02-23 07:54:00 | 0.2 |
1V4 | 2022-02-23 07:55:00 | 0.2 |
1V4 | 2022-02-23 07:58:00 | 0.4 |
1V4 | 2022-02-23 08:07:00 | 0.1 |
1V4 | 2022-02-23 08:51:00 | 0.6 |
BTV | 2022-02-23 07:52:00 | 0.1 |
BTV | 2022-02-23 07:54:00 | 0.2 |
BTV | 2022-02-23 07:59:00 | 0.3 |
BTV | 2022-02-23 08:02:00 | 0.0 |
BTV | 2022-02-23 08:16:00 | 0.0 |
BTV | 2022-02-23 08:29:00 | 0.3 |
This is what I want the result to look like:
station | valid | precip |
---|---|---|
1V4 | 2022-02-23 07:58:00 | 0.4 |
1V4 | 2022-02-23 08:51:00 | 0.6 |
BTV | 2022-02-23 07:59:00 | 0.3 |
BTV | 2022-02-23 08:29:00 | 0.3 |
CodePudding user response:
You need to group by station and hour, get the idxmax of precip within each group, then slice:
df['valid'] = pd.to_datetime(df['valid'])
df.loc[df.groupby(['station', df['valid'].dt.hour])['precip'].idxmax()]
output:
station valid precip
2 1V4 2022-02-23 07:58:00 0.4
4 1V4 2022-02-23 08:51:00 0.6
7 BTV 2022-02-23 07:59:00 0.3
10 BTV 2022-02-23 08:29:00 0.3
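For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample dataframe from the question and applies the same idxmax idea. The helper name `hourly_max_rows` is made up for illustration. It also makes one assumption that goes beyond the answer above: it buckets on `dt.floor("h")` rather than `dt.hour`, so that the same hour on different days would not be merged into one group. On the single-day sample data both give the same rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "station": ["1V4"] * 5 + ["BTV"] * 6,
    "valid": [
        "2022-02-23 07:54:00", "2022-02-23 07:55:00", "2022-02-23 07:58:00",
        "2022-02-23 08:07:00", "2022-02-23 08:51:00",
        "2022-02-23 07:52:00", "2022-02-23 07:54:00", "2022-02-23 07:59:00",
        "2022-02-23 08:02:00", "2022-02-23 08:16:00", "2022-02-23 08:29:00",
    ],
    "precip": [0.2, 0.2, 0.4, 0.1, 0.6, 0.1, 0.2, 0.3, 0.0, 0.0, 0.3],
})
df["valid"] = pd.to_datetime(df["valid"])


def hourly_max_rows(frame):
    """Return the row with the largest precip per station and hour bucket.

    Hypothetical helper, not part of pandas.
    """
    # Truncate each timestamp to the start of its hour (date included).
    hour_bucket = frame["valid"].dt.floor("h")
    # idxmax gives the index label of the first maximum in each group.
    idx = frame.groupby(["station", hour_bucket])["precip"].idxmax()
    return frame.loc[idx]


result = hourly_max_rows(df)
print(result)
```

If you really do want to pool the same clock hour across all days, swap `dt.floor("h")` back to `dt.hour`.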
CodePudding user response:
This will allow you to group by the hour and the station, then keep the rows that match the group maximum:
# use one dataframe (df) throughout; pd.to_datetime parses this format without extra arguments
df['Hour'] = pd.to_datetime(df['valid']).dt.hour
df['precip_max'] = df.groupby(['station', 'Hour'])['precip'].transform('max')
df.loc[df['precip'] == df['precip_max']]
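One difference between the two approaches is worth knowing: comparing against `transform('max')` keeps every row that ties for the maximum, while `idxmax` keeps only the first one in each group. The sketch below shows this on a tiny made-up frame (the data and the variable names `ties`, `all_ties`, and `first_only` are for illustration only, not from the question).

```python
import pandas as pd

# Illustrative data: two rows tie for the 07:00 maximum at station 1V4.
ties = pd.DataFrame({
    "station": ["1V4", "1V4", "1V4"],
    "valid": pd.to_datetime([
        "2022-02-23 07:54:00",
        "2022-02-23 07:55:00",
        "2022-02-23 08:07:00",
    ]),
    "precip": [0.2, 0.2, 0.1],
})

hour = ties["valid"].dt.hour

# transform('max') broadcasts the group maximum back onto every row,
# so the equality test keeps all rows that reach it.
group_max = ties.groupby(["station", hour])["precip"].transform("max")
all_ties = ties.loc[ties["precip"] == group_max]

# idxmax returns a single index label per group (the first maximum).
first_only = ties.loc[ties.groupby(["station", hour])["precip"].idxmax()]

print(all_ties)
print(first_only)
```

Pick whichever matches what you want: one row per station and hour, or all rows that hit the hourly maximum.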