I have the following dataframe spanning multiple days with 1-minute OHLCV values. The dataframe will be quite big: 20 years of data (1 year is roughly 117,000 rows, I estimate).
open high low close volume
date
2022-09-08 15:30:00 314.69 314.83 313.83 314.11 764.54
2022-09-08 15:31:00 314.10 314.22 313.86 314.08 184.55
2022-09-08 15:32:00 314.20 314.37 313.98 314.14 170.66
2022-09-09 21:55:00 322.54 322.54 322.39 322.50 184.44
2022-09-09 21:56:00 322.48 322.48 322.31 322.38 296.67
2022-09-09 21:57:00 322.38 322.40 322.30 322.33 253.45
Per day, I want to return the date and time of the highest value in the column 'high' and the lowest value in the column 'low'.
So the result in this case would be:
high low
date
2022-09-08 2022-09-08 15:30:00 2022-09-08 15:30:00
2022-09-09 2022-09-09 21:55:00 2022-09-09 21:57:00
I would be very grateful if someone can help me with how to do this in an efficient way.
Thanks a lot!
Ivo
CodePudding user response:
I'll be interested to see what one-liner someone comes back with, but here's a working solution.
Basically, it uses the idxmax function to get the row with the maximum value in the 'high' column and the idxmin function to get the row with the minimum value in the 'low' column, then merges those two frames together at the end.
import pandas as pd

df = pd.DataFrame([
    ['2022-09-08 15:30:00', 314.69, 314.83, 313.83, 314.11, 764.54],
    ['2022-09-08 15:31:00', 314.10, 314.22, 313.86, 314.08, 184.55],
    ['2022-09-08 15:32:00', 314.20, 314.37, 313.98, 314.14, 170.66],
    ['2022-09-09 21:55:00', 322.54, 322.54, 322.39, 322.50, 184.44],
    ['2022-09-09 21:56:00', 322.48, 322.48, 322.31, 322.38, 296.67],
    ['2022-09-09 21:57:00', 322.38, 322.40, 322.30, 322.33, 253.45]
], columns=['datetime', 'open', 'high', 'low', 'close', 'volume'])

# extract the calendar date to group on
df['date'] = pd.to_datetime(df['datetime']).dt.date

# per day, the row holding the maximum 'high'
df_max = df.loc[df.groupby('date')['high'].idxmax(), ['date', 'datetime']]
df_max = df_max.rename(columns={'datetime': 'max_datetime'})

# per day, the row holding the minimum 'low'
df_min = df.loc[df.groupby('date')['low'].idxmin(), ['date', 'datetime']]
df_min = df_min.rename(columns={'datetime': 'min_datetime'})

xx = pd.merge(df_max, df_min, how='left', on='date')
| date | max_datetime | min_datetime |
| --- | --- | --- |
| 2022-09-08 | 2022-09-08 15:30:00 | 2022-09-08 15:30:00 |
| 2022-09-09 | 2022-09-09 21:55:00 | 2022-09-09 21:57:00 |
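One caveat worth noting (general pandas behavior, not specific to this solution): if a day's maximum or minimum value occurs at several timestamps, idxmax/idxmin return only the first occurrence, so you get the earliest matching time of day. A minimal illustration:

```python
import pandas as pd

# the maximum 3.0 appears at two labels; idxmax keeps the earliest one
s = pd.Series([1.0, 3.0, 3.0, 2.0],
              index=['09:30', '09:31', '09:32', '09:33'])

print(s.idxmax())  # 09:31 — first occurrence wins on ties
print(s.idxmin())  # 09:30
```

If you need all tying timestamps per day instead, you would have to compare against the groupwise max/min rather than use idxmax/idxmin.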
CodePudding user response:
Use groupby.agg with idxmin/idxmax on a datetime index:
# ensure datetime
df.index = pd.to_datetime(df.index)
out = (df
.groupby(df.index.date)
.agg({'high': 'idxmax',
'low': 'idxmin'})
.rename_axis('date')
)
Output:
high low
date
2022-09-08 2022-09-08 15:30:00 2022-09-08 15:30:00
2022-09-09 2022-09-09 21:55:00 2022-09-09 21:57:00
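At roughly 2.3 million rows for 20 years of 1-minute bars, this groupby should still be fast. One detail: grouping on df.index.date yields an index of Python date objects (object dtype). A minor variant (my own suggestion, not part of the answer above) is to group on the normalized timestamps instead, which keeps a proper DatetimeIndex on the result:

```python
import pandas as pd

# small sample in the same shape as the question
idx = pd.to_datetime([
    '2022-09-08 15:30:00', '2022-09-08 15:31:00', '2022-09-08 15:32:00',
    '2022-09-09 21:55:00', '2022-09-09 21:56:00', '2022-09-09 21:57:00',
])
df = pd.DataFrame({
    'high': [314.83, 314.22, 314.37, 322.54, 322.48, 322.40],
    'low':  [313.83, 313.86, 313.98, 322.39, 322.31, 322.30],
}, index=idx)

# normalize() floors each timestamp to midnight but keeps datetime64 dtype,
# so the grouped result has a DatetimeIndex rather than object dates
out = (df
       .groupby(df.index.normalize())
       .agg({'high': 'idxmax', 'low': 'idxmin'})
       .rename_axis('date'))
print(out)
```

A datetime64 result index sorts, slices, and merges more efficiently than object dates, which matters if you plan further time-based operations on the daily output.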