I have the following dataframe spanning multiple days with 1-minute OHLCV values. The dataframe will be quite big: 20 years of data (1 year is roughly 117,000 rows, I estimate).
open high low close volume
date
2022-09-08 15:30:00 314.69 314.83 313.83 314.11 764.54
2022-09-08 15:31:00 314.10 314.22 313.86 314.08 184.55
2022-09-08 15:32:00 314.20 314.37 313.98 314.14 170.66
2022-09-09 21:55:00 322.54 322.54 322.39 322.50 184.44
2022-09-09 21:56:00 322.48 322.48 322.31 322.38 296.67
2022-09-09 21:57:00 322.38 322.40 322.30 322.33 253.45
Per day, I want to return the date and time of the highest value in the column 'high' and the lowest value in the column 'low'.
So the result in this case would be:
high low
date
2022-09-08 2022-09-08 15:30:00 2022-09-08 15:30:00
2022-09-09 2022-09-09 21:55:00 2022-09-09 21:57:00
I would be very grateful if someone can help me with how to do this in an efficient way.
Thanks a lot!
Ivo
CodePudding user response:
I'll be interested to see what one-liner someone comes back with, but here's a working solution.
Basically, it uses the idxmax function to get the row with the maximum value in the 'high' column and the idxmin function to get the row with the minimum value in the 'low' column, then merges those two frames together at the end.
import pandas as pd

df = pd.DataFrame([
    ['2022-09-08 15:30:00', 314.69, 314.83, 313.83, 314.11, 764.54],
    ['2022-09-08 15:31:00', 314.10, 314.22, 313.86, 314.08, 184.55],
    ['2022-09-08 15:32:00', 314.20, 314.37, 313.98, 314.14, 170.66],
    ['2022-09-09 21:55:00', 322.54, 322.54, 322.39, 322.50, 184.44],
    ['2022-09-09 21:56:00', 322.48, 322.48, 322.31, 322.38, 296.67],
    ['2022-09-09 21:57:00', 322.38, 322.40, 322.30, 322.33, 253.45]
], columns=['datetime', 'open', 'high', 'low', 'close', 'volume'])

# extract the calendar date to group on
df['date'] = pd.to_datetime(df['datetime']).dt.date

# per day, the row holding the maximum 'high'
df_max = df.loc[df.groupby('date')['high'].idxmax(), ['date', 'datetime']]
df_max = df_max.rename(columns={'datetime': 'max_datetime'})

# per day, the row holding the minimum 'low'
df_min = df.loc[df.groupby('date')['low'].idxmin(), ['date', 'datetime']]
df_min = df_min.rename(columns={'datetime': 'min_datetime'})

xx = pd.merge(df_max, df_min, how='left', on='date')
| date | max_datetime | min_datetime |
| --- | --- | --- |
| 2022-09-08 | 2022-09-08 15:30:00 | 2022-09-08 15:30:00 |
| 2022-09-09 | 2022-09-09 21:55:00 | 2022-09-09 21:57:00 |
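One caveat worth noting (general pandas behavior, not specific to this solution): if a day's maximum or minimum value occurs at several timestamps, idxmax/idxmin return only the first occurrence, so you get the earliest matching time of day. A minimal illustration:

```python
import pandas as pd

# the maximum 3.0 appears at two labels; idxmax keeps the earliest one
s = pd.Series([1.0, 3.0, 3.0, 2.0],
              index=['09:30', '09:31', '09:32', '09:33'])

print(s.idxmax())  # 09:31 — first occurrence wins on ties
print(s.idxmin())  # 09:30
```

If you need all tying timestamps per day instead, you would have to compare against the groupwise max/min rather than use idxmax/idxmin.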
CodePudding user response:
Use groupby.agg with idxmin/idxmax on a datetime index:
# ensure datetime
df.index = pd.to_datetime(df.index)
out = (df
.groupby(df.index.date)
.agg({'high': 'idxmax',
'low': 'idxmin'})
.rename_axis('date')
)
Output:
high low
date
2022-09-08 2022-09-08 15:30:00 2022-09-08 15:30:00
2022-09-09 2022-09-09 21:55:00 2022-09-09 21:57:00
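At roughly 2.3 million rows for 20 years of 1-minute bars, this groupby should still be fast. One detail: grouping on df.index.date yields an index of Python date objects (object dtype). A minor variant (my own suggestion, not part of the answer above) is to group on the normalized timestamps instead, which keeps a proper DatetimeIndex on the result:

```python
import pandas as pd

# small sample in the same shape as the question
idx = pd.to_datetime([
    '2022-09-08 15:30:00', '2022-09-08 15:31:00', '2022-09-08 15:32:00',
    '2022-09-09 21:55:00', '2022-09-09 21:56:00', '2022-09-09 21:57:00',
])
df = pd.DataFrame({
    'high': [314.83, 314.22, 314.37, 322.54, 322.48, 322.40],
    'low':  [313.83, 313.86, 313.98, 322.39, 322.31, 322.30],
}, index=idx)

# normalize() floors each timestamp to midnight but keeps datetime64 dtype,
# so the grouped result has a DatetimeIndex rather than object dates
out = (df
       .groupby(df.index.normalize())
       .agg({'high': 'idxmax', 'low': 'idxmin'})
       .rename_axis('date'))
print(out)
```

A datetime64 result index sorts, slices, and merges more efficiently than object dates, which matters if you plan further time-based operations on the daily output.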