How to loop through OHLC minute data by day?

I have a df containing minute bars of different symbols like so:

                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-13 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-13 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-13 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-13 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-13 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL
...                          ...     ...     ...      ...     ...     ...          ...         ...    ...
58250  2021-10-27 19:58:00+00:00  209.31  209.33  209.215  209.26   26440          348  209.251852    ZTS
58251  2021-10-27 19:59:00+00:00  209.28  209.59  209.010  209.56  109758         1060  209.384672    ZTS
58252  2021-10-27 20:03:00+00:00  209.58  209.58  209.580  209.58  537786           49  209.580000    ZTS
58253  2021-10-27 20:05:00+00:00  209.58  209.58  209.580  209.58    4170            1  209.580000    ZTS
58254  2021-10-27 20:12:00+00:00  209.58  209.58  209.580  209.58     144            1  209.580000    ZTS

[58255 rows x 9 columns]

I want to be able to use df.groupby so I can loop over each of the days of each ticker. Something like:

                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-13 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-13 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-13 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-13 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-13 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL

                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-14 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-14 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-14 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-14 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-14 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL

How can I do this?

Someone suggested I look at another question:

table = df.groupby(pd.Grouper(key='timestamp', axis=0, freq='D')).sum()

But this aggregates the minute data and returns one summed row per day:

Name: 2022-04-04 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-05 00:00:00+0000', tz='UTC', freq='D'), open           0.0
high           0.0
low            0.0
close          0.0
volume         0.0
trade_count    0.0
vwap           0.0
Name: 2022-04-05 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-06 00:00:00+0000', tz='UTC', freq='D'), open            2000.818300
high            2001.724000
low             2000.563300
close           2001.462900
volume         59717.000000
trade_count      487.000000
vwap            2001.073115
Name: 2022-04-06 00:00:00+00:00, dtype: float64)

I need to split my minute data into separate days while keeping the individual minute rows. I don't need to aggregate it into daily bars, as was suggested.

CodePudding user response：

https://pandas.pydata.org/docs/user_guide/basics.html#by-values

I think the right approach depends on what you actually want to do when looping over days and symbols. Sorting by values is one starting point:

df = df.sort_values(by=["timestamp", "symbol"])
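
To illustrate how the aim can change the approach: if the goal were, say, a running volume total that restarts each day for each symbol (purely an assumption, the question does not state the goal), no explicit loop would be needed. The sketch below uses a small made-up frame with only the timestamp, symbol and volume columns; cum_volume is a hypothetical column name.

```python
import pandas as pd

# Toy minute bars (made-up values) standing in for the real df.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-10-13 08:00:00+00:00", "2021-10-13 08:01:00+00:00",
        "2021-10-14 08:00:00+00:00", "2021-10-14 08:01:00+00:00",
        "2021-10-13 08:00:00+00:00", "2021-10-13 08:01:00+00:00",
    ], utc=True),
    "symbol": ["AAPL", "AAPL", "AAPL", "AAPL", "ZTS", "ZTS"],
    "volume": [100, 50, 70, 30, 10, 20],
})

# Same sort as above, so rows are in time order within every group.
df = df.sort_values(by=["timestamp", "symbol"]).reset_index(drop=True)

# Calendar date of each bar (in the timestamp's own time zone, UTC here).
day = df["timestamp"].dt.date

# Running volume that restarts for every (symbol, day) pair.
# The minute rows are kept; only a new column is added.
df["cum_volume"] = df.groupby(["symbol", day])["volume"].cumsum()
```

If the per-day work cannot be written as a grouped operation like this, iterating over the groups is the fallback.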

CodePudding user response：

Are you looking for:

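import pandas as pd

# ensure datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# group by symbol and calendar date, then aggregate;
# numeric_only=True keeps sum() from trying to add up the timestamp column,
# which raises a TypeError on recent pandas versions
out = df.groupby(['symbol', df['timestamp'].dt.date]).sum(numeric_only=True)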
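
Since the question asks for the minute rows split by day rather than summed, the same grouping can also be iterated without calling an aggregation. This is a minimal sketch on a made-up frame; day_frames is a hypothetical name, and the real df would have the full set of OHLC columns.

```python
import pandas as pd

# Toy minute bars (made-up values) standing in for the real df.
df = pd.DataFrame({
    "timestamp": [
        "2021-10-13 08:00:00+00:00", "2021-10-13 08:01:00+00:00",
        "2021-10-14 08:00:00+00:00", "2021-10-14 08:01:00+00:00",
        "2021-10-27 19:58:00+00:00", "2021-10-27 19:59:00+00:00",
    ],
    "close": [140.40, 140.40, 140.35, 140.20, 209.26, 209.56],
    "symbol": ["AAPL", "AAPL", "AAPL", "AAPL", "ZTS", "ZTS"],
})

# ensure datetime
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Iterating a groupby yields (key, sub-frame) pairs. With two grouping
# keys, the key is a (symbol, date) tuple and the sub-frame holds the
# unaggregated minute rows for that symbol on that day.
day_frames = {}
for (symbol, day), day_df in df.groupby(["symbol", df["timestamp"].dt.date]):
    day_frames[(symbol, day)] = day_df
    print(symbol, day, len(day_df))
```

Note that .dt.date takes the date in the timestamp's own time zone (UTC here), so a session that crosses midnight UTC would be split across two groups.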