I have a dataframe like so:
timestamp open high low close volume trade_count vwap symbol
0 2021-10-28 13:30:00 00:00 8.32 8.37 8.310 8.355 16981 26 8.322444 BVN
1 2021-10-28 13:31:00 00:00 8.35 8.35 8.325 8.335 676 11 8.340991 BVN
2 2021-10-28 13:32:00 00:00 8.32 8.33 8.290 8.310 4458 22 8.306497 BVN
3 2021-10-28 13:33:00 00:00 8.34 8.40 8.330 8.400 13025 59 8.348660 BVN
4 2021-10-28 13:34:00 00:00 8.38 8.40 8.320 8.400 5405 29 8.353166 BVN
... ... ... ... ... ... ... ... ... ...
395837 2022-10-10 20:22:00 00:00 129.32 129.32 129.320 129.320 9630 3 129.319618 WMT
395838 2022-10-10 20:33:00 00:00 129.75 129.75 129.750 129.750 215 4 129.736047 WMT
395839 2022-10-10 20:46:00 00:00 129.13 129.13 129.130 129.130 100 1 129.130000 WMT
395840 2022-10-10 22:16:00 00:00 129.13 129.13 129.130 129.130 100 1 129.130000 WMT
395841 2022-10-10 23:41:00 00:00 129.93 129.93 129.930 129.930 200 1 129.930000 WMT
[395842 rows x 9 columns]
I make the timestamp column a datetime obj:
df['timestamp'] = pd.to_datetime(df['timestamp'])
And then group the minute bars by individual days:
seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])
But when I loop over the data and print the index:
for name, ohlc in seperate_days:
ohlc.set_index('timestamp')
ohlc.index = pd.to_datetime(ohlc.index)
ohlc.index = ohlc.index.tz_localize(tz='America/New_York')
print(ohlc.index)
All I'm getting is dates from the 1970s:
DatetimeIndex(['1970-01-01 00:00:00.000171684-05:00',
'1970-01-01 00:00:00.000171685-05:00',
'1970-01-01 00:00:00.000171686-05:00',
'1970-01-01 00:00:00.000171687-05:00',
'1970-01-01 00:00:00.000171688-05:00',
'1970-01-01 00:00:00.000171689-05:00',
'1970-01-01 00:00:00.000171690-05:00',
'1970-01-01 00:00:00.000171691-05:00',
'1970-01-01 00:00:00.000171692-05:00',
'1970-01-01 00:00:00.000171693-05:00',
...
'1970-01-01 00:00:00.000172203-05:00',
'1970-01-01 00:00:00.000172204-05:00',
'1970-01-01 00:00:00.000172205-05:00',
'1970-01-01 00:00:00.000172206-05:00',
'1970-01-01 00:00:00.000172207-05:00',
'1970-01-01 00:00:00.000172208-05:00',
'1970-01-01 00:00:00.000172209-05:00',
'1970-01-01 00:00:00.000172210-05:00',
'1970-01-01 00:00:00.000172211-05:00',
'1970-01-01 00:00:00.000172212-05:00'],
dtype='datetime64[ns, America/New_York]', length=529, freq=None)
How can I get the dates to work correctly?
CodePudding user response:
in your example code, ohlc.index is the dataframe index (ie: 0,1,2,…) so you are getting pd.to_datetime() of those values.
CodePudding user response:
Here's how I did it:
df = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
df['timestamp'] = pd.to_datetime(df['timestamp']) # re
startTime = datetime.datetime.now()
# group the minute bars by individual days
seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])
for name, ohlc in seperate_days:
ohlc.index = ohlc['timestamp']
open_hours_indices = ohlc.index.indexer_between_time('04:00', '20:00')
open_hours = ohlc.index[open_hours_indices]
display(open_hours)
Giving:
DatetimeIndex(['2022-10-07 10:18:00 00:00', '2022-10-07 11:34:00 00:00',
'2022-10-07 12:21:00 00:00', '2022-10-07 12:22:00 00:00',
'2022-10-07 12:30:00 00:00', '2022-10-07 12:35:00 00:00',
'2022-10-07 12:47:00 00:00', '2022-10-07 12:48:00 00:00',
'2022-10-07 12:49:00 00:00', '2022-10-07 12:50:00 00:00',
...
'2022-10-07 19:51:00 00:00', '2022-10-07 19:52:00 00:00',
'2022-10-07 19:53:00 00:00', '2022-10-07 19:54:00 00:00',
'2022-10-07 19:55:00 00:00', '2022-10-07 19:56:00 00:00',
'2022-10-07 19:57:00 00:00', '2022-10-07 19:58:00 00:00',
'2022-10-07 19:59:00 00:00', '2022-10-07 20:00:00 00:00'],
dtype='datetime64[ns, UTC]', name='timestamp', length=410, freq=None)
DatetimeIndex(['2022-10-10 11:00:00 00:00', '2022-10-10 11:43:00 00:00',
'2022-10-10 11:52:00 00:00', '2022-10-10 12:14:00 00:00',
'2022-10-10 13:20:00 00:00', '2022-10-10 13:23:00 00:00',
'2022-10-10 13:25:00 00:00', '2022-10-10 13:28:00 00:00',
'2022-10-10 13:30:00 00:00', '2022-10-10 13:31:00 00:00',
...
'2022-10-10 19:51:00 00:00', '2022-10-10 19:52:00 00:00',
'2022-10-10 19:53:00 00:00', '2022-10-10 19:54:00 00:00',
'2022-10-10 19:55:00 00:00', '2022-10-10 19:56:00 00:00',
'2022-10-10 19:57:00 00:00', '2022-10-10 19:58:00 00:00',
'2022-10-10 19:59:00 00:00', '2022-10-10 20:00:00 00:00'],
dtype='datetime64[ns, UTC]', name='timestamp', length=399, freq=None)