Home > Software engineering >  Getting '1970-01-01...' when using tz_localize
Getting '1970-01-01...' when using tz_localize

Time:10-25

I have a dataframe like so:

                        timestamp    open    high      low    close  volume  trade_count        vwap symbol
0       2021-10-28 13:30:00 00:00    8.32    8.37    8.310    8.355   16981           26    8.322444    BVN
1       2021-10-28 13:31:00 00:00    8.35    8.35    8.325    8.335     676           11    8.340991    BVN
2       2021-10-28 13:32:00 00:00    8.32    8.33    8.290    8.310    4458           22    8.306497    BVN
3       2021-10-28 13:33:00 00:00    8.34    8.40    8.330    8.400   13025           59    8.348660    BVN
4       2021-10-28 13:34:00 00:00    8.38    8.40    8.320    8.400    5405           29    8.353166    BVN
...                           ...     ...     ...      ...      ...     ...          ...         ...    ...
395837  2022-10-10 20:22:00 00:00  129.32  129.32  129.320  129.320    9630            3  129.319618    WMT
395838  2022-10-10 20:33:00 00:00  129.75  129.75  129.750  129.750     215            4  129.736047    WMT
395839  2022-10-10 20:46:00 00:00  129.13  129.13  129.130  129.130     100            1  129.130000    WMT
395840  2022-10-10 22:16:00 00:00  129.13  129.13  129.130  129.130     100            1  129.130000    WMT
395841  2022-10-10 23:41:00 00:00  129.93  129.93  129.930  129.930     200            1  129.930000    WMT

[395842 rows x 9 columns]

I make the timestamp column a datetime obj:

df['timestamp'] = pd.to_datetime(df['timestamp'])

And then group the minute bars by individual days:

seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])

But when I loop over the data and print the index:

for name, ohlc in seperate_days:
    ohlc.set_index('timestamp')
    ohlc.index = pd.to_datetime(ohlc.index)
    ohlc.index = ohlc.index.tz_localize(tz='America/New_York')
    print(ohlc.index)

All I'm getting is dates from the 1970s:

DatetimeIndex(['1970-01-01 00:00:00.000171684-05:00',
               '1970-01-01 00:00:00.000171685-05:00',
               '1970-01-01 00:00:00.000171686-05:00',
               '1970-01-01 00:00:00.000171687-05:00',
               '1970-01-01 00:00:00.000171688-05:00',
               '1970-01-01 00:00:00.000171689-05:00',
               '1970-01-01 00:00:00.000171690-05:00',
               '1970-01-01 00:00:00.000171691-05:00',
               '1970-01-01 00:00:00.000171692-05:00',
               '1970-01-01 00:00:00.000171693-05:00',
               ...
               '1970-01-01 00:00:00.000172203-05:00',
               '1970-01-01 00:00:00.000172204-05:00',
               '1970-01-01 00:00:00.000172205-05:00',
               '1970-01-01 00:00:00.000172206-05:00',
               '1970-01-01 00:00:00.000172207-05:00',
               '1970-01-01 00:00:00.000172208-05:00',
               '1970-01-01 00:00:00.000172209-05:00',
               '1970-01-01 00:00:00.000172210-05:00',
               '1970-01-01 00:00:00.000172211-05:00',
               '1970-01-01 00:00:00.000172212-05:00'],
              dtype='datetime64[ns, America/New_York]', length=529, freq=None)

How can I get the dates to work correctly?

CodePudding user response:

in your example code, ohlc.index is the dataframe index (ie: 0,1,2,…) so you are getting pd.to_datetime() of those values.

CodePudding user response:

Here's how I did it:

df = pd.read_sql_query("SELECT * from ohlc_minutes", conn)


df['timestamp'] = pd.to_datetime(df['timestamp']) # re

startTime = datetime.datetime.now()

# group the minute bars by individual days
seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])


for name, ohlc in seperate_days:
    ohlc.index = ohlc['timestamp']

    open_hours_indices = ohlc.index.indexer_between_time('04:00', '20:00')
    open_hours = ohlc.index[open_hours_indices]
    
    display(open_hours)

Giving:

DatetimeIndex(['2022-10-07 10:18:00 00:00', '2022-10-07 11:34:00 00:00',
               '2022-10-07 12:21:00 00:00', '2022-10-07 12:22:00 00:00',
               '2022-10-07 12:30:00 00:00', '2022-10-07 12:35:00 00:00',
               '2022-10-07 12:47:00 00:00', '2022-10-07 12:48:00 00:00',
               '2022-10-07 12:49:00 00:00', '2022-10-07 12:50:00 00:00',
               ...
               '2022-10-07 19:51:00 00:00', '2022-10-07 19:52:00 00:00',
               '2022-10-07 19:53:00 00:00', '2022-10-07 19:54:00 00:00',
               '2022-10-07 19:55:00 00:00', '2022-10-07 19:56:00 00:00',
               '2022-10-07 19:57:00 00:00', '2022-10-07 19:58:00 00:00',
               '2022-10-07 19:59:00 00:00', '2022-10-07 20:00:00 00:00'],
              dtype='datetime64[ns, UTC]', name='timestamp', length=410, freq=None)
DatetimeIndex(['2022-10-10 11:00:00 00:00', '2022-10-10 11:43:00 00:00',
               '2022-10-10 11:52:00 00:00', '2022-10-10 12:14:00 00:00',
               '2022-10-10 13:20:00 00:00', '2022-10-10 13:23:00 00:00',
               '2022-10-10 13:25:00 00:00', '2022-10-10 13:28:00 00:00',
               '2022-10-10 13:30:00 00:00', '2022-10-10 13:31:00 00:00',
               ...
               '2022-10-10 19:51:00 00:00', '2022-10-10 19:52:00 00:00',
               '2022-10-10 19:53:00 00:00', '2022-10-10 19:54:00 00:00',
               '2022-10-10 19:55:00 00:00', '2022-10-10 19:56:00 00:00',
               '2022-10-10 19:57:00 00:00', '2022-10-10 19:58:00 00:00',
               '2022-10-10 19:59:00 00:00', '2022-10-10 20:00:00 00:00'],
              dtype='datetime64[ns, UTC]', name='timestamp', length=399, freq=None)
  • Related