Home > front end >  How to convert & localize naive datetimes (using daylight saving time) to aware datetimes
How to convert & localize naive datetimes (using daylight saving time) to aware datetimes

Time:09-24

I have a dataframe including columns Date & Time, which are local clock readings (as strings). and the column dst, which indicates whether Daylight Saving Time is active or not using W for winter and Sfor summer.

I know the timezone is Europe/Berlin, resulting in a 1 hour shift from UTC during wintertime and a 2 hour shift during summertime.

I am very unhappy with this representation and want to convert to aware datetime objects in UTC and only give human readable times when needed.

      Date      Time  dst
27.03.2022  01:15:00    W
27.03.2022  01:30:00    W
27.03.2022  01:45:00    W
27.03.2022  03:00:00    S
27.03.2022  03:15:00    S
27.03.2022  03:30:00    S
27.03.2022  03:45:00    S
27.03.2022  04:00:00    S
27.03.2022  04:15:00    S
27.03.2022  04:30:00    S
27.03.2022  04:45:00    S
27.03.2022  05:00:00    S
27.03.2022  05:15:00    S

My first approach was, retrieving a datetime object using pandas, localize it and subtracting two or one hours depending on the given dst, using numpy.

from datetime import datetime, timedelta, timezone
from dateutil import tz

import numpy as np
import pandas as pd

df['datetime'] = pd.to_datetime(df['Date']   df['Time'], format='%d.%m.%Y%H:%M:%S')
df['datetime_aware'] = df['datetime'].dt.tz_localize(tz='Europe/Berlin')
df['datetime_aware_subtracted'] = np.where(df['dst']=='S', df['datetime_aware']-timedelta(hours=2),
                                           df['datetime_aware']-timedelta(hours=1))

This yields almost correct results, except between 03:00 - 05:00 (in datetime) the datetime_aware_subtracted column yields wrong result. One hour too much subtracted one hour timeshift too little. I feel like substracting time over the dst boundary is not a good idea.

        datetime               datetime_aware   datetime_aware_subtracted
27.03.2022 01:15    2022-03-27 01:15:00 01:00   2022-03-27 00:15:00 01:00
27.03.2022 01:30    2022-03-27 01:30:00 01:00   2022-03-27 00:30:00 01:00
27.03.2022 01:45    2022-03-27 01:45:00 01:00   2022-03-27 00:45:00 01:00
27.03.2022 03:00    2022-03-27 03:00:00 02:00   2022-03-27 00:00:00 01:00
27.03.2022 03:15    2022-03-27 03:15:00 02:00   2022-03-27 00:15:00 01:00
27.03.2022 03:30    2022-03-27 03:30:00 02:00   2022-03-27 00:30:00 01:00
27.03.2022 03:45    2022-03-27 03:45:00 02:00   2022-03-27 00:45:00 01:00
27.03.2022 04:00    2022-03-27 04:00:00 02:00   2022-03-27 01:00:00 01:00
27.03.2022 04:15    2022-03-27 04:15:00 02:00   2022-03-27 01:15:00 01:00
27.03.2022 04:30    2022-03-27 04:30:00 02:00   2022-03-27 01:30:00 01:00
27.03.2022 04:45    2022-03-27 04:45:00 02:00   2022-03-27 01:45:00 01:00
27.03.2022 05:00    2022-03-27 05:00:00 02:00   2022-03-27 03:00:00 02:00
27.03.2022 05:15    2022-03-27 05:15:00 02:00   2022-03-27 03:15:00 02:00

My second approach, was to inverse subtraction and localizing.

df['datetime'] = pd.to_datetime(df['Date']   df['Time'], format='%d.%m.%Y%H:%M:%S')
df['datetime_subtracted'] = np.where(df['dst']=='S', df['datetime']-timedelta(hours=2),
                                     df['datetime']-timedelta(hours=1))
df['datetime_subtracted_aware'] = df['datetime_subtracted'].dt.tz_localize(tz='Europe/Berlin')

This gives the correct naive results but after subtracting gives an NonExistentTimeError when localizing (rightfully so).

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\accessor.py", line 94, in f
    return self._delegate_method(name, *args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\indexes\accessors.py", line 123, in _delegate_method
    result = method(*args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\indexes\datetimes.py", line 273, in tz_localize
    arr = self._data.tz_localize(tz, ambiguous, nonexistent)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\arrays\_mixins.py", line 84, in method
    return meth(self, *args, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\env\lib\site-packages\pandas\core\arrays\datetimes.py", line 1043, in tz_localize
    new_dates = tzconversion.tz_localize_to_utc(
  File "pandas\_libs\tslibs\tzconversion.pyx", line 328, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc
pytz.exceptions.NonExistentTimeError: 2022-03-27 02:00:00

What are my best options to aware UTC datetime objects in Europe/Berlin timezone? In other words, create a new column datetime_aware_localized that shows the same date&time as the datetime column but is an aware datetime object in correct timezone.

CodePudding user response:

You can tell tz_localize how to handle non-existing times:

nonexistentstr, default ‘raise’

A nonexistent time does not exist in a particular timezone where clocks moved forward due to DST. Valid values are:

‘shift_forward’ will shift the nonexistent time forward to the closest existing time

‘shift_backward’ will shift the nonexistent time backward to the closest existing time

‘NaT’ will return NaT where there are nonexistent times

timedelta objects will shift nonexistent times by the timedelta

‘raise’ will raise an NonExistentTimeError if there are nonexistent times.

Replacing with NaT:

dt = pd.to_datetime(df['Date'] ' ' df['Time'])
corr = pd.to_timedelta(df['dst'].map({'W': 1, 'S': 2}), unit='H')

# converting to NaT
dt.sub(corr).dt.tz_localize(tz='Europe/Berlin', nonexistent='NaT')

output:

0    2022-03-27 00:15:00 01:00
1    2022-03-27 00:30:00 01:00
2    2022-03-27 00:45:00 01:00
3    2022-03-27 01:00:00 01:00
4    2022-03-27 01:15:00 01:00
5    2022-03-27 01:30:00 01:00
6    2022-03-27 01:45:00 01:00
7                          NaT
8                          NaT
9                          NaT
10                         NaT
11   2022-03-27 03:00:00 02:00
12   2022-03-27 03:15:00 02:00
dtype: datetime64[ns, Europe/Berlin]

shifting forward:

dt.sub(corr).dt.tz_localize(tz='Europe/Berlin', nonexistent='shift_forward')

0    2022-03-27 00:15:00 01:00
1    2022-03-27 00:30:00 01:00
2    2022-03-27 00:45:00 01:00
3    2022-03-27 01:00:00 01:00
4    2022-03-27 01:15:00 01:00
5    2022-03-27 01:30:00 01:00
6    2022-03-27 01:45:00 01:00
7    2022-03-27 03:00:00 02:00
8    2022-03-27 03:00:00 02:00
9    2022-03-27 03:00:00 02:00
10   2022-03-27 03:00:00 02:00
11   2022-03-27 03:00:00 02:00
12   2022-03-27 03:15:00 02:00
dtype: datetime64[ns, Europe/Berlin]

CodePudding user response:

to clarify, localization works perfectly fine like

pd.to_datetime(df['Date']   df['Time'], format='%d.%m.%Y%H:%M:%S').dt.tz_localize("Europe/Berlin")

0    2022-03-27 01:15:00 01:00
1    2022-03-27 01:30:00 01:00
2    2022-03-27 01:45:00 01:00
3    2022-03-27 03:00:00 02:00
4    2022-03-27 03:15:00 02:00
5    2022-03-27 03:30:00 02:00
6    2022-03-27 03:45:00 02:00
7    2022-03-27 04:00:00 02:00
8    2022-03-27 04:15:00 02:00
9    2022-03-27 04:30:00 02:00
10   2022-03-27 04:45:00 02:00
11   2022-03-27 05:00:00 02:00
12   2022-03-27 05:15:00 02:00
dtype: datetime64[ns, Europe/Berlin]

If you need UTC, use

pd.to_datetime(df['Date']   df['Time'], format='%d.%m.%Y%H:%M:%S').dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")

0    2022-03-27 00:15:00 00:00
1    2022-03-27 00:30:00 00:00
2    2022-03-27 00:45:00 00:00
3    2022-03-27 01:00:00 00:00
4    2022-03-27 01:15:00 00:00
5    2022-03-27 01:30:00 00:00
6    2022-03-27 01:45:00 00:00
7    2022-03-27 02:00:00 00:00
8    2022-03-27 02:15:00 00:00
9    2022-03-27 02:30:00 00:00
10   2022-03-27 02:45:00 00:00
11   2022-03-27 03:00:00 00:00
12   2022-03-27 03:15:00 00:00
dtype: datetime64[ns, UTC]

regarding NonExistentTimeError error, you can find an example here, where this is actually a problem and keyword nonexistent is necessary.

  • Related