I have a dataframe containing a timeseries, one column being ISO 8601 datetime strings of the form 2020-12-27T23:59:59+01:00. This is a long-running timeseries spanning multiple timezone offset changes due to DST (for reference, the data can be found here).
I try to parse those into pl.Datetime via pl.col("date").str.strptime(pl.Datetime, fmt="%+")
This used to work, but since version 0.15.7 of polars it throws the following error:
exceptions.ComputeError: Different timezones found during 'strptime' operation.
I also tried an explicit format string, fmt="%Y-%m-%dT%H:%M:%S%:z", which yields the same error.
I'm not sure whether this is a bug or user error. I read the release notes for 0.15.7 on GitHub, and there are some mentions of ISO 8601 parsing, but nothing that hints at why this would no longer work.
Answer:
This is due to https://github.com/pola-rs/polars/pull/6434/files
Previously, the timezone was ignored when parsing with '%+'. As of 0.15.17, it is respected.
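To see why respecting the offset causes trouble for this column, here is a small standard-library sketch (plain Python, not polars). It parses two hypothetical values in the question's format, one from winter (+01:00) and one from summer (+02:00). As I understand the error, a single column cannot hold both fixed offsets at once; normalising every value to UTC gives them one common zone.

```python
from datetime import datetime, timezone

# Two hypothetical samples in the question's format, one per DST regime
samples = ["2020-12-27T23:59:59+01:00", "2021-06-27T23:59:59+02:00"]

# datetime.fromisoformat understands the "+HH:MM" suffix (Python 3.7+)
parsed = [datetime.fromisoformat(s) for s in samples]

# Each value keeps its own fixed offset, so the column mixes two offsets
offsets = {dt.utcoffset() for dt in parsed}
print(len(offsets))  # 2

# Converting every value to UTC puts them all in one common time zone
in_utc = [dt.astimezone(timezone.utc) for dt in parsed]
print([dt.isoformat() for dt in in_utc])
```

This is the same idea the workarounds below rely on: get everything into UTC (or one named zone) first, then convert.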
In pandas, you could get around this by doing:
In [22]: pd.to_datetime(dfp['date'], utc=True).dt.tz_convert('Europe/Vienna')
Out[22]:
0 2020-12-27 23:59:59+01:00
1 2020-12-27 23:59:59+01:00
2 2020-12-27 23:59:59+01:00
3 2020-12-27 23:59:59+01:00
4 2020-12-27 23:59:59+01:00
...
255355 2023-01-25 23:59:59+01:00
255356 2023-01-25 23:59:59+01:00
255357 2023-01-25 23:59:59+01:00
255358 2023-01-25 23:59:59+01:00
255359 2023-01-25 23:59:59+01:00
Name: date, Length: 255360, dtype: datetime64[ns, Europe/Vienna]
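A self-contained version of the pandas approach, runnable without the original data, might look like this. The two-row frame is hypothetical (one winter and one summer value); the calls are the same pd.to_datetime(..., utc=True) and .dt.tz_convert used above.

```python
import pandas as pd

# Hypothetical stand-in for the real data: one +01:00 and one +02:00 value
dfp = pd.DataFrame({"date": ["2020-12-27T23:59:59+01:00",
                             "2021-06-27T23:59:59+02:00"]})

# utc=True converts each value to UTC, so mixed offsets are accepted
as_utc = pd.to_datetime(dfp["date"], utc=True)

# Express the instants in the named zone, which applies DST by itself
local = as_utc.dt.tz_convert("Europe/Vienna")
print(local)
```

Both rows come back as 23:59:59 local time, the first at +01:00 and the second at +02:00, because the named zone rather than a fixed offset decides which offset applies.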
Until polars has a utc parameter, you can probably do:
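import polars as pl

(
    df["date"]
    .str.split("+")            # "2020-12-27T23:59:59+01:00" -> ["2020-12-27T23:59:59", "01:00"]
    .arr.get(0)                # keep only the local wall-clock part
    .str.strptime(pl.Datetime) # parse it as a naive datetime
    .dt.with_time_zone("UTC")  # polars 0.15.x method names
    .dt.cast_time_zone("Europe/Vienna")  # keep the wall-clock time, switch the zone to Europe/Vienna
)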
which gives
Out[38]:
shape: (255360,)
Series: 'date' [datetime[μs, Europe/Vienna]]
[
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
2020-12-27 23:59:59 CET
...
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
2023-01-25 23:59:59 CET
]
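One caveat with the split-based workaround, as far as I can tell: stripping the offset throws away the only information that distinguishes the repeated hour when DST ends. The sketch below uses plain Python with zoneinfo (not polars), and the two timestamps are hypothetical readings in the hour that occurred twice in Vienna on 2021-10-31. With their offsets they are different instants; once the offset is dropped, the wall-clock time alone cannot tell them apart. Going through UTC, as the pandas version does, keeps them distinct.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

vienna = ZoneInfo("Europe/Vienna")

# Hypothetical readings in the repeated hour at the end of DST (2021-10-31)
first = "2021-10-31T02:30:00+02:00"   # still CEST
second = "2021-10-31T02:30:00+01:00"  # already CET

# Route 1: keep the offset and convert via UTC -> two distinct instants
via_utc = [datetime.fromisoformat(s).astimezone(timezone.utc) for s in (first, second)]
print(via_utc[0] != via_utc[1])  # True

# Route 2: drop the "+HH:MM" suffix, then attach Europe/Vienna to the wall time
naive = [datetime.fromisoformat(s.split("+")[0]) for s in (first, second)]
localized = [dt.replace(tzinfo=vienna) for dt in naive]

# Both naive values are identical, so both get the same (fold=0, CEST) reading
print(localized[0].astimezone(timezone.utc) == localized[1].astimezone(timezone.utc))  # True
```

For data that never falls in that repeated hour the two routes agree; otherwise the UTC route is the safer one.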