Conversion of ISO datetime string to pl.Datetime failure

Time:01-27

I have a dataframe containing a time series, with one column holding ISO 8601 datetime strings of the form 2020-12-27T23:59:59+01:00. This is a long-running time series spanning multiple UTC offset changes due to DST (for reference, the data can be found here).

I try to parse those into pl.Datetime via pl.col("date").str.strptime(pl.Datetime, fmt="%+").

This used to work, but since version 0.15.17 of polars it throws the following error:

exceptions.ComputeError: Different timezones found during 'strptime' operation.

I also tried an explicit format string, fmt="%Y-%m-%dT%H:%M:%S%:z", which yields the same error.

I am not sure whether this is a bug or user error. I read the release notes for 0.15.17 on GitHub, and there are some mentions of ISO 8601 parsing, but nothing that hints at why this would no longer work.

CodePudding user response:

This is due to https://github.com/pola-rs/polars/pull/6434/files

Previously, the timezone was ignored when parsing with '%+'. As of 0.15.17, it is respected, so a column whose strings carry different UTC offsets raises the error above.
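To see why a column like this mixes offsets, here is a minimal sketch using only the Python standard library rather than polars. The two sample strings are hypothetical values in the same shape as the question's data (one winter, one summer timestamp); they are not taken from the linked dataset.

```python
from datetime import datetime, timezone

# Hypothetical sample values shaped like the question's column:
# one winter (CET, +01:00) and one summer (CEST, +02:00) timestamp.
samples = [
    "2020-12-27T23:59:59+01:00",
    "2021-06-27T23:59:59+02:00",
]

# fromisoformat keeps the fixed offset attached to each value.
parsed = [datetime.fromisoformat(s) for s in samples]

# Two distinct offsets appear once the offset is respected.
offsets = {dt.utcoffset() for dt in parsed}

# Normalizing every value to UTC yields one consistent time zone.
as_utc = [dt.astimezone(timezone.utc) for dt in parsed]
```

Once every value is expressed in UTC, the column has a single time zone again, which is the idea behind the workarounds in the rest of this answer.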

In pandas, you could get around this by doing:

In [22]: pd.to_datetime(dfp['date'], utc=True).dt.tz_convert('Europe/Vienna')
Out[22]:
0        2020-12-27 23:59:59+01:00
1        2020-12-27 23:59:59+01:00
2        2020-12-27 23:59:59+01:00
3        2020-12-27 23:59:59+01:00
4        2020-12-27 23:59:59+01:00
                    ...
255355   2023-01-25 23:59:59+01:00
255356   2023-01-25 23:59:59+01:00
255357   2023-01-25 23:59:59+01:00
255358   2023-01-25 23:59:59+01:00
255359   2023-01-25 23:59:59+01:00
Name: date, Length: 255360, dtype: datetime64[ns, Europe/Vienna]
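The pandas call does two things: it normalizes everything to UTC, then converts to a named zone so DST is handled by the zone rules. A rough standard-library sketch of that pattern follows. The helper name to_vienna and the sample strings are hypothetical, and zoneinfo needs Python 3.9+ with time zone data available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

VIENNA = ZoneInfo("Europe/Vienna")


def to_vienna(iso_string: str) -> datetime:
    """Parse an offset-aware ISO 8601 string and express it in Vienna time."""
    # Step 1: normalize to UTC, regardless of the offset in the string.
    as_utc = datetime.fromisoformat(iso_string).astimezone(timezone.utc)
    # Step 2: convert to the named zone; its rules pick the DST offset.
    return as_utc.astimezone(VIENNA)


# Hypothetical winter and summer samples.
winter = to_vienna("2020-12-27T23:59:59+01:00")
summer = to_vienna("2021-06-27T23:59:59+02:00")
```

Both results share the same Europe/Vienna zone object, while the offset shown for each value still follows the DST rules for its date.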

Until polars has a utc parameter, you can probably do:

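(
    df["date"]
    .str.split("+")  # separate the local time from the UTC offset
    .arr.get(0)  # keep only the local time part
    .str.strptime(pl.Datetime)  # parse as a naive datetime
    .dt.with_time_zone("UTC")
    .dt.cast_time_zone("Europe/Vienna")  # relabel the wall-clock time as Vienna time
)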

which gives

Out[38]:
shape: (255360,)
Series: 'date' [datetime[μs, Europe/Vienna]]
[
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        2020-12-27 23:59:59 CET
        ...
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
        2023-01-25 23:59:59 CET
]