I'm trying to convert the string timestamps that my camera writes to its RAW file metadata into polars datetimes, but polars throws this error when the timestamps span both summer time and winter time:
ComputeError: Different timezones found during 'strptime' operation.
How do I persuade it to convert these successfully? (ideally handling different timezones as well as the change from summer to winter time)
And then how do I convert these timestamps back to the proper local clock time for display?
Note that while the timestamp strings only show the offset, the metadata also has an EXIF field "Time Zone City", as well as fields with just the local (naive) timestamp.
import polars as plr
testdata = [
    {'name': 'BST 11:06', 'ts': '2022:06:27 11:06:12.16+01:00'},
    {'name': 'GMT 7:06', 'ts': '2022:12:27 12:06:12.16+00:00'},
]
pdf = plr.DataFrame(testdata)
pdfts = pdf.with_column(plr.col('ts').str.strptime(plr.Datetime, fmt="%Y:%m:%d %H:%M:%S.%f%z"))
print(pdf)
print(pdfts)
It looks like I need to use tz_convert, but I cannot see how to add it to the conversion expression, and the doc page that looks relevant just returns a 404 (a broken link to dt_namespace).
CodePudding user response:
The implementation of parsing UTC offsets seems to be incomplete as of Python polars 0.15.7. Here's a workaround you could use: remove the UTC offset and localize to a pre-defined time zone.
Note: the result will only be correct if UTC offsets and time zone agree.
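timezone = "Europe/London"
pdfts = pdf.with_column(
    plr.col('ts')
    # strip the trailing UTC offset (e.g. "+01:00"), leaving a naive timestamp
    .str.replace("[ +-][0-9]{2}:[0-9]{2}$", "")
    .str.strptime(plr.Datetime, fmt="%Y:%m:%d %H:%M:%S%.f")
    # attach the pre-defined time zone to the naive timestamp
    .dt.tz_localize(timezone)
)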
print(pdf)
┌───────────┬──────────────────────────────┐
│ name      ┆ ts                           │
│ ---       ┆ ---                          │
│ str       ┆ str                          │
╞═══════════╪══════════════════════════════╡
│ BST 11:06 ┆ 2022:06:27 11:06:12.16+01:00 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ GMT 7:06  ┆ 2022:12:27 12:06:12.16+00:00 │
└───────────┴──────────────────────────────┘
print(pdfts)
┌───────────┬─────────────────────────────┐
│ name      ┆ ts                          │
│ ---       ┆ ---                         │
│ str       ┆ datetime[ns, Europe/London] │
╞═══════════╪═════════════════════════════╡
│ BST 11:06 ┆ 2022-06-27 11:06:12.160 BST │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ GMT 7:06  ┆ 2022-12-27 12:06:12.160 GMT │
└───────────┴─────────────────────────────┘
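The caveat above (the result is only correct if the UTC offsets and the time zone agree) can be checked before dropping the offsets. The following is a minimal sketch using only the Python standard library rather than polars; the helper name offset_matches_zone is made up for illustration, and it assumes the offsets in the strings carry an explicit sign such as +01:00.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# EXIF-style timestamp with a signed UTC offset (assumed input format)
FMT = "%Y:%m:%d %H:%M:%S.%f%z"


def offset_matches_zone(ts: str, zone_name: str) -> bool:
    """Return True if the UTC offset written in `ts` equals the offset
    that `zone_name` has at the same wall-clock time."""
    aware = datetime.strptime(ts, FMT)      # offset taken from the string
    naive = aware.replace(tzinfo=None)      # wall-clock time only
    localized = naive.replace(tzinfo=ZoneInfo(zone_name))
    return aware.utcoffset() == localized.utcoffset()


samples = [
    "2022:06:27 11:06:12.16+01:00",  # summer time in London
    "2022:12:27 12:06:12.16+00:00",  # winter time in London
    "2022:06:27 11:06:12.16+02:00",  # offset that London never had on this date
]
results = [offset_matches_zone(ts, "Europe/London") for ts in samples]
print(results)
```

Rows for which the check is False would be shifted by the difference between the two offsets if the offset were simply removed, so they are worth handling separately.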
Side note: to be fair, pandas does not handle mixed UTC offsets either, unless you parse to UTC straight away (keyword utc=True in pd.to_datetime). With mixed UTC offsets, it falls back to using a Series of native Python datetime objects. That makes a lot of the pandas time series functionality, like the dt accessor, unavailable.
CodePudding user response:
Similar to FObersteiner's solution, but this parses the offset manually rather than assuming that your camera's offset matches a predefined time zone definition.
The first step is to use extract with a regex to separate the offset from the rest of the timestamp. The offset is split into hours and minutes, with the sign kept on the hours. Then we strptime the datetime component from the first step as a naive time, subtract the signed offset, localize the result to UTC, and convert it to the desired time zone (in this case Europe/London). (I load polars as pl, not plr, so adjust as necessary.)
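(pdf
 .with_columns([
     # timestamp without the UTC offset
     pl.col('ts').str.extract(r"(\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}\.\d{2})"),
     # signed offset hours, e.g. "+01" -> 1.0
     pl.col('ts').str.extract(r"\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}\.\d{2}((\+|-)\d{2}):\d{2}")
     .cast(pl.Float64()).alias("offset"),
     # offset minutes (unsigned; a negative offset with non-zero minutes,
     # e.g. -03:30, would need the sign applied to the minutes as well)
     pl.col('ts').str.extract(r"\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}\.\d{2}(\+|-)\d{2}:(\d{2})", group_index=2)
     .cast(pl.Float64()).alias("offset_minute")])
 .select([
     'name',
     # naive local time minus the offset gives UTC, which is then converted for display
     (pl.col('ts').str.strptime(pl.Datetime(), "%Y:%m:%d %H:%M:%S%.f")
      - pl.duration(hours=pl.col('offset'), minutes=pl.col('offset_minute')))
     .dt.tz_localize('UTC').dt.with_time_zone('Europe/London')]))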
shape: (2, 3)
┌───────────┬────────┬─────────────────────────────┐
│ name      ┆ offset ┆ dt                          │
│ ---       ┆ ---    ┆ ---                         │
│ str       ┆ f64    ┆ datetime[ns, Europe/London] │
╞═══════════╪════════╪═════════════════════════════╡
│ BST 11:06 ┆ 1.0    ┆ 2022-06-27 11:06:12.160 BST │
│ GMT 7:06  ┆ 0.0    ┆ 2022-12-27 12:06:12.160 GMT │
└───────────┴────────┴─────────────────────────────┘
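For comparison, the same steps (split off the offset, parse the rest as naive, subtract the offset to get UTC, convert to a zone for display) can be sketched row by row with only the Python standard library. This is not polars code; the function names parse_to_utc and to_display are hypothetical, and the sketch assumes the offset is written with an explicit sign as ±HH:MM at the end of the string. Unlike the expression above, it applies the sign to the minutes as well.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# groups: local timestamp, sign, offset hours, offset minutes
PATTERN = re.compile(r"^(.+?)([+-])(\d{2}):(\d{2})$")


def parse_to_utc(ts: str) -> datetime:
    """Parse 'YYYY:MM:DD HH:MM:SS.ff±HH:MM' and return an aware UTC datetime."""
    m = PATTERN.match(ts)
    if m is None:
        raise ValueError(f"no UTC offset found in {ts!r}")
    local_part, sign, hours, minutes = m.groups()
    naive = datetime.strptime(local_part, "%Y:%m:%d %H:%M:%S.%f")
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset                     # sign applies to hours and minutes
    # local time minus the offset is UTC
    return (naive - offset).replace(tzinfo=timezone.utc)


def to_display(dt_utc: datetime, zone_name: str) -> datetime:
    """Convert a UTC datetime to local clock time in the given IANA zone."""
    return dt_utc.astimezone(ZoneInfo(zone_name))


for ts in ["2022:06:27 11:06:12.16+01:00", "2022:12:27 12:06:12.16+00:00"]:
    utc = parse_to_utc(ts)
    print(utc.isoformat(), "->", to_display(utc, "Europe/London").isoformat())
```

Storing the values in UTC and converting only for display keeps timestamps from different zones comparable; the zone name passed to to_display could come from a field such as "Time Zone City" if it can be mapped to an IANA name.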