Home > Blockchain >  Converting datetime timezone aware column to string with UTC time offset
Converting datetime timezone aware column to string with UTC time offset

Time:01-30

I have the following dataframe:

df = (
    pl.DataFrame(
        {
            "int": [1, 2, 3],
            "date": ["2010-01-31T23:00:00 00:00","2010-02-01T00:00:00 00:00","2010-02-01T01:00:00 00:00"]
        }
    )
    .with_columns(
        pl.col("date").str.strptime(pl.Datetime, fmt="%Y-%m-%dT%H:%M:%S%z")
        .dt.with_time_zone("Europe/Amsterdam")
    )
)

which gives:

┌─────┬────────────────────────────────┐
│ int ┆ date                           │
│ --- ┆ ---                            │
│ i64 ┆ datetime[μs, Europe/Amsterdam] │
╞═════╪════════════════════════════════╡
│ 1   ┆ 2010-02-01 00:00:00 CET        │
│ 2   ┆ 2010-02-01 01:00:00 CET        │
│ 3   ┆ 2010-02-01 02:00:00 CET        │
└─────┴────────────────────────────────┘

I would like to convert this datetime type to a string with a time zone designator, e.g. 2010-02-01 00:00:00 01:00

I tried the following:

df.with_columns(pl.col("date").dt.strftime(fmt="%Y-%m-%d %H:%M:%S%z"))

which gives the following error:

pyo3_runtime.PanicException: a formatting trait implementation returned an error: Error

My desired output is stated below, which is what you get when you convert a datetime column to a string type in pandas with the "%Y-%m-%d %H:%M:%S%z" as the format:

┌─────┬──────────────────────────┐
│ int ┆ date                     │
│ --- ┆ ---                      │
│ i64 ┆ str                      │
╞═════╪══════════════════════════╡
│ 1   ┆ 2010-02-01 00:00:00 0100 │
│ 2   ┆ 2010-02-01 01:00:00 0100 │
│ 3   ┆ 2010-02-01 02:00:00 0100 │
└─────┴──────────────────────────┘

Is there any way to realize this result? Leaving out the %z at the end when specifying the format works but the UTC time offset is something I need.

CodePudding user response:

Unfortunately, this looks like a bug.

As a temporary workaround, you can resort to using apply (which is really slow):

(
    df
    .with_columns(
        pl.col('date').apply(lambda d: d.strftime("%F %T%z")).alias('strftime')
    )
)
shape: (3, 3)
┌─────┬────────────────────────────────┬──────────────────────────┐
│ int ┆ date                           ┆ strftime                 │
│ --- ┆ ---                            ┆ ---                      │
│ i64 ┆ datetime[μs, Europe/Amsterdam] ┆ str                      │
╞═════╪════════════════════════════════╪══════════════════════════╡
│ 1   ┆ 2010-02-01 00:00:00 CET        ┆ 2010-02-01 00:00:00 0100 │
│ 2   ┆ 2010-02-01 01:00:00 CET        ┆ 2010-02-01 01:00:00 0100 │
│ 3   ┆ 2010-02-01 02:00:00 CET        ┆ 2010-02-01 02:00:00 0100 │
└─────┴────────────────────────────────┴──────────────────────────┘

CodePudding user response:

(On mobile so bear with me)

I don't think this is a bug. I think what's happening is that when you strptime with the offset, that doesn't uniquely define a time zone even though it's showing a timezone (perhaps that's your local time zone). I think all you need to do assuming you have at least version .16 is add utc=True to your strptime.

  • Related