How to add a duration to datetime in Python polars

Time:12-15

I want to add a duration in seconds to a date/time. My data looks like this:

import polars as pl

df = pl.DataFrame(
    {
        "dt": [
            "2022-12-14T00:00:00", "2022-12-14T00:00:00", "2022-12-14T00:00:00",
        ],
        "seconds": [
            1.0, 2.2, 2.4,
        ],
    }
)

df = df.with_column(pl.col("dt").str.strptime(pl.Datetime).cast(pl.Datetime))

My naive attempt was to convert the float column to a duration type so that I could add it to the datetime column (as I would do in pandas).

df = df.with_column(pl.col("seconds").cast(pl.Duration).alias("duration0"))

print(df.head())

┌─────────────────────┬─────────┬──────────────┐
│ dt                  ┆ seconds ┆ duration0    │
│ ---                 ┆ ---     ┆ ---          │
│ datetime[μs]        ┆ f64     ┆ duration[μs] │
╞═════════════════════╪═════════╪══════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 0µs          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 0µs          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 0µs          │
└─────────────────────┴─────────┴──────────────┘

This gives the correct data type; however, the values are all zero.

I also tried

df = df.with_column(
    pl.col("seconds")
    .apply(lambda x: pl.duration(nanoseconds=x * 1e9))
    .alias("duration1")
)
print(df.head())
shape: (3, 4)
┌─────────────────────┬─────────┬──────────────┬─────────────────────────────────────┐
│ dt                  ┆ seconds ┆ duration0    ┆ duration1                           │
│ ---                 ┆ ---     ┆ ---          ┆ ---                                 │
│ datetime[μs]        ┆ f64     ┆ duration[μs] ┆ object                              │
╞═════════════════════╪═════════╪══════════════╪═════════════════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 0µs          ┆ 0i64.duration([0i64, 1000000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 0µs          ┆ 0i64.duration([0i64, 2200000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 0µs          ┆ 0i64.duration([0i64, 2400000000f... │
└─────────────────────┴─────────┴──────────────┴─────────────────────────────────────┘

which gives an object-type column, which isn't helpful either. The documentation is kind of sparse on the topic. Are there any better options?

CodePudding user response:

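It seems it may be just an issue with the repr of the duration values. The cast interprets the numbers as microseconds (dropping the fractional part), and those small values are displayed as 0µs: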

>>> df.select(pl.col("seconds").cast(pl.Duration))
shape: (3, 1)
┌──────────────┐
│ seconds      │
│ ---          │
│ duration[μs] │
╞══════════════╡
│ 0µs          │
├──────────────┤
│ 0µs          │
├──────────────┤
│ 0µs          │
└──────────────┘
>>> df.select(pl.col("seconds").cast(pl.Duration).dt.microseconds())
shape: (3, 1)
┌─────────┐
│ seconds │
│ ---     │
│ i64     │
╞═════════╡
│ 1       │
├─────────┤
│ 2       │
├─────────┤
│ 2       │
└─────────┘

They do add in your example, though as microseconds rather than seconds:

>>> df = pl.DataFrame({
...    "dt": [
...       "2022-12-14T00:00:00", "2022-12-14T00:00:00", "2022-12-14T00:00:00",
...    ],
...    "seconds": [
...       1.0, 2.2, 2.4,
...    ],
... })
>>> df.select(
...    pl.col("dt").str.strptime(pl.Datetime)
...         + pl.col("seconds").cast(pl.Duration)
... )
shape: (3, 1)
┌────────────────────────────┐
│ dt                         │
│ ---                        │
│ datetime[μs]               │
╞════════════════════════════╡
│ 2022-12-14 00:00:00.000001 │
├────────────────────────────┤
│ 2022-12-14 00:00:00.000002 │
├────────────────────────────┤
│ 2022-12-14 00:00:00.000002 │
└────────────────────────────┘

CodePudding user response:

There's another option as well: since the datetime is represented internally as microseconds here, you can add the seconds directly once they are converted to microseconds:

MICROSECONDS_PER_SECOND = 1e6
df = df.with_column((df["dt"] + df["seconds"]*MICROSECONDS_PER_SECOND)
                    .cast(pl.Datetime)
                    .alias("dt_new"))

print(df.head())
shape: (3, 3)
┌─────────────────────┬─────────┬─────────────────────────┐
│ dt                  ┆ seconds ┆ dt_new                  │
│ ---                 ┆ ---     ┆ ---                     │
│ datetime[μs]        ┆ f64     ┆ datetime[μs]            │
╞═════════════════════╪═════════╪═════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0     ┆ 2022-12-14 00:00:01     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2     ┆ 2022-12-14 00:00:02.200 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4     ┆ 2022-12-14 00:00:02.400 │
└─────────────────────┴─────────┴─────────────────────────┘
