I want to add a duration in seconds to a date/time. My data looks like this:
import polars as pl
df = pl.DataFrame(
{
"dt": [
"2022-12-14T00:00:00", "2022-12-14T00:00:00", "2022-12-14T00:00:00",
],
"seconds": [
1.0, 2.2, 2.4,
],
}
)
df = df.with_column(pl.col("dt").str.strptime(pl.Datetime).cast(pl.Datetime))
My naive attempt was to convert the float column to a duration type so that I could add it to the datetime column (as I would do in pandas).
df = df.with_column(pl.col("seconds").cast(pl.Duration).alias("duration0"))
print(df.head())
┌─────────────────────┬─────────┬──────────────┐
│ dt ┆ seconds ┆ duration0 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ duration[μs] │
╞═════════════════════╪═════════╪══════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0 ┆ 0µs │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2 ┆ 0µs │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4 ┆ 0µs │
└─────────────────────┴─────────┴──────────────┘
This gives the correct data type, but the values are all zero.
I also tried
df = df.with_column(
pl.col("seconds")
.apply(lambda x: pl.duration(nanoseconds=x * 1e9))
.alias("duration1")
)
print(df.head())
shape: (3, 4)
┌─────────────────────┬─────────┬──────────────┬─────────────────────────────────────┐
│ dt ┆ seconds ┆ duration0 ┆ duration1 │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ duration[μs] ┆ object │
╞═════════════════════╪═════════╪══════════════╪═════════════════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0 ┆ 0µs ┆ 0i64.duration([0i64, 1000000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2 ┆ 0µs ┆ 0i64.duration([0i64, 2200000000f... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4 ┆ 0µs ┆ 0i64.duration([0i64, 2400000000f... │
└─────────────────────┴─────────┴──────────────┴─────────────────────────────────────┘
which gives an object-typed column, which isn't helpful either. The documentation is rather sparse on this topic. Are there any better options?
CodePudding user response:
It seems it may just be an issue with the repr of the duration values.
>>> df.select(pl.col("seconds").cast(pl.Duration))
shape: (3, 1)
┌──────────────┐
│ seconds │
│ --- │
│ duration[μs] │
╞══════════════╡
│ 0µs │
├──────────────┤
│ 0µs │
├──────────────┤
│ 0µs │
└──────────────┘
>>> df.select(pl.col("seconds").cast(pl.Duration).dt.microseconds())
shape: (3, 1)
┌─────────┐
│ seconds │
│ --- │
│ i64 │
╞═════════╡
│ 1 │
├─────────┤
│ 2 │
├─────────┤
│ 2 │
└─────────┘
They do add as expected in your example:
>>> df = pl.DataFrame({
... "dt": [
... "2022-12-14T00:00:00", "2022-12-14T00:00:00", "2022-12-14T00:00:00",
... ],
... "seconds": [
... 1.0, 2.2, 2.4,
... ],
... })
>>> df.select(
... pl.col("dt").str.strptime(pl.Datetime) +
... pl.col("seconds").cast(pl.Duration)
... )
shape: (3, 1)
┌────────────────────────────┐
│ dt │
│ --- │
│ datetime[μs] │
╞════════════════════════════╡
│ 2022-12-14 00:00:00.000001 │
├────────────────────────────┤
│ 2022-12-14 00:00:00.000002 │
├────────────────────────────┤
│ 2022-12-14 00:00:00.000002 │
└────────────────────────────┘
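Note that the offsets in this output are 1µs and 2µs rather than 1 s, 2.2 s and 2.4 s, which suggests the cast treats the float as a count of microseconds and drops the fractional part. Here is a minimal sketch of that arithmetic using only the standard library `datetime` module (not Polars itself; the variable names are made up for illustration):

```python
from datetime import datetime, timedelta

base = datetime(2022, 12, 14)
seconds = [1.0, 2.2, 2.4]

# What the plain cast appears to do: read the float as a microsecond
# count and drop the fractional part.
as_microseconds = [base + timedelta(microseconds=int(s)) for s in seconds]

# What was intended: scale seconds to whole microseconds first.
as_seconds = [base + timedelta(microseconds=round(s * 1_000_000)) for s in seconds]

print(as_microseconds[1])  # 2022-12-14 00:00:00.000002
print(as_seconds[1])       # 2022-12-14 00:00:02.200000
```

So if the column holds seconds, it probably needs to be multiplied by 1e6 before it is interpreted as a microsecond duration.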
CodePudding user response:
There's another option as well: since datetime is represented internally as microseconds here, you can directly add the seconds as microseconds:
MICROSECONDS_PER_SECOND = 1e6
df = df.with_column((df["dt"] + df["seconds"] * MICROSECONDS_PER_SECOND)
.cast(pl.Datetime)
.alias("dt_new"))
print(df.head())
shape: (3, 3)
┌─────────────────────┬─────────┬─────────────────────────┐
│ dt ┆ seconds ┆ dt_new │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ datetime[μs] │
╞═════════════════════╪═════════╪═════════════════════════╡
│ 2022-12-14 00:00:00 ┆ 1.0 ┆ 2022-12-14 00:00:01 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.2 ┆ 2022-12-14 00:00:02.200 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-12-14 00:00:00 ┆ 2.4 ┆ 2022-12-14 00:00:02.400 │
└─────────────────────┴─────────┴─────────────────────────┘