Home > Software design >  Add timedelta to a date column above weeks
Add timedelta to a date column above weeks

Time:06-25

How would I add 1 year to a column?

I've tried using map and apply but I failed miserably.

I also wonder why pl.date() accepts integers while it advertises that it only accepts str or pli.Expr.

A small hack workaround is:

col = pl.col('date').dt
df = df.with_column(pl.when(pl.col(column).is_not_null())
                    .then(pl.date(col.year()   1, col.month(), col.day()))
                    .otherwise(pl.date(col.year()   1,col.month(), col.day()))
                    .alias("date"))

but this won't work for months or days. I can't just add a number or I'll get a:

> thread 'thread '<unnamed>' panicked at 'invalid or out-of-range date<unnamed>',
         ' panicked at '/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rsinvalid or out-of-range date:', 173:/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs51
:note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Most likely because day and month cycle while year goes to infinity.

I could also do this:

df = df.with_column(
        pl.when(col.month() == 1)
        .then(pl.date(col.year(), 2, col.day()))
        .when(col.month() == 2)
        .then(pl.date(col.year(), 3, col.day()))
        .when(col.month() == 3)
        .then(pl.date(col.year(), 4, col.day()))
        .when(col.month() == 4)
        .then(pl.date(col.year(), 5, col.day()))
        .when(col.month() == 5)
        .then(pl.date(col.year(), 6, col.day()))
        .when(col.month() == 6)
        .then(pl.date(col.year(), 7, col.day()))
        .when(col.month() == 7)
        .then(pl.date(col.year(), 8, col.day()))
        .when(col.month() == 8)
        .then(pl.date(col.year(), 9, col.day()))
        .when(col.month() == 9)
        .then(pl.date(col.year(), 10, col.day()))
        .when(col.month() == 10)
        .then(pl.date(col.year(), 11, col.day()))
        .when(col.month() == 11)
        .then(pl.date(col.year(), 12, col.day()))
        .otherwise(pl.date(col.year()   1, 1, 1))
        .alias("valid_from")
    )

CodePudding user response:

Polars allows to do addition and subtraction with python's timedelta objects.

We do still have to correct for leap years. We can use when, then, otherwise for that.

import polars as pl
from datetime import timedelta

def is_leap_year(name: str) -> bool:
    year = pl.col(name).dt.year()
    return (year % 4 == 0) & (year % 100 != 0) | (year % 400 == 0)

(pl.DataFrame({
    "dates": pl.date_range(datetime(2000, 1, 1), datetime(2026, 1, 1), "1y")
}).with_columns([
    is_leap_year("dates").alias("leap_year"),
    pl.when(is_leap_year("dates").alias("leap_year"))
    .then(pl.col("dates")   timedelta(days=366))
    .otherwise(pl.col("dates")   timedelta(days=365)).alias("dates_and_1_year")
]))
shape: (27, 3)
┌─────────────────────┬───────────┬─────────────────────┐
│ dates               ┆ leap_year ┆ dates_and_1_year    │
│ ---                 ┆ ---       ┆ ---                 │
│ datetime[ns]        ┆ bool      ┆ datetime[ns]        │
╞═════════════════════╪═══════════╪═════════════════════╡
│ 2000-01-01 00:00:00 ┆ true      ┆ 2001-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2001-01-01 00:00:00 ┆ false     ┆ 2002-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2002-01-01 00:00:00 ┆ false     ┆ 2003-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2003-01-01 00:00:00 ┆ false     ┆ 2004-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                 ┆ ...       ┆ ...                 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2023-01-01 00:00:00 ┆ false     ┆ 2024-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2024-01-01 00:00:00 ┆ true      ┆ 2025-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2025-01-01 00:00:00 ┆ false     ┆ 2026-01-01 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2026-01-01 00:00:00 ┆ false     ┆ 2027-01-01 00:00:00 │
└─────────────────────┴───────────┴─────────────────────┘

CodePudding user response:

You can use polars.apply and dateutil.relativedelta which works for years, months, days and much more, but can be slow for lots of data.

from datetime import date
from dateutil.relativedelta import relativedelta

df = pl.DataFrame(pl.date_range(date(2019, 1, 1), date(2020, 10, 1), '3mo', name='date'))
df.with_column(pl.col('date').apply(lambda x: x   relativedelta(years=1)))
  • Related