Home > OS >  Lag of values from 1 month ago
Lag of values from 1 month ago

Time:04-02

My initial dataset has only 2 columns, date and value.

What I'm trying to do is, for each date, get the value from the previous month (columns m-1 and m-12). The problems I'm having is when the day doesn't exist in previous month, like 29 of February, that I want to leave it empty, and most methods tend to get the nearby dates.

So, the final table would be something like this:

date value m-1 m-12
2021-01-05 400 NaN NaN
2022-01-05 100 NaN 400
2022-01-28 300 NaN NaN
2022-02-05 300 100 NaN
2022-02-28 500 300 NaN
2022-03-29 300 NaN NaN

I was thinking I could use something like d.apply(lambda x: x['date'] - relativedelta(months = 1), axis=1) but with this, I only get the date, not the value. And it rounds the dates, for example for 2022-03-29 it returns 2022-02-28 which is not correct, it should be 02-29, and since it doesn't exist it should be NaN.

CodePudding user response:

Here is a possiblity :

# 1. necessary imports
import pandas as pd
from dateutil.relativedelta import relativedelta
import numpy as np

# 2. build example
df_example = pd.DataFrame(columns=["date", "value"])
df_example.date = ["2021-01-05", "2022-01-05", "2022-01-28", "2022-02-05", "2022-02-28", "2022-03-29"]
df_example.date = pd.to_datetime(df_example.date)
df_example.value = [400, 100, 300, 300, 500, 300]

# 3. look for the value corresponding to a given date
def build_column(row, month_shift, year_shift):
    previous_month_date = f"{row.date.year-year_shift}-{row.date.month-month_shift}-{row.date.day}"
    previous_value_row = df_example[df_example.date == previous_month_date]
    if len(previous_value_row) == 0:
        return np.nan

    return previous_value_row.value.iloc[0]

# 4. add column m1 corresponding to the 1-month shift
df_example["m1"] = df_example.apply(lambda x: build_column(x, 1, 0), axis=1)
df_example["m12"] = df_example.apply(lambda x: build_column(x, 0, 1), axis=1)

Which outputs

date value m1 m12
0 2021-01-05 00:00:00 400 nan nan
1 2022-01-05 00:00:00 100 nan 400
2 2022-01-28 00:00:00 300 nan nan
3 2022-02-05 00:00:00 300 100 nan
4 2022-02-28 00:00:00 500 300 nan
5 2022-03-29 00:00:00 300 nan nan
  • Related