My initial dataset has only 2 columns, date and value.
What I'm trying to do is, for each date, get the value from the previous month (columns m-1 and m-12). The problems I'm having is when the day doesn't exist in previous month, like 29 of February, that I want to leave it empty, and most methods tend to get the nearby dates.
So, the final table would be something like this:
date | value | m-1 | m-12 |
---|---|---|---|
2021-01-05 | 400 | NaN | NaN |
2022-01-05 | 100 | NaN | 400 |
2022-01-28 | 300 | NaN | NaN |
2022-02-05 | 300 | 100 | NaN |
2022-02-28 | 500 | 300 | NaN |
2022-03-29 | 300 | NaN | NaN |
I was thinking I could use something like d.apply(lambda x: x['date'] - relativedelta(months = 1), axis=1)
but with this, I only get the date, not the value. And it rounds the dates, for example for 2022-03-29 it returns 2022-02-28 which is not correct, it should be 02-29, and since it doesn't exist it should be NaN.
CodePudding user response:
Here is a possiblity :
# 1. necessary imports
import pandas as pd
from dateutil.relativedelta import relativedelta
import numpy as np
# 2. build example
df_example = pd.DataFrame(columns=["date", "value"])
df_example.date = ["2021-01-05", "2022-01-05", "2022-01-28", "2022-02-05", "2022-02-28", "2022-03-29"]
df_example.date = pd.to_datetime(df_example.date)
df_example.value = [400, 100, 300, 300, 500, 300]
# 3. look for the value corresponding to a given date
def build_column(row, month_shift, year_shift):
previous_month_date = f"{row.date.year-year_shift}-{row.date.month-month_shift}-{row.date.day}"
previous_value_row = df_example[df_example.date == previous_month_date]
if len(previous_value_row) == 0:
return np.nan
return previous_value_row.value.iloc[0]
# 4. add column m1 corresponding to the 1-month shift
df_example["m1"] = df_example.apply(lambda x: build_column(x, 1, 0), axis=1)
df_example["m12"] = df_example.apply(lambda x: build_column(x, 0, 1), axis=1)
Which outputs
date | value | m1 | m12 | |
---|---|---|---|---|
0 | 2021-01-05 00:00:00 | 400 | nan | nan |
1 | 2022-01-05 00:00:00 | 100 | nan | 400 |
2 | 2022-01-28 00:00:00 | 300 | nan | nan |
3 | 2022-02-05 00:00:00 | 300 | 100 | nan |
4 | 2022-02-28 00:00:00 | 500 | 300 | nan |
5 | 2022-03-29 00:00:00 | 300 | nan | nan |