I have a dataset like this
Business_dt Value_dt ID
0 05/01/2021 01/01/2021 1
1 06/01/2021 01/01/2021 1
2 07/01/2021 01/01/2021 1
3 08/01/2021 01/01/2021 1
4 15/02/2021 13/02/2021 2
5 16/02/2021 13/02/2021 2
6 17/02/2021 13/02/2021 2
7 10/08/2021 10/08/2021 3
8 11/08/2021 10/08/2021 3
I want to extrapolate the data so that when the value date is smaller than the business date, I copy the observations and change the date so that I have an observation for each business date until business date equals value date. What I want to have is this:
Business_dt Value_dt ID
0 01/01/2021 01/01/2021 1
1 02/01/2021 01/01/2021 1
2 03/01/2021 01/01/2021 1
3 04/01/2021 01/01/2021 1
4 05/01/2021 01/01/2021 1
5 06/01/2021 01/01/2021 1
6 07/01/2021 01/01/2021 1
7 08/01/2021 01/01/2021 1
8 13/02/2021 13/02/2021 2
9 14/02/2021 13/02/2021 2
10 15/02/2021 13/02/2021 2
11 16/02/2021 13/02/2021 2
12 17/02/2021 13/02/2021 2
13 10/08/2021 10/08/2021 3
14 11/08/2021 10/08/2021 3
So far I selected the observations where the business_date is smaller than the value_dt and tried the following code in Python. At the end I was thinking about appending these dates to the original dataframe. However, I do not manage to extrapolate the data. How would you do it?:
df.date_range = pd.date_range(df.valdt, df.business_date - timedelta(days = df.report_diff), freq='d')
where report_diff is the difference between both dates in days.
CodePudding user response:
You can use a custom function to reindex the dates:
def add_missing(d):
MIN, MAX = pd.to_datetime(d['Business_dt'], dayfirst=True).agg(['min', 'max'])
idx = (pd.date_range(min(MIN, pd.to_datetime(d['Value_dt'].iloc[0])),
MAX, freq='D')
.strftime('%d/%m/%Y')
)
return (d.set_index('Business_dt')
.reindex(idx, method='bfill')
.reset_index()
)
out = df.groupby('ID', group_keys=False).apply(add_missing)
print(out)
Output:
index Value_dt ID
0 01/01/2021 01/01/2021 1
1 02/01/2021 01/01/2021 1
2 03/01/2021 01/01/2021 1
3 04/01/2021 01/01/2021 1
4 05/01/2021 01/01/2021 1
5 06/01/2021 01/01/2021 1
6 07/01/2021 01/01/2021 1
7 08/01/2021 01/01/2021 1
0 13/02/2021 13/02/2021 2
1 14/02/2021 13/02/2021 2
2 15/02/2021 13/02/2021 2
3 16/02/2021 13/02/2021 2
4 17/02/2021 13/02/2021 2
0 10/08/2021 10/08/2021 3
1 11/08/2021 10/08/2021 3