List of dates between two date variables in pandas dataframe-CodePudding

I have a dataset like this

   Business_dt   Value_dt    ID
0  05/01/2021    01/01/2021  1
1  06/01/2021    01/01/2021  1
2  07/01/2021    01/01/2021  1
3  08/01/2021    01/01/2021  1
4  15/02/2021    13/02/2021  2
5  16/02/2021    13/02/2021  2
6  17/02/2021    13/02/2021  2
7  10/08/2021    10/08/2021  3
8  11/08/2021    10/08/2021  3

I want to extrapolate the data so that when the value date is smaller than the business date, I copy the observations and change the date so that I have an observation for each business date until business date equals value date. What I want to have is this:

   Business_dt   Value_dt    ID
0  01/01/2021    01/01/2021  1
1  02/01/2021    01/01/2021  1
2  03/01/2021    01/01/2021  1
3  04/01/2021    01/01/2021  1
4  05/01/2021    01/01/2021  1
5  06/01/2021    01/01/2021  1
6  07/01/2021    01/01/2021  1
7  08/01/2021    01/01/2021  1
8  13/02/2021    13/02/2021  2
9  14/02/2021    13/02/2021  2
10 15/02/2021    13/02/2021  2
11 16/02/2021    13/02/2021  2
12 17/02/2021    13/02/2021  2
13 10/08/2021    10/08/2021  3
14 11/08/2021    10/08/2021  3

So far I selected the observations where the business_date is smaller than the value_dt and tried the following code in Python. At the end I was thinking about appending these dates to the original dataframe. However, I do not manage to extrapolate the data. How would you do it?:

df.date_range = pd.date_range(df.valdt, df.business_date - timedelta(days = df.report_diff), freq='d')

where report_diff is the difference between both dates in days.

CodePudding user response：

You can use a custom function to reindex the dates:

def add_missing(d):
    MIN, MAX = pd.to_datetime(d['Business_dt'], dayfirst=True).agg(['min', 'max'])
    idx = (pd.date_range(min(MIN, pd.to_datetime(d['Value_dt'].iloc[0])),
                         MAX, freq='D')
             .strftime('%d/%m/%Y')
          )
    return (d.set_index('Business_dt')
             .reindex(idx, method='bfill')
             .reset_index()
           )


out = df.groupby('ID', group_keys=False).apply(add_missing)

print(out)

Output:

        index    Value_dt  ID
0  01/01/2021  01/01/2021   1
1  02/01/2021  01/01/2021   1
2  03/01/2021  01/01/2021   1
3  04/01/2021  01/01/2021   1
4  05/01/2021  01/01/2021   1
5  06/01/2021  01/01/2021   1
6  07/01/2021  01/01/2021   1
7  08/01/2021  01/01/2021   1
0  13/02/2021  13/02/2021   2
1  14/02/2021  13/02/2021   2
2  15/02/2021  13/02/2021   2
3  16/02/2021  13/02/2021   2
4  17/02/2021  13/02/2021   2
0  10/08/2021  10/08/2021   3
1  11/08/2021  10/08/2021   3