Home > Enterprise >  Find the missing month in given date range then add that missing date in the data with same records
Find the missing month in given date range then add that missing date in the data with same records

Time:11-07

I have a Statement of accounts, where i have Unique ID, Disbursed date, payment date and the balance amount.

Date range for below data = Disbursed date to May-2022

Example of date:

Unique   Disbursed date      payment date   balance amount
123      2022-Jan-13          2022-Jan-27      10,000 
123      2022-Jan-13          2022-Feb-28       5,000 
123      2022-Jan-13          2022-Apr-29       2,000 
  1. first I want to groupby payment date(last day of each month) and as an aggr function instead of Sum or mean, I want to carry forward the same balance reflecting in the last month last day.

  2. As you can see March is missing in the records, here I want to add a new record for March with same balance given in Feb-22 i.e 5,000 and date for the new record should be last day of Mar-22.

  3. Since date range given till 2022-May then here I want to add another new record for May-22 with same balance given in last month (Apr-22) i.e 2000 and date for the new record should be last day of May-22

Note : I have multiple unique ids like 123, 456, 789, etc.

I'd tried below code to find out the missing month

for i in df['date']: pd.date_range(i,'2020-11-28').difference(df.index) print(i)

but, it is giving days wise missing date. I want to find out the missing "month" instead of date for each unique id

CodePudding user response:

You can use:

# generate needed month ends
idx = pd.date_range('2022-01', '2022-06', freq='M')

out = (df
   # compute the month end for existing data
   .assign(month_end=pd.to_datetime(df['payment date'])
                       .sub(pd.Timedelta('1d'))
                       .add(pd.offsets.MonthEnd()))
   .set_index(['Unique', 'month_end'])
   # reindex with missing ID/month ends
   .reindex(pd.MultiIndex.from_product([df['Unique'].unique(),
                                        idx
                                       ], names=['Unique', 'idx']))
   .reset_index()
   # fill missing month end with correct format
   .assign(**{'payment date': lambda d:
              d['payment date'].fillna(d['idx'].dt.strftime('%Y-%b-%d'))})
   # ffill the data per ID
   .groupby('Unique').ffill()
   
)

output:

   Unique        idx Disbursed date payment date balance amount
0     123 2022-01-31    2022-Jan-13  2022-Jan-27         10,000
1     123 2022-02-28    2022-Jan-13  2022-Feb-28          5,000
2     123 2022-03-31    2022-Jan-13  2022-Mar-31          5,000
3     123 2022-04-30    2022-Jan-13  2022-Apr-29          2,000
4     123 2022-05-31    2022-Jan-13  2022-May-31          2,000
  • Related