I have created this DataFrame:
agency coupon vintage Cbal Month CPR year month Month_Predicted_DT
0 FHLG 1.5 2021 70.090310 November 5.418937 2022 11 2022-11-01
1 FHLG 1.5 2021 70.090310 December 5.549916 2022 12 2022-12-01
2 FHLG 1.5 2021 70.090310 January 5.238943 2022 1 2022-01-01
3 FHLG 1.5 2020 52.414637 November 5.514456 2022 11 2022-11-01
4 FHLG 1.5 2020 52.414637 December 5.550490 2022 12 2022-12-01
5 FHLG 1.5 2020 52.414637 January 5.182304 2022 1 2022-01-01
Created from this original df:
agency coupon year Cbal November December January
0 FHLG 1.5 2021 70.090310 5.418937 5.549916 5.238943
1 FHLG 1.5 2020 52.414637 5.514456 5.550490 5.182304
2 FHLG 2.0 2022 44.598755 3.346706 3.715995 3.902644
3 FHLG 2.0 2021 472.209165 5.802857 5.899596 5.627774
4 FHLG 2.0 2020 269.761452 7.090993 7.091404 6.567561
Using this code:
import pandas as pd
from datetime import date

citi = pd.read_excel("Downloads/CITI_2022_05_22(5_22).xlsx")
# Extracting just the relevant months (M, M+1, M+2)
M = citi.columns[-6]
M_1 = citi.columns[-4]
M_2 = citi.columns[-2]
# Extracting just the relevant columns
cols = ['agency-term','coupon','year','Cbal',M,M_1,M_2]
citi = citi[cols]
# Reshape wide to long: one row per month column
citi_new = citi.set_index(cols[0:4]).stack().reset_index()
citi_new.rename(columns={"level_4": "Month", 0 : "CPR", "year" : "vintage"}, inplace = True)
# Build the predicted date from the current year and the month name
todays_date = date.today()
current_year = todays_date.year
citi_new['year'] = current_year
citi_new['month'] = pd.to_datetime(citi_new.Month, format="%B").dt.month
citi_new['Month_Predicted_DT'] = pd.to_datetime(citi_new[['year', 'month']].assign(DAY=1))
For reference, M is the current month, and M_1 and M_2 are the next two months (M+1 and M+2).
My main question is that my solution for creating the Month_Predicted_DT column only works if the months in question do not cross into the new year. If M == November or M == December, then the year in Month_Predicted_DT is not correct for January and/or February. For example, Month_Predicted_DT for the January rows should be 2023-01-01, not 2022-01-01. The same would be true if M was December: I would want the rows for Jan. and Feb. to be 2023-01-01 and 2023-02-01, respectively.
I have tried to come up with a workaround using df.iterrows or np.where, but I can't get a working solution.
CodePudding user response:
You could try adding 12 months to any date that falls before the start of the current month:
# get the first day of the current month
start = pd.Timestamp.today().normalize().replace(day=1)
# convert the Month column to timestamps in the current year
dates = pd.to_datetime(df["Month"] + f"{start.year}", format="%B%Y")
# offset the year if the date falls before the current month
df["Month_Predicted_DT"] = dates.where(dates >= start, dates + pd.DateOffset(months=12))
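If you want to check the year rollover without waiting for November, here is a minimal self-contained sketch of the same idea with a fixed reference date. The helper name predicted_dates, its start argument, and the small sample frame are made up for illustration; only the Month column mirrors the question.

```python
import pandas as pd

def predicted_dates(months: pd.Series, start: pd.Timestamp) -> pd.Series:
    """Map month names to the first such month on or after the month of `start`."""
    # Snap the reference date to the first day of its month.
    start = start.normalize().replace(day=1)
    # Parse each month name as if it fell in the reference year.
    dates = pd.to_datetime(months + str(start.year), format="%B%Y")
    # Anything that lands before `start` belongs to the following year.
    return dates.where(dates >= start, dates + pd.DateOffset(months=12))

# Hypothetical sample data: two vintages, three forecast months each.
df = pd.DataFrame({"Month": ["November", "December", "January"] * 2})
# Fixed reference date (an assumption for the demo) instead of today's date.
df["Month_Predicted_DT"] = predicted_dates(df["Month"], pd.Timestamp("2022-11-15"))
print(df)
```

Passing the reference date in, rather than calling pd.Timestamp.today() inside, just makes the result reproducible; in real use you would pass pd.Timestamp.today().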