Updating Year in One Column Based on Month of Another Column that accounts for New Year

I have created this DataFrame:

  agency  coupon  vintage       Cbal     Month       CPR  year  month Month_Predicted_DT
0   FHLG     1.5     2021  70.090310  November  5.418937  2022     11         2022-11-01
1   FHLG     1.5     2021  70.090310  December  5.549916  2022     12         2022-12-01
2   FHLG     1.5     2021  70.090310   January  5.238943  2022      1         2022-01-01
3   FHLG     1.5     2020  52.414637  November  5.514456  2022     11         2022-11-01
4   FHLG     1.5     2020  52.414637  December  5.550490  2022     12         2022-12-01
5   FHLG     1.5     2020  52.414637   January  5.182304  2022      1         2022-01-01

Created from this original df:

  agency  coupon  year        Cbal  November  December   January
0   FHLG     1.5  2021   70.090310  5.418937  5.549916  5.238943
1   FHLG     1.5  2020   52.414637  5.514456  5.550490  5.182304
2   FHLG     2.0  2022   44.598755  3.346706  3.715995  3.902644
3   FHLG     2.0  2021  472.209165  5.802857  5.899596  5.627774
4   FHLG     2.0  2020  269.761452  7.090993  7.091404  6.567561

Using this code:

import pandas as pd
from datetime import date

citi = pd.read_excel("Downloads/CITI_2022_05_22(5_22).xlsx")

#Extracting just the relevant months (M, M+1, M+2)
M = citi.columns[-6]
M_1 = citi.columns[-4]
M_2 = citi.columns[-2]

#Extracting just the relevant columns
cols = ['agency-term','coupon','year','Cbal',M,M_1,M_2]
citi = citi[cols]

#Reshaping from wide to long: one row per original row and month column
citi_new = citi.set_index(cols[0:4]).stack().reset_index()
citi_new.rename(columns={"level_4": "Month", 0 : "CPR", "year" : "vintage"}, inplace = True)

#Building the predicted date from the current year and the month name
todays_date = date.today()
current_year = todays_date.year
citi_new['year'] = current_year
citi_new['month'] = pd.to_datetime(citi_new.Month, format="%B").dt.month
citi_new['Month_Predicted_DT'] = pd.to_datetime(citi_new[['year', 'month']].assign(DAY=1))

For reference, M is the current month, and M_1 and M_2 are the following two months (M+1 and M+2).

My main question is that my solution for creating the 'Month_Predicted_DT' column only works if the months in question do not cross into the new year. If M == November or M == December, the year in Month_Predicted_DT is wrong for January and/or February. For example, with M == November, Month_Predicted_DT for the January rows should be 2023-01-01, not 2022-01-01. Likewise, if M were December, I would want the January and February rows to be 2023-01-01 and 2023-02-01, respectively.

I have tried to come up with a workaround using df.iterrows or np.where but just can't really get a working solution.

CodePudding user response：

You could try adding 12 months to any date that falls before the start of the current month:

import pandas as pd

#get first day of the current month
start = pd.Timestamp.today().normalize().replace(day=1)

#convert month column to timestamps in the current year
dates = pd.to_datetime(df["Month"] + f"{start.year}", format="%B%Y")

#push dates that fall before the current month into the next year
df["Month_Predicted_DT"] = dates.where(dates >= start, dates + pd.DateOffset(months=12))
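
To check the rollover without waiting for November, here is a minimal sketch of the same idea with the reference date passed in explicitly. The function name add_predicted_dt, its today parameter, and the three-row demo frame are made up for illustration; they are not part of the original code.

```python
import pandas as pd


def add_predicted_dt(df, today=None):
    """Return a copy of df with a Month_Predicted_DT column.

    Hypothetical helper: `today` lets you pin the reference date for testing;
    it defaults to the real current date.
    """
    ref = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    # first day of the reference month
    start = ref.normalize().replace(day=1)

    # parse month names as dates in the reference year, e.g. "January2022"
    dates = pd.to_datetime(df["Month"] + str(start.year), format="%B%Y")

    out = df.copy()
    # months earlier than the reference month belong to the next year
    out["Month_Predicted_DT"] = dates.where(
        dates >= start, dates + pd.DateOffset(years=1)
    )
    return out


# Assumed sample data: M = November, so January should land in 2023
demo = pd.DataFrame({"Month": ["November", "December", "January"]})
result = add_predicted_dt(demo, today="2022-11-15")
print(result)
```

With the reference date pinned to November 2022, the November and December rows stay in 2022 and the January row moves to 2023-01-01. This relies on the assumption that the predicted months never reach a full year ahead, which holds for M, M+1 and M+2.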