Updating Year in One Column Based on Month of Another Column that accounts for New Year

Time:06-10

I have created this DataFrame:

   agency  coupon  vintage       Cbal     Month       CPR  year  month  Month_Predicted_DT
0    FHLG     1.5     2021  70.090310  November  5.418937  2022     11          2022-11-01
1    FHLG     1.5     2021  70.090310  December  5.549916  2022     12          2022-12-01
2    FHLG     1.5     2021  70.090310   January  5.238943  2022      1          2022-01-01
3    FHLG     1.5     2020  52.414637  November  5.514456  2022     11          2022-11-01
4    FHLG     1.5     2020  52.414637  December  5.550490  2022     12          2022-12-01
5    FHLG     1.5     2020  52.414637   January  5.182304  2022      1          2022-01-01

Created from this original df:

   agency  coupon  year        Cbal  November  December   January
0    FHLG     1.5  2021   70.090310  5.418937  5.549916  5.238943
1    FHLG     1.5  2020   52.414637  5.514456  5.550490  5.182304
2    FHLG     2.0  2022   44.598755  3.346706  3.715995  3.902644
3    FHLG     2.0  2021  472.209165  5.802857  5.899596  5.627774
4    FHLG     2.0  2020  269.761452  7.090993  7.091404  6.567561

Using this code:

import pandas as pd
from datetime import date

citi = pd.read_excel("Downloads/CITI_2022_05_22(5_22).xlsx")

# Extracting just the relevant months (M, M+1, M+2)
M = citi.columns[-6]
M_1 = citi.columns[-4]
M_2 = citi.columns[-2]

# Extracting just the relevant columns
cols = ['agency-term', 'coupon', 'year', 'Cbal', M, M_1, M_2]
citi = citi[cols]

# Reshape from wide to long: one row per month column
citi_new = citi.set_index(cols[0:4]).stack().reset_index()
citi_new.rename(columns={"level_4": "Month", 0: "CPR", "year": "vintage"}, inplace=True)

# Stamp every row with the current year, then build a first-of-month date
todays_date = date.today()
current_year = todays_date.year
citi_new['year'] = current_year
citi_new['month'] = pd.to_datetime(citi_new.Month, format="%B").dt.month
citi_new['Month_Predicted_DT'] = pd.to_datetime(citi_new[['year', 'month']].assign(DAY=1))

For reference, M is the current month, and M_1 and M_2 are the next two months (M+1 and M+2).

My main question is that my solution for creating the Month_Predicted_DT column only works if the months in question do not cross into the new year. If M is November or December, the year in Month_Predicted_DT is wrong for January and/or February. For example, with M == November, Month_Predicted_DT for the January rows should be 2023-01-01, not 2022-01-01. Likewise, if M were December, I would want the January and February rows to be 2023-01-01 and 2023-02-01, respectively.

I have tried to come up with a workaround using df.iterrows or np.where, but I can't get a working solution.

CodePudding user response:

You could try adding 12 months to any date that falls before the start of the current month:

import pandas as pd

# get the first day of the current month
start = pd.Timestamp.today().normalize().replace(day=1)

# convert the month column to timestamps in the current year
dates = pd.to_datetime(df["Month"] + f"{start.year}", format="%B%Y")

# push dates that fall before the current month into the next year
df["Month_Predicted_DT"] = dates.where(dates >= start, dates + pd.DateOffset(months=12))
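
Because the snippet above depends on today's date, it is hard to check by hand. Here is a minimal sketch of the same idea wrapped in a function that takes the reference date as an argument, so the year rollover can be reproduced. The function name predicted_dates and the fixed reference date 2022-11-15 are assumptions made for illustration; they are not part of the original question or answer.

```python
import pandas as pd


def predicted_dates(months: pd.Series, start: pd.Timestamp) -> pd.Series:
    """Map month names to first-of-month dates on or after the month of `start`.

    `predicted_dates` is a hypothetical helper name used only for this sketch.
    """
    # Snap the reference date to the first day of its month
    start = start.normalize().replace(day=1)

    # Parse each month name as if it belonged to the reference year
    dates = pd.to_datetime(months + f" {start.year}", format="%B %Y")

    # Months that land before the reference month belong to the next year
    return dates.where(dates >= start, dates + pd.DateOffset(months=12))


# Assumed reference date in November, mirroring the example in the question
df = pd.DataFrame({"Month": ["November", "December", "January"] * 2})
df["Month_Predicted_DT"] = predicted_dates(df["Month"], pd.Timestamp("2022-11-15"))
print(df)
```

With that reference date, the November and December rows stay in 2022 and the January rows become 2023-01-01. In real use you would pass pd.Timestamp.today() instead of a fixed date.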