I have a df called df_actual
that looks like this:
BondSecurity TradeCoupon IssuanceYear AsOfDate Cpr1
0 FNCL 4.0 2017 2022-06-30 17.888957
1 FNCL 4.0 2018 2022-04-30 26.383609
2 FNCL 4.0 2018 2022-05-31 20.834899
3 FNCL 4.0 2018 2022-06-30 17.707935
4 FNCL 4.0 2019 2022-04-30 28.453986
... ... ... ... ... ...
16230 G2SF 4.0 2017 2020-05-31 30.848600
16231 G2SF 4.0 2017 2020-06-30 38.159500
16232 G2SF 4.0 2018 2020-04-30 39.266900
16233 G2SF 4.0 2018 2020-05-31 40.343300
16234 G2SF 4.0 2018 2020-06-30 48.352200
I want to shift only the Cpr1
values back by one month, keeping the other rows where they are. For example, for row 1 theCpr1
value for 2022-04-30 would be 20.834899
since that is the value for 2022-05-31.
I need to apply this shift on individual cohorts within the dataframe on BondSecurity, TradeCoupon, and IssuanceYear.
CodePudding user response:
Clustering method:
df_actual['cohort'] = df_actual.BondSecurity ' ' df_actual.IssuanceYear.astype(str) ' ' df_actual.TradeCoupon.astype(str)
cohort_list = [df_actual[df_actual.cohort == x] for x in df_actual.cohort.unique()]
Then run each clustered dataframe through the following for loop to generate cluster-level lagged Cpr1
values
dfs_cpr_lagged = []
for df in cohort_list:
df1 = df.copy()
df1.index = df1.AsOfDate
df1.index = df1.index.to_period('M')
col_b_new = df1.groupby(level=0)['Cpr1'].first().shift(-1)
df1['Cpr_Lagged'] = col_b_new
dfs_cpr_lagged.append(dfs_cpr_lagged.append(df1.reset_index(drop = True)))
This is a list of clustered dataframes with a column for lagged CPR. Bravo.