How to shift a pandas column based on date column from previous month-CodePudding

I have a df called df_actual that looks like this:

   BondSecurity TradeCoupon IssuanceYear    AsOfDate      Cpr1
0      FNCL        4.0          2017      2022-06-30    17.888957
1      FNCL        4.0          2018      2022-04-30    26.383609
2      FNCL        4.0          2018      2022-05-31    20.834899
3      FNCL        4.0          2018      2022-06-30    17.707935
4      FNCL        4.0          2019      2022-04-30    28.453986
... ... ... ... ... ...
16230   G2SF       4.0          2017       2020-05-31   30.848600
16231   G2SF       4.0          2017       2020-06-30   38.159500
16232   G2SF       4.0          2018       2020-04-30   39.266900
16233   G2SF       4.0          2018       2020-05-31   40.343300
16234   G2SF       4.0          2018       2020-06-30   48.352200

I want to shift only the Cpr1 values back by one month, keeping the other rows where they are. For example, for row 1 theCpr1 value for 2022-04-30 would be 20.834899 since that is the value for 2022-05-31.

I need to apply this shift on individual cohorts within the dataframe on BondSecurity, TradeCoupon, and IssuanceYear.

CodePudding user response：

Clustering method:

df_actual['cohort'] = df_actual.BondSecurity   ' '   df_actual.IssuanceYear.astype(str)   ' '   df_actual.TradeCoupon.astype(str)
cohort_list = [df_actual[df_actual.cohort == x] for x in df_actual.cohort.unique()]

Then run each clustered dataframe through the following for loop to generate cluster-level lagged Cpr1 values

dfs_cpr_lagged = []
for df in cohort_list:
    df1 = df.copy()
    df1.index = df1.AsOfDate
    df1.index = df1.index.to_period('M')
    col_b_new = df1.groupby(level=0)['Cpr1'].first().shift(-1)
    df1['Cpr_Lagged'] = col_b_new
    dfs_cpr_lagged.append(dfs_cpr_lagged.append(df1.reset_index(drop = True)))

This is a list of clustered dataframes with a column for lagged CPR. Bravo.