Home > Software engineering >  Pandas pivot and include columns with variable values
Pandas pivot and include columns with variable values

Time:06-29

I have this dataframe jpm_2021:

      SRC     SRCDate Ticker  Coupon  Vintage   Bal   WAC  WAM  WALA       LNSZ  LTV  FICO  Refi% Month_Assessed       CPR Month_key
894   JPM  02/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  7.536801       M 2
1528  JPM  03/05/2021   FNCI     1.5     2020  28.7  2.25  175     4  293 / 286   60   777     91            Apr  5.131145       M 1
2162  JPM  04/07/2021   FNCI     1.5     2020  28.0  2.25  173     6  292 / 281   60   777     91            Apr  7.233214         M
2796  JPM  05/07/2021   FNCI     1.5     2020  27.6  2.25  171     7  292 / 279   60   777     91            Apr  8.900000       M-1
3430  JPM  06/07/2021   FNCI     1.5     2020  27.2  2.25  170     8  292 / 277   60   777     91            Apr  8.900000       M-2

And use pandas pivot() function as below:

jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()

To convert it to:

    SRC Ticker  Coupon  Vintage Month_Assessed  M    M 1    M 2    M-1  M-2
0   JPM FNCI    1.5     2020    Apr            7.23  5.13   7.53   8.9  8.9
1   JPM FNCI    1.5     2020    Aug            15.16 14.92  11.97  24.9 24.9
2   JPM FNCI    1.5     2020    Dec            11.58 14.51  19.00  5.0  5.0
3   JPM FNCI    1.5     2020    Feb            6.70  4.18   9.84   6.6  8.8
4   JPM FNCI    1.5     2020    Jan            4.29  10.19  12.88  6.6  5.0

The problem is I want to include all the middle columns (Bal ($bn) through Refi%) but when I add those to the index= call in jpm_2021.pivot(), it adds many rows and the M-->M-2 columns start to have many NaN values. This is because the value of those columns changes slightly throughout the dataframe. Any ideas how I can add those columns to the pivoted dataframe without generating all the NaNs? If I do include for example, Bal ($bn), it looks like this:

SRC Ticker  Coupon  Vintage Month_Assessed  Bal ($bn)   M   M 1 M 2 M-1 M-2
0   JPM     FNCI    1.5 2020    Apr            27.2   NaN   NaN NaN NaN 8.9
1   JPM     FNCI    1.5 2020    Apr            27.6   NaN   NaN NaN 8.9 NaN
2   JPM     FNCI    1.5 2020    Apr            28     7.23  NaN NaN NaN NaN
3   JPM     FNCI    1.5 2020    Apr            28.7   NaN  5.13 7.53 NaN NaN
4   JPM     FNCI    1.5 2020    Aug            24.9       NaN   NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM    G2SF    5.5 2008    May        1.2        24    21 21 24.3 NaN
7070 JPM    G2SF    5.5 2008    Nov        1.1        23    21 20 23.2  NaN
7071 JPM    G2SF    5.5 2008    Nov        1.3        NaN  NaN  NaN NaN 21.9
7072 JPM    G2SF    5.5 2008    Oct        1.1        21   20 23 24 25  NaN
7073 JPM    G2SF    5.5 2008    Sep        1.1        21   24 25 22 22 23

CodePudding user response:

What you're asking for doesn't make much sense. If you pivot a dataframe, and make its index columns that won't have an accompanying value for every pivoted value... then having NaN values is to be expected.

You could make it very wide like:

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key')

...

                                             SRCDate                                                   Bal                           WAC                          WAM                     WALA              ...       LNSZ                                  LTV                 FICO                     Refi%                       CPR
Month_key                                          M         M 1         M 2         M-1         M-2     M   M 1   M 2   M-1   M-2     M   M 1   M 2   M-1   M-2    M  M 1  M 2  M-1  M-2    M M 1 M 2 M-1  ...        M 1        M 2        M-1        M-2   M M 1 M 2 M-1 M-2    M  M 1  M 2  M-1  M-2     M M 1 M 2 M-1 M-2         M       M 1       M 2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed                                                                                                                                                                    ...
JPM FNCI   1.5    2020    Apr             04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021  28.0  28.7  28.7  27.6  27.2  2.25  2.25  2.25  2.25  2.25  173  175  175  171  170    6   4   4   7  ...  293 / 286  293 / 286  292 / 279  292 / 277  60  60  60  60  60  777  777  777  777  777    91  91  91  91  91  7.233214  5.131145  7.536801  8.9  8.9

Or maybe unstack that to have an interesting index:

df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key').stack(0)

...

Month_key                                                  M         M 1         M 2         M-1         M-2
SRC Ticker Coupon Vintage Month_Assessed
JPM FNCI   1.5    2020    Apr            Bal            28.0        28.7        28.7        27.6        27.2
                                         CPR        7.233214    5.131145    7.536801         8.9         8.9
                                         FICO            777         777         777         777         777
                                         LNSZ      292 / 281   293 / 286   293 / 286   292 / 279   292 / 277
                                         LTV              60          60          60          60          60
                                         Refi%            91          91          91          91          91
                                         SRCDate  04/07/2021  03/05/2021  02/05/2021  05/07/2021  06/07/2021
                                         WAC            2.25        2.25        2.25        2.25        2.25
                                         WALA              6           4           4           7           8
                                         WAM             173         175         175         171         170

But unless you update those columns that are slightly different to be the same, it's not going to magically be able to do that for you and create something out of nothing.


One way to approach that might be replacing with the average:

df['Bal'] = df.groupby(['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'])['Bal'].transform('mean')
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed', 'Bal'], columns='Month_key', values='CPR')

Output:

Month_key                                              M       M 1       M 2  M-1  M-2
SRC Ticker Coupon Vintage Month_Assessed Bal
JPM FNCI   1.5    2020    Apr            28.04  7.233214  5.131145  7.536801  8.9  8.9
  • Related