I have this dataframe jpm_2021
:
SRC SRCDate Ticker Coupon Vintage Bal WAC WAM WALA LNSZ LTV FICO Refi% Month_Assessed CPR Month_key
894 JPM 02/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 7.536801 M 2
1528 JPM 03/05/2021 FNCI 1.5 2020 28.7 2.25 175 4 293 / 286 60 777 91 Apr 5.131145 M 1
2162 JPM 04/07/2021 FNCI 1.5 2020 28.0 2.25 173 6 292 / 281 60 777 91 Apr 7.233214 M
2796 JPM 05/07/2021 FNCI 1.5 2020 27.6 2.25 171 7 292 / 279 60 777 91 Apr 8.900000 M-1
3430 JPM 06/07/2021 FNCI 1.5 2020 27.2 2.25 170 8 292 / 277 60 777 91 Apr 8.900000 M-2
And use pandas pivot()
function as below:
jpm_final = jpm_2021.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns="Month_key", values="CPR").rename_axis(columns=None).reset_index()
To convert it to:
SRC Ticker Coupon Vintage Month_Assessed M M 1 M 2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 7.23 5.13 7.53 8.9 8.9
1 JPM FNCI 1.5 2020 Aug 15.16 14.92 11.97 24.9 24.9
2 JPM FNCI 1.5 2020 Dec 11.58 14.51 19.00 5.0 5.0
3 JPM FNCI 1.5 2020 Feb 6.70 4.18 9.84 6.6 8.8
4 JPM FNCI 1.5 2020 Jan 4.29 10.19 12.88 6.6 5.0
The problem is I want to include all the middle columns (Bal ($bn)
through Refi%
) but when I add those to the index=
call in jpm_2021.pivot()
, it adds many rows and the M-->M-2 columns start to have many NaN
values. This is because the value of those columns changes slightly throughout the dataframe. Any ideas how I can add those columns to the pivoted dataframe without generating all the NaNs? If I do include for example, Bal ($bn)
, it looks like this:
SRC Ticker Coupon Vintage Month_Assessed Bal ($bn) M M 1 M 2 M-1 M-2
0 JPM FNCI 1.5 2020 Apr 27.2 NaN NaN NaN NaN 8.9
1 JPM FNCI 1.5 2020 Apr 27.6 NaN NaN NaN 8.9 NaN
2 JPM FNCI 1.5 2020 Apr 28 7.23 NaN NaN NaN NaN
3 JPM FNCI 1.5 2020 Apr 28.7 NaN 5.13 7.53 NaN NaN
4 JPM FNCI 1.5 2020 Aug 24.9 NaN NaN NaN NaN 24.9
... ... ... ... ... ... ... ... ... ... ... ...
7069 JPM G2SF 5.5 2008 May 1.2 24 21 21 24.3 NaN
7070 JPM G2SF 5.5 2008 Nov 1.1 23 21 20 23.2 NaN
7071 JPM G2SF 5.5 2008 Nov 1.3 NaN NaN NaN NaN 21.9
7072 JPM G2SF 5.5 2008 Oct 1.1 21 20 23 24 25 NaN
7073 JPM G2SF 5.5 2008 Sep 1.1 21 24 25 22 22 23
CodePudding user response:
What you're asking for doesn't make much sense. If you pivot a dataframe, and make its index columns that won't have an accompanying value for every pivoted value... then having NaN values is to be expected.
You could make it very wide like:
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key')
...
SRCDate Bal WAC WAM WALA ... LNSZ LTV FICO Refi% CPR
Month_key M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 ... M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2 M M 1 M 2 M-1 M-2
SRC Ticker Coupon Vintage Month_Assessed ...
JPM FNCI 1.5 2020 Apr 04/07/2021 03/05/2021 02/05/2021 05/07/2021 06/07/2021 28.0 28.7 28.7 27.6 27.2 2.25 2.25 2.25 2.25 2.25 173 175 175 171 170 6 4 4 7 ... 293 / 286 293 / 286 292 / 279 292 / 277 60 60 60 60 60 777 777 777 777 777 91 91 91 91 91 7.233214 5.131145 7.536801 8.9 8.9
Or maybe unstack that to have an interesting index:
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'], columns='Month_key').stack(0)
...
Month_key M M 1 M 2 M-1 M-2
SRC Ticker Coupon Vintage Month_Assessed
JPM FNCI 1.5 2020 Apr Bal 28.0 28.7 28.7 27.6 27.2
CPR 7.233214 5.131145 7.536801 8.9 8.9
FICO 777 777 777 777 777
LNSZ 292 / 281 293 / 286 293 / 286 292 / 279 292 / 277
LTV 60 60 60 60 60
Refi% 91 91 91 91 91
SRCDate 04/07/2021 03/05/2021 02/05/2021 05/07/2021 06/07/2021
WAC 2.25 2.25 2.25 2.25 2.25
WALA 6 4 4 7 8
WAM 173 175 175 171 170
But unless you update those columns that are slightly different to be the same, it's not going to magically be able to do that for you and create something out of nothing.
One way to approach that might be replacing with the average:
df['Bal'] = df.groupby(['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed'])['Bal'].transform('mean')
df.pivot(index=['SRC', 'Ticker', 'Coupon', 'Vintage', 'Month_Assessed', 'Bal'], columns='Month_key', values='CPR')
Output:
Month_key M M 1 M 2 M-1 M-2
SRC Ticker Coupon Vintage Month_Assessed Bal
JPM FNCI 1.5 2020 Apr 28.04 7.233214 5.131145 7.536801 8.9 8.9