Home > Blockchain >  How to fill nan with previously available value in wide format in pandas dataframe
How to fill nan with previously available value in wide format in pandas dataframe

Time:04-24

I have the following dataset. You can think of different IDs as individuals opening a bank account at different time in life, which is why there are many nans. The issue is that I do not observe values when the balance of an account is below a certain value, which is why you see discontinuity in account balance (i.e. B opened an account in 2020 Sept, but no available value for 2020 Dec.


ID 2021_12        2021_09     2021_06    2021_03      2020_12     2020_09
A 0.020637713       nan         nan         nan         nan         nan
B 0.020637717   0.020637717 0.020637717 0.020637717     nan     0.007053725
C 0.020637723   0.020637723 0.020637723 0.020637723 0.020637723 0.020637723
D 0.020637729       nan         nan         nan         nan         nan
E 0.020637733   0.02504525  0.02504525      nan         nan         nan     
F 0.020637735   0.020637735 0.020637735     nan         nan         nan 
G 0.020637735       nan     0.017617495     nan         nan         nan

I will have to eventually convert this into long format, and to preserve continuity i want to replace nan that appear after account opening with the previous available value.

Desired format


ID 2021_12        2021_09     2021_06    2021_03      2020_12     2020_09
A 0.020637713       nan         nan         nan         nan         nan
B 0.020637717   0.020637717 0.020637717 0.020637717 **0.007053725** 0.007053725
C 0.020637723   0.020637723 0.020637723 0.020637723 0.020637723 0.020637723
D 0.020637729       nan         nan         nan         nan         nan
E 0.020637733   0.02504525  0.02504525      nan         nan         nan     
F 0.020637735   0.020637735 0.020637735     nan         nan         nan 
G 0.020637735   **0.017617495** 0.017617495     nan         nan         nan

Any help will be appreciated!

CodePudding user response:

Maybe you are looking for bfill on column index:

out =  df.bfill(axis=1)
print(out)

# Output
  ID   2021_12   2021_09   2021_06   2021_03   2020_12   2020_09
0  A  0.020638       NaN       NaN       NaN       NaN       NaN
1  B  0.020638  0.020638  0.020638  0.020638  0.007054  0.007054
2  C  0.020638  0.020638  0.020638  0.020638  0.020638  0.020638
3  D  0.020638       NaN       NaN       NaN       NaN       NaN
4  E  0.020638  0.025045  0.025045       NaN       NaN       NaN
5  F  0.020638  0.020638  0.020638       NaN       NaN       NaN
6  G  0.020638  0.017617  0.017617       NaN       NaN       NaN
  • Related