I have the following dataset. You can think of different IDs as individuals opening a bank account at different time in life, which is why there are many nans. The issue is that I do not observe values when the balance of an account is below a certain value, which is why you see discontinuity in account balance (i.e. B opened an account in 2020 Sept, but no available value for 2020 Dec.
ID 2021_12 2021_09 2021_06 2021_03 2020_12 2020_09
A 0.020637713 nan nan nan nan nan
B 0.020637717 0.020637717 0.020637717 0.020637717 nan 0.007053725
C 0.020637723 0.020637723 0.020637723 0.020637723 0.020637723 0.020637723
D 0.020637729 nan nan nan nan nan
E 0.020637733 0.02504525 0.02504525 nan nan nan
F 0.020637735 0.020637735 0.020637735 nan nan nan
G 0.020637735 nan 0.017617495 nan nan nan
I will have to eventually convert this into long format, and to preserve continuity i want to replace nan that appear after account opening with the previous available value.
Desired format
ID 2021_12 2021_09 2021_06 2021_03 2020_12 2020_09
A 0.020637713 nan nan nan nan nan
B 0.020637717 0.020637717 0.020637717 0.020637717 **0.007053725** 0.007053725
C 0.020637723 0.020637723 0.020637723 0.020637723 0.020637723 0.020637723
D 0.020637729 nan nan nan nan nan
E 0.020637733 0.02504525 0.02504525 nan nan nan
F 0.020637735 0.020637735 0.020637735 nan nan nan
G 0.020637735 **0.017617495** 0.017617495 nan nan nan
Any help will be appreciated!
CodePudding user response:
Maybe you are looking for bfill
on column index:
out = df.bfill(axis=1)
print(out)
# Output
ID 2021_12 2021_09 2021_06 2021_03 2020_12 2020_09
0 A 0.020638 NaN NaN NaN NaN NaN
1 B 0.020638 0.020638 0.020638 0.020638 0.007054 0.007054
2 C 0.020638 0.020638 0.020638 0.020638 0.020638 0.020638
3 D 0.020638 NaN NaN NaN NaN NaN
4 E 0.020638 0.025045 0.025045 NaN NaN NaN
5 F 0.020638 0.020638 0.020638 NaN NaN NaN
6 G 0.020638 0.017617 0.017617 NaN NaN NaN