Home > Enterprise >  How to forward-fill NaN values at a decaying rate in a pandas DataFrame
How to forward-fill NaN values at a decaying rate in a pandas DataFrame

Time:12-01

I try to replace NaN values in a pandas DataFrame with a forward fill method combined with a discount rate or decreasing rate of 0.9.

I have the following data set:

    Column1 Column2 Column3 Column4
0   1.0     5       -9.0    13.0
1   NaN     6       -10.0   15.0
2   3.0     7       NaN     NaN
3   NaN     8       NaN     NaN

For reproducibility:

df1 = pd.DataFrame({
                    'Column1':[1, 'NaN', 3, 'NaN'], 
                    'Column2':[5, 6, 7, 8], 
                    'Column3':[-9, -10, 'NaN', 'NaN'], 
                    'Column4':[13, 15, 'NaN', 'NaN']
                    })
df1 = df1.replace('NaN',np.nan)

I was able to replace the NaN values with the fillna command and the forward fill ffill method.

df2 = df1.fillna(method='ffill')

    Column1 Column2 Column3 Column4
0   1.0 5   -9.0    13.0
1   1.0 6   -10.0   15.0
2   3.0 7   -10.0   15.0
3   3.0 8   -10.0   15.0

Additionally, I am trying to apply the ratio 0.9 to all forward filled NaN values, which would yield the following data set:

NaN value row 2, column 3: -10 * 0.9 = -9

NaN value row 3, column 3: -9 * 0.9 = -8.1

    Column1 Column2 Column3 Column4
0   1.0     5       -9.0    13.00
1   0.9     6       -10.0   15.00
2   3.0     7       -9.0    13.50
3   2.7     8       -8.1    12.15

Is there an easy way to deal with that?

Thanks a lot!

CodePudding user response:

Create an exponent mask by counting consecutive NaN sequences using this groupby/cumsum idea:

groups = df1.notna().cumsum()
exp = df1.apply(lambda col: col.isna().groupby(groups[col.name]).cumsum())

#    Column1  Column2  Column3  Column4
# 0        0        0        0        0
# 1        1        0        0        0
# 2        0        0        1        1
# 3        1        0        2        2

Then ffill and multiply by 0.9 ** exp:

df2 = df1.ffill().mul(0.9 ** exp)

#    Column1  Column2  Column3  Column4
# 0      1.0      5.0     -9.0    13.00
# 1      0.9      6.0    -10.0    15.00
# 2      3.0      7.0     -9.0    13.50
# 3      2.7      8.0     -8.1    12.15
  • Related