I try to replace NaN values in a pandas DataFrame with a forward fill method combined with a discount rate or decreasing rate of 0.9.
I have the following data set:
Column1 Column2 Column3 Column4
0 1.0 5 -9.0 13.0
1 NaN 6 -10.0 15.0
2 3.0 7 NaN NaN
3 NaN 8 NaN NaN
For reproducibility:
df1 = pd.DataFrame({
'Column1':[1, 'NaN', 3, 'NaN'],
'Column2':[5, 6, 7, 8],
'Column3':[-9, -10, 'NaN', 'NaN'],
'Column4':[13, 15, 'NaN', 'NaN']
})
df1 = df1.replace('NaN',np.nan)
I was able to replace the NaN values with the fillna
command and the forward fill ffill
method.
df2 = df1.fillna(method='ffill')
Column1 Column2 Column3 Column4
0 1.0 5 -9.0 13.0
1 1.0 6 -10.0 15.0
2 3.0 7 -10.0 15.0
3 3.0 8 -10.0 15.0
Additionally, I am trying to apply the ratio 0.9 to all forward filled NaN values, which would yield the following data set:
NaN value row 2, column 3: -10 * 0.9 = -9
NaN value row 3, column 3: -9 * 0.9 = -8.1
Column1 Column2 Column3 Column4
0 1.0 5 -9.0 13.00
1 0.9 6 -10.0 15.00
2 3.0 7 -9.0 13.50
3 2.7 8 -8.1 12.15
Is there an easy way to deal with that?
Thanks a lot!
CodePudding user response:
Create an exponent mask by counting consecutive NaN sequences using this groupby/cumsum idea:
groups = df1.notna().cumsum()
exp = df1.apply(lambda col: col.isna().groupby(groups[col.name]).cumsum())
# Column1 Column2 Column3 Column4
# 0 0 0 0 0
# 1 1 0 0 0
# 2 0 0 1 1
# 3 1 0 2 2
Then ffill
and multiply
by 0.9 ** exp
:
df2 = df1.ffill().mul(0.9 ** exp)
# Column1 Column2 Column3 Column4
# 0 1.0 5.0 -9.0 13.00
# 1 0.9 6.0 -10.0 15.00
# 2 3.0 7.0 -9.0 13.50
# 3 2.7 8.0 -8.1 12.15