I want to replace np.nan
values with other value in pandas.DataFrame
using 'apply' function. And I will use replace
method that where NaN is replaced with max value of each column (axis=0). You better understand below.
import pandas as pd
df = pd.DataFrame({'a':[1, np.nan, 3],
'b':[np.nan,5,6],
'c':[7,8,np.nan]})
result = df.apply(lambda c: c.replace(np.nan, max(c)), axis=0)
print(result)
There are three np.nan
values. Two of them is replaced with appropriate values, but just one value is still np.nan
(below picture)
After setting argument axis
to 1
, there is still one value that isn't replaced. What's the reason?
CodePudding user response:
Python's max
doesn't work if a list starts with NaN; so max(df['b'])
returns NaN
and it cannot fill the NaN value in that column. Use c.max()
instead (which works because by default Series.max
skips NaNs). So:
df = df.apply(lambda c: c.replace(np.nan, c.max()), axis=0)
But instead of replace
, you could use fillna
on axis:
df = df.fillna(df.max(), axis=0)
Output:
a b c
0 1.0 6.0 7.0
1 3.0 5.0 8.0
2 3.0 6.0 8.0