Fillna if all the values of a column are null in pandas

Time:11-17

I have to fill a column only if all the values of that column are null. For example, consider the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data = {"col1":[3, np.nan, np.nan, 21, np.nan],
                          "col2":[4, np.nan, 12, np.nan, np.nan],
                          "col3":[33, np.nan, 55, np.nan, np.nan],
                          "col4":[np.nan, np.nan, np.nan, np.nan, np.nan]})


>>> df
   col1  col2  col3  col4
0   3.0   4.0  33.0   NaN
1   NaN   NaN   NaN   NaN
2   NaN  12.0  55.0   NaN
3  21.0   NaN   NaN   NaN
4   NaN   NaN   NaN   NaN

In the above example, I have to replace the values of col4 with 100 since all the values are null/NaN.

So, for the above example, I need the following output:

   col1  col2  col3  col4
0   3.0   4.0  33.0   100
1   NaN   NaN   NaN   100
2   NaN  12.0  55.0   100
3  21.0   NaN   NaN   100
4   NaN   NaN   NaN   100

I tried the command below, but it only replaces the values of a column if that column contains at least one non-NaN value:

df.where(df.isnull().all(axis=1), df.fillna(100), inplace=True)
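Part of the problem with this attempt is the axis: all(axis=1) checks each row, not each column. A small sketch on the sample frame comparing the two reductions (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, np.nan, np.nan, 21, np.nan],
                   "col2": [4, np.nan, 12, np.nan, np.nan],
                   "col3": [33, np.nan, 55, np.nan, np.nan],
                   "col4": [np.nan] * 5})

# axis=0 (the default) reduces down each column: one boolean per column.
per_column = df.isnull().all(axis=0)
# axis=1 reduces across each row: one boolean per row.
per_row = df.isnull().all(axis=1)

print(per_column.tolist())  # [False, False, False, True]
print(per_row.tolist())     # [False, True, False, False, True]
```

Only the per-column result identifies col4 as the column to fill.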

Could you please let me know how to do this?

Thanks

CodePudding user response:

Use boolean column indexing with loc:

df.loc[:, df.isna().all()] = 100
print(df)

# Output:
   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
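If the original frame should stay untouched, the same loc assignment can be wrapped in a small helper that works on a copy. This is a sketch; fill_all_nan_columns is a hypothetical name, not a pandas function:

```python
import numpy as np
import pandas as pd


def fill_all_nan_columns(df: pd.DataFrame, value) -> pd.DataFrame:
    """Return a copy of df in which every all-NaN column is set to value."""
    out = df.copy()
    # Boolean Series indexed by column name: True where the column has no data.
    all_nan = out.isna().all()
    # Assign only the selected columns; partially filled columns keep their NaNs.
    out.loc[:, all_nan] = value
    return out


df = pd.DataFrame({"col1": [3, np.nan, np.nan, 21, np.nan],
                   "col2": [4, np.nan, 12, np.nan, np.nan],
                   "col3": [33, np.nan, 55, np.nan, np.nan],
                   "col4": [np.nan] * 5})

filled = fill_all_nan_columns(df, 100)
print(filled)
```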

CodePudding user response:

This command should help (axis=0 is already the default for all(), so it is equivalent to the previous answer):

df.loc[:, df.isnull().all(axis=0)] = 100

Output:

    col1    col2    col3    col4
0   3.0     4.0     33.0    100.0
1   NaN     NaN     NaN     100.0
2   NaN     12.0    55.0    100.0
3   21.0    NaN     NaN     100.0
4   NaN     NaN     NaN     100.0
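A non-mutating alternative, sketched here as an assumption rather than what this answer used, is to pass fillna a dict that maps only the all-NaN columns to the fill value; columns missing from the dict are left alone:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, np.nan, np.nan, 21, np.nan],
                   "col2": [4, np.nan, 12, np.nan, np.nan],
                   "col3": [33, np.nan, 55, np.nan, np.nan],
                   "col4": [np.nan] * 5})

# Build {column: fill value} only for columns where every value is NaN.
fill_map = {col: 100 for col in df.columns[df.isnull().all(axis=0)]}
# fillna returns a new frame; df itself is not modified.
result = df.fillna(value=fill_map)

print(fill_map)  # {'col4': 100}
print(result)
```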

CodePudding user response:

Warning: after OP modified the question and gave more details on the exact expected output, this answer is no longer valid.

You almost had it ;)

df['col4'] = df['col4'].mask(df.isnull().all(axis=1), 100)

Your errors:

  • you used where, so you should have inverted your condition (I used mask as an alternative)
  • you only needed a scalar as the replacement value
  • you only needed to apply the operation to 'col4'
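To illustrate the first point: where keeps values where the condition is True, while mask replaces them, so where with an inverted condition gives the same result as mask. A minimal sketch on the sample frame (this addresses the row-wise reading of the question that this answer targets):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, np.nan, np.nan, 21, np.nan],
                   "col2": [4, np.nan, 12, np.nan, np.nan],
                   "col3": [33, np.nan, 55, np.nan, np.nan],
                   "col4": [np.nan] * 5})

rows_all_nan = df.isnull().all(axis=1)

# mask: replace where the condition is True.
via_mask = df["col4"].mask(rows_all_nan, 100)
# where: keep where the condition is True, so the condition must be inverted.
via_where = df["col4"].where(~rows_all_nan, 100)

print(via_mask.equals(via_where))  # True
```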

Output:

       col1      col2      col3      col4
0    3.0000    4.0000   33.0000       NaN
1       NaN       NaN       NaN  100.0000
2       NaN   12.0000   55.0000       NaN
3   21.0000       NaN       NaN       NaN
4       NaN       NaN       NaN  100.0000

CodePudding user response:

Since several columns could contain only NaN values, you can first select all of them:

In [717]: cols = df.columns[df.isna().all()]
In [718]: cols
Out[718]: Index(['col4'], dtype='object')

You can replace multiple columns at once:

In [720]: df[cols] = 100

In [721]: df
Out[721]: 
   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
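To check the multi-column case, here is a small sketch on a made-up frame (not from the question) with two all-NaN columns; both are filled in a single assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: col2 and col3 contain only NaN.
df = pd.DataFrame({"col1": [3, np.nan, 21],
                   "col2": [np.nan, np.nan, np.nan],
                   "col3": [np.nan, np.nan, np.nan]})

cols = df.columns[df.isna().all()]
print(list(cols))  # ['col2', 'col3']

# Assigning a scalar to a list of columns fills every selected column.
df[cols] = 100
print(df)
```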

CodePudding user response:

Use the loc accessor to update every column that has no non-null values:

df.loc[:, ~df.notna().any()] = 100

Output:

   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
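The mask ~df.notna().any() ("no value is present") selects the same columns as df.isna().all() ("every value is missing"), so this answer is equivalent to the earlier ones. A quick sketch confirming this on the sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, np.nan, np.nan, 21, np.nan],
                   "col2": [4, np.nan, 12, np.nan, np.nan],
                   "col3": [33, np.nan, 55, np.nan, np.nan],
                   "col4": [np.nan] * 5})

no_value_present = ~df.notna().any()
all_values_missing = df.isna().all()

print(no_value_present.equals(all_values_missing))  # True
```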