I need to fill a column only if all the values in that column are null. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame(data = {"col1":[3, np.nan, np.nan, 21, np.nan],
"col2":[4, np.nan, 12, np.nan, np.nan],
"col3":[33, np.nan, 55, np.nan, np.nan],
"col4":[np.nan, np.nan, np.nan, np.nan, np.nan]})
>>> df
col1 col2 col3 col4
0 3.0 4.0 33.0 NaN
1 NaN NaN NaN NaN
2 NaN 12.0 55.0 NaN
3 21.0 NaN NaN NaN
4 NaN NaN NaN NaN
In the above example, I have to replace the values of col4 with 100, since all of its values are null/NaN.
The expected output is:
col1 col2 col3 col4
0 3.0 4.0 33.0 100
1 NaN NaN NaN 100
2 NaN 12.0 55.0 100
3 21.0 NaN NaN 100
4 NaN NaN NaN 100
I tried the command below, but it replaces values in a column only if that column contains at least one non-NaN value:
df.where(df.isnull().all(axis=1), df.fillna(100), inplace=True)
Could you please let me know how to do this?
Thanks
CodePudding user response:
Use indexing:
df.loc[:, df.isna().all()] = 100
print(df)
# Output:
col1 col2 col3 col4
0 3.0 4.0 33.0 100.0
1 NaN NaN NaN 100.0
2 NaN 12.0 55.0 100.0
3 21.0 NaN NaN 100.0
4 NaN NaN NaN 100.0
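Note that the filled column prints as 100.0 rather than the 100 shown in the question, because the column keeps its float dtype. If the integer form matters, one possible follow-up is to cast only the filled columns. This is a minimal sketch, not part of the original answer; the name `mask` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

# Boolean Series indexed by column name: True where a column is entirely null.
mask = df.isna().all()

# Fill the entirely-null columns with the replacement value.
df.loc[:, mask] = 100

# Cast only the filled columns to integers; they no longer contain NaN,
# so the conversion is safe. Other columns keep their float dtype.
df = df.astype({col: "int64" for col in mask.index[mask]})

print(df)
```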
CodePudding user response:
This command should help:
df.loc[:, df.isnull().all(axis=0)] = 100
Output:
col1 col2 col3 col4
0 3.0 4.0 33.0 100.0
1 NaN NaN NaN 100.0
2 NaN 12.0 55.0 100.0
3 21.0 NaN NaN 100.0
4 NaN NaN NaN 100.0
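The key difference from the attempt in the question is the axis passed to all(). As a small illustration (the variable names below are made up for this sketch), axis=0 yields one boolean per column, while axis=1 yields one boolean per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

# axis=0 (the default) reduces down each column: one boolean per column.
all_null_columns = df.isnull().all(axis=0)

# axis=1 reduces across each row: one boolean per row.
all_null_rows = df.isnull().all(axis=1)

print(all_null_columns.to_dict())
# {'col1': False, 'col2': False, 'col3': False, 'col4': True}
print(all_null_rows.tolist())
# [False, True, False, False, True]
```

Since axis=0 is the default, df.isnull().all() and df.isnull().all(axis=0) select the same columns.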
CodePudding user response:
Warning: after OP modified the question and gave more details on the exact expected output, this answer is no longer valid.
You almost had it ;)
df['col4'] = df['col4'].mask(df.isnull().all(axis=1), 100)
Your errors:
- you used where, so you should have inverted your condition (I used mask as an alternative)
- you only need a scalar as replacement
- you only needed to apply the operation on 'col4'
output:
col1 col2 col3 col4
0 3.0000 4.0000 33.0000 NaN
1 NaN NaN NaN 100.0000
2 NaN 12.0000 55.0000 NaN
3 21.0000 NaN NaN NaN
4 NaN NaN NaN 100.0000
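To make the where/mask point concrete, here is a small sketch on a toy Series (the Series `s` and the variable names are invented for this illustration). where keeps values where the condition is True, mask replaces them, so one with the inverted condition matches the other:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan])
cond = s.isna()

# where keeps values where the condition is True and replaces the rest,
# so the condition has to be inverted to replace the NaN entries.
via_where = s.where(~cond, 100)

# mask replaces values where the condition is True and keeps the rest.
via_mask = s.mask(cond, 100)

print(via_where.tolist())  # [100.0, 1.0, 100.0]
print(via_mask.tolist())   # [100.0, 1.0, 100.0]
```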
CodePudding user response:
Since more than one column could be entirely NaN, you can do this:
In [717]: cols = df.columns[df.isna().all()]
In [718]: cols
Out[718]: Index(['col4'], dtype='object')
You can replace multiple columns at once:
In [720]: df[cols] = 100
In [721]: df
Out[721]:
col1 col2 col3 col4
0 3.0 4.0 33.0 100.0
1 NaN NaN NaN 100.0
2 NaN 12.0 55.0 100.0
3 21.0 NaN NaN 100.0
4 NaN NaN NaN 100.0
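If this is needed in several places, the two steps could be wrapped in a small helper. The function name `fill_all_null_columns` and the toy frame below are hypothetical, chosen only for this sketch; it works on a copy so the input frame is left unchanged.

```python
import numpy as np
import pandas as pd

def fill_all_null_columns(frame, value):
    """Return a copy where every entirely-null column is set to `value`."""
    result = frame.copy()
    # Labels of the columns in which every entry is null.
    cols = result.columns[result.isna().all()]
    result[cols] = value
    return result

# Toy frame with two entirely-null columns ("a" and "c").
df = pd.DataFrame({
    "a": [np.nan, np.nan],
    "b": [1.0, np.nan],
    "c": [np.nan, np.nan],
})

filled = fill_all_null_columns(df, 100)
print(filled)
```

Column "b" is only partly null, so its NaN is left as it is.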
CodePudding user response:
Use the loc accessor to update any column that has no non-null values:
df.loc[:,~df.notna().any()]=100
output
col1 col2 col3 col4
0 3.0 4.0 33.0 100.0
1 NaN NaN NaN 100.0
2 NaN 12.0 55.0 100.0
3 21.0 NaN NaN 100.0
4 NaN NaN NaN 100.0
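The condition ~df.notna().any() ("no non-null value in the column") selects the same columns as df.isna().all() ("every value in the column is null"). A quick check on a toy frame, with names invented for this sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, np.nan], "col2": [np.nan, np.nan]})

# "The column has no non-null value" ...
no_non_null = ~df.notna().any()

# ... is the same statement as "every value in the column is null".
all_null = df.isna().all()

print(no_non_null.equals(all_null))  # True
```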