Replace 0 with NaN for selected columns only if all values are 0 in Pandas

Time:11-08

Given a dataframe df as follows:

   id  value1  value2  value3
0   0      22       1       7
1   1       0       0       0
2   2       0       0       0
3   3       4       1      25
4   4       5       0      24
5   5       0       0       3

Or:

df = pd.DataFrame([{'id': 0, 'value1': 22, 'value2': 1, 'value3': 7},
 {'id': 1, 'value1': 0, 'value2': 0, 'value3': 0},
 {'id': 2, 'value1': 0, 'value2': 0, 'value3': 0},
 {'id': 3, 'value1': 4, 'value2': 1, 'value3': 25},
 {'id': 4, 'value1': 5, 'value2': 0, 'value3': 24},
 {'id': 5, 'value1': 0, 'value2': 0, 'value3': 3}])

I need to take the subset of columns 'value1', 'value2', 'value3' and replace 0 with NaN only in rows where all of those values are 0. How can I achieve that in Pandas?

The expected result looks like this:

   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0

Code I have tried:

cols = ['value1', 'value2', 'value3']
df[cols] = df[cols].replace({'0': np.NaN, 0: np.NaN})

CodePudding user response:

Use mask:

df[cols] = df[cols].mask(df[cols].eq(0).all(axis=1))

mask sets every value in a row to NaN when the condition (df[cols].eq(0).all(axis=1)) is True for that row, and leaves all other rows unchanged.
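
To make the steps explicit, here is a minimal self-contained sketch of the same idea on the question's data. The names all_zero and result are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5],
    'value1': [22, 0, 0, 4, 5, 0],
    'value2': [1, 0, 0, 1, 0, 0],
    'value3': [7, 0, 0, 25, 24, 3],
})
cols = ['value1', 'value2', 'value3']

# Boolean Series with one entry per row: True only where every selected column is 0
all_zero = df[cols].eq(0).all(axis=1)

# Work on a copy so the original frame stays intact (illustrative choice)
result = df.copy()

# mask() puts NaN wherever the row condition is True; other rows keep their values
result[cols] = df[cols].mask(all_zero)

print(result)
```

Rows 4 and 5 contain some zeros but not only zeros, so they are left as they are.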

Original answer:

I'd prefer mask:

>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1))
    value1  value2  value3
id                        
0     22.0     1.0     7.0
1      NaN     NaN     NaN
2      NaN     NaN     NaN
3      4.0     1.0    25.0
4      5.0     0.0    24.0
5      0.0     0.0     3.0
>>> 

With resetting index:

>>> df.set_index('id').mask(df[cols].eq(0).all(axis=1)).reset_index()
   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0
>>> 

CodePudding user response:

Perhaps:

mask = (df[cols].eq(0).all(axis=1))
df.loc[mask, cols] = np.nan

Output:

   id  value1  value2  value3
0   0    22.0     1.0     7.0
1   1     NaN     NaN     NaN
2   2     NaN     NaN     NaN
3   3     4.0     1.0    25.0
4   4     5.0     0.0    24.0
5   5     0.0     0.0     3.0
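
A self-contained variant of the same approach, as a sketch. One assumption here: recent pandas versions may warn or refuse when NaN is assigned into integer columns through .loc, so this version casts the selected columns to float first. The name all_zero is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5],
    'value1': [22, 0, 0, 4, 5, 0],
    'value2': [1, 0, 0, 1, 0, 0],
    'value3': [7, 0, 0, 25, 24, 3],
})
cols = ['value1', 'value2', 'value3']

# Cast up front so the columns can hold NaN (assumption: the integer dtype need not be kept)
df[cols] = df[cols].astype(float)

# True for rows where all selected columns are 0
all_zero = df[cols].eq(0).all(axis=1)

# Overwrite only those rows, and only the selected columns, in place
df.loc[all_zero, cols] = np.nan

print(df)
```

Unlike mask, this modifies df in place, and the id column is never touched.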

CodePudding user response:

You're mixing types: your code tries to compare two different data types and raises a TypeError:

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

To make your code work, simply remove '0': np.NaN from your dict, or pass two arguments instead of a dict:

df[cols] = df[cols].replace(0, float('NaN'))
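
Keep in mind that this makes the original attempt run, but it replaces every 0 in the selected columns, not only the rows where all values are 0. A small sketch comparing the two on the question's data (replaced and masked are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5],
    'value1': [22, 0, 0, 4, 5, 0],
    'value2': [1, 0, 0, 1, 0, 0],
    'value3': [7, 0, 0, 25, 24, 3],
})
cols = ['value1', 'value2', 'value3']

# Plain replace: every single 0 becomes NaN, including the ones in rows 4 and 5
replaced = df[cols].replace(0, np.nan)

# Row-wise condition: only rows made up entirely of zeros become NaN
masked = df[cols].mask(df[cols].eq(0).all(axis=1))

# Show the rows where the two results disagree about which cells are NaN
differs = (replaced.isna() != masked.isna()).any(axis=1)
print(df.loc[differs, 'id'].tolist())
```

The rows printed at the end are the ones the question wants left alone, which is why the mask or .loc answers fit the expected output better.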