Function to replace all NaN values with zero:

I am trying to clean and fill in around 300 columns. I have already replaced all the empty fields with NaN, and now I am trying to convert those values to 0, provided certain checks pass:

  1. NaN values need to be present in the column.
  2. No 0 values may already exist in the column.
  3. If a 0 already exists, replace the NaNs with 0.1 instead. (I am still trying to figure out what value to use, since 0 already carries relevant information for that particular column in the dataframe.)

Thus far I have implemented:

import numpy as np

def convert(df, col):
  if (df[col].isnull().sum() > 0): #& (df[df[col] != '0'])
  #if (df[df[col] != '0']):
    df[col].replace(np.NaN, '0', inplace = True)

for col in df.columns: 
  convert(df, col)

But checking for the second condition (no zeroes may already exist in the column) is not working. I tried to implement it (the commented-out part), but it returns the following error:

TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

On another note, regarding data science: I am not sure whether some of the columns should have their empty fields replaced by the column mean instead of 0. I have features describing weight, dimensions, prices, etc.
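For those columns, something like this is what I have in mind (just a sketch; the column names in numeric_cols are placeholders for my actual weight/dimension/price features):

numeric_cols = ['weight', 'height', 'price']   # placeholder names, for illustration only
for col in numeric_cols:
    # Fill missing values with the column mean instead of 0
    df[col] = df[col].fillna(df[col].mean())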

CodePudding user response:

Use boolean masks.

Suppose the following dataframe:

>>> df
     A  B    C
0  0.0  1  2.0
1  NaN  4  5.0  # <- NaN should be replaced by 0.1
2  6.0  7  NaN  # <- NaN should be replaced by 0

m1 = df.isna().any()  # per column: does the column contain any NaN? (not strictly needed)
m2 = df.eq(0).any()   # per column: does the column already contain a 0?

# Replace NaN by 0 in columns that have NaN but no 0
df.update(df.loc[:, m1 & ~m2].fillna(0))

# Replace NaN by 0.1 in columns that already contain a 0
df.update(df.loc[:, m1 & m2].fillna(0.1))

Strictly speaking, only the second mask is needed: filling NaN in columns that contain no NaN is a no-op, so m1 can be dropped.
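An equivalent version using only m2 would look roughly like this (a sketch of the same idea as above):

m2 = df.eq(0).any()                    # columns that already contain a 0
df.update(df.loc[:, ~m2].fillna(0))    # no 0 present  -> fill NaN with 0
df.update(df.loc[:, m2].fillna(0.1))   # 0 already present -> fill NaN with 0.1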

Output result:

>>> df
     A  B    C
0  0.0  1  2.0
1  0.1  4  5.0
2  6.0  7  0.0
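
As for the TypeError in the question: it comes from combining the scalar boolean df[col].isnull().sum() > 0 with an entire DataFrame (df[df[col] != '0']) via &. If you prefer to keep a per-column function, a sketch along these lines avoids that (it assumes the columns are numeric, so the comparison is with the number 0 rather than the string '0'):

def convert(df, col):
    has_nan = df[col].isna().any()   # any NaN in this column?
    has_zero = df[col].eq(0).any()   # does a 0 already exist?
    if has_nan:
        # use 0.1 when 0 already carries meaning in the column, otherwise 0
        df[col] = df[col].fillna(0.1 if has_zero else 0)

for col in df.columns:
    convert(df, col)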