Is there a default Pandas method for removing null or missing values when they are represented by a

Time:03-21

In the dataset I'm working on, the Adult dataset, the missing values are indicated with the "?" string, and I want to discard the rows containing missing values.

The documentation for df.dropna() lists no argument for passing a custom value to be treated as null/missing.

I know I can simply solve the problem with something like:

df_str = df.select_dtypes(['object']) # get the columns containing the strings
for col in df_str.columns:
    df = df[df[col] != '?']

but I was wondering whether there is a standard way to do this with the pandas API that is more flexible and also faster.

CodePudding user response:

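You can build a row mask with any(): eq('?') marks every cell equal to "?", any(axis=1) returns True for each row that contains at least one match, and ~ inverts the mask so that only rows without "?" are kept. Note that axis must be passed by keyword in pandas 2.0 and later.

df = df[~df_str.eq('?').any(axis=1)]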
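As a minimal sketch of the mask approach above, here is a self-contained version on made-up data. The toy DataFrame, its column names, and the variable names has_missing and clean are assumptions for illustration only; they are not taken from the Adult dataset itself. The string columns are listed by hand here, where the question used select_dtypes(['object']).

```python
import pandas as pd

# Hypothetical toy data standing in for the Adult dataset.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?", "Handlers-cleaners"],
})

# Restrict the comparison to the string columns (chosen by hand in this sketch).
df_str = df[["workclass", "occupation"]]

# True for every row in which at least one string column equals "?".
has_missing = df_str.eq("?").any(axis=1)

# Keep only the rows that contain no "?".
clean = df[~has_missing]
print(clean)
```

Because the mask is computed once over all string columns, this avoids re-filtering the frame column by column as the loop in the question does.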

CodePudding user response:

You could replace it with NaN and dropna:

df = df.replace('?', float('nan')).dropna()
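One thing to keep in mind with replace-then-dropna is that dropna() also removes rows that already contained NaN for other reasons, not only the rows that held "?". The sketch below illustrates this on made-up data (the DataFrame and the names cleaned and cleaned_subset are assumptions), and shows the subset argument of dropna() as one possible way to limit the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one genuine NaN in "age", one "?" in "workclass".
df = pd.DataFrame({
    "age": [39.0, np.nan, 38.0, 53.0],
    "workclass": ["State-gov", "Private", "?", "Private"],
})

# Turn "?" into NaN, then drop every row that has any NaN at all.
cleaned = df.replace("?", np.nan).dropna()

# Only look at "workclass" when deciding which rows to drop,
# so the row whose only gap is in "age" survives.
cleaned_subset = df.replace("?", np.nan).dropna(subset=["workclass"])

print(cleaned.index.tolist())
print(cleaned_subset.index.tolist())
```

If the dataset marks every missing value with "?" and has no other NaN, the two results are the same.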

CodePudding user response:

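import numpy as np

df.replace('?', np.nan, inplace=True)

followed by df = df.dropna(). The result has to be assigned back (or inplace=True passed again), because dropna() returns a new DataFrame by default.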
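A small sketch of the in-place variant, assuming a made-up single-column frame, shows why the dropna() result needs to be kept: calling it without assignment leaves df unchanged. The names n_after_replace, n_after_unassigned and n_after_assigned are hypothetical and only there to make the row counts visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a single "?" marker.
df = pd.DataFrame({"workclass": ["Private", "?", "State-gov"]})

# Modifies df in place; "?" becomes NaN but no row is removed yet.
df.replace("?", np.nan, inplace=True)
n_after_replace = len(df)

# Returns a new DataFrame; without assignment, df keeps its NaN row.
df.dropna()
n_after_unassigned = len(df)

# Assigning the result back actually drops the row.
df = df.dropna()
n_after_assigned = len(df)

print(n_after_replace, n_after_unassigned, n_after_assigned)
```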
