Is there a default Pandas method for removing null or missing values when they are represented by a custom value?

In the dataset I'm working on, the Adult dataset, the missing values are indicated with the "?" string, and I want to discard the rows containing missing values.

In the documentation of the method df.dropna() there is no argument that allows passing a custom value to be interpreted as the null/missing value.

I know I can simply solve the problem with something like:

df_str = df.select_dtypes(['object']) # get the columns containing the strings
for col in df_str.columns:
    df = df[df[col] != '?']

but I was wondering whether there is a standard way of achieving this with the pandas API, one that might offer more flexibility while also being faster.

CodePudding user response：

You can use any() along each row: df_str.eq('?') flags the cells equal to '?', any() is True for every row with at least one match, and ~ inverts that mask so only the rows without '?' are kept.

df = df[~df_str.eq('?').any(axis=1)]
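
To make the mask approach concrete, here is a minimal sketch on a made-up three-column frame. The column names and values are assumptions for illustration, not the real Adult data, and the string columns are picked by name rather than with select_dtypes to keep the example short.

```python
import pandas as pd

# Hypothetical miniature of the data; "?" marks a missing value.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?", "Sales"],
})

# The string columns that may contain the "?" marker (chosen by name here).
df_str = df[["workclass", "occupation"]]

# True for every row that holds "?" in at least one of those columns.
has_missing = df_str.eq("?").any(axis=1)

# Keep only the rows without a "?".
clean = df[~has_missing]
print(clean)
```

Only the rows in which neither string column equals "?" remain in clean, and the original df is left unchanged.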

CodePudding user response：

You could replace it with NaN and dropna:

df = df.replace('?', float('nan')).dropna()
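
The same idea split into two steps, as a small sketch on made-up data (the column names and values are assumptions, not the real Adult data), so the intermediate result is visible:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the data; "?" marks a missing value.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?", "Sales"],
})

# Turn the "?" placeholder into a real missing value...
with_nan = df.replace("?", np.nan)

# ...so that dropna() can discard every row containing one.
clean = with_nan.dropna()
print(clean)
```

Note that replace() matches whole cell values here, so a cell holding " ?" with surrounding whitespace would not be converted and would need to be stripped first.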

CodePudding user response：

import numpy as np
df.replace('?', np.nan, inplace=True)

followed by df = df.dropna() (dropna() returns a new DataFrame by default, so assign the result).