Home > Net >  how to strip customized missing value pandas dataframe
how to strip customized missing value pandas dataframe

Time:03-07


I have a dataset with a customized missing values which is the character `\?` but a cell with the missing value also contains whitespaces with inconsistent number of space characters. As in my example picture, at row 11, It could have 3 spaces, or 4 spaces.

So my idea is to apply the str.strip() function for each cell to identify it as the missing values and drop it, but it still is not recognized as missing values.

enter image description here

df = pd.read_csv('full_name', header=None, na_values=['?'])
df = df.apply(lambda x: x.str.strip() if x.dtype== 'object' else x)
df.dropna(axis=0, inplace=True, how='any')
df.head(20)]

what is an efficient way to solve this?

CodePudding user response:

Use:

df = pd.DataFrame({'test': [1,2, '    ? ', ' ?   ']})
df[~df['test'].str.contains('\?', na=False)]

CodePudding user response:

dropna drops NaN values. Since your NaNs are actually ?, you could replace them with NaN and use dropna:

df = df.replace('?', np.nan).dropna()

mask them and use dropna:

df = df.mask(df.eq('?')).dropna()

or simply filter those rows out and only select rows without any ?:

df = df[df.ne('?').all(axis=1)]
  • Related