How can I define NA values in a list and check pandas dataframe?


In my dataframe, a value should be classed as missing if it is "nan", 0 or "missing".

So I defined a list of null values:

null_values = ["nan", 0, "missing"]

Then I checked which columns contain any of these values:

df.columns[df.isin(null_values).any()]

My df looks as follows:

{'Sick': {0: False, 1: False, 2: False, 3: False, 4: False, 5: False, 6: True},
'Name': {0: 'Liam',
1: 'Chloe',
2: 'Andy',
3: 'Max',
4: 'Anne',
5: nan,
6: 'Charlie'},
'Age': {0: 1.0, 1: 2.0, 2: 8.0, 3: 4.0, 4: 5.0, 5: 2.0, 6: 9.0}}

It flags the column 'Sick' as containing missing values even though it only contains True/False values. However, it correctly recognises that 'Age' has no missing values. Why does it count False as a missing value when I have not defined it as such in my list?
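For context, this happens because in Python `bool` is a subclass of `int`, so `False == 0` and `True == 1`. `DataFrame.isin` matches by equality, which a minimal sketch can show:

```python
import pandas as pd

# bool is a subclass of int, so False compares equal to 0
print(False == 0)  # True

# isin therefore matches False against the 0 in the lookup list
s = pd.Series([False, True], name="Sick")
print(s.isin([0]).tolist())  # [True, False]
```

So any 0 in `null_values` will flag every False in a boolean column.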

CodePudding user response:

One idea is to exclude boolean columns with DataFrame.select_dtypes, because in Python False compares equal to 0, so isin matches it:

null_values = ["nan", 0, "missing"]


df1 = df.select_dtypes(exclude=bool)
print(df1.columns[df1.isin(null_values).any()])
Index([], dtype='object')


If you also need to treat NaN as missing, add np.nan to the list:

import numpy as np

null_values = ["nan", 0, "missing", np.nan]

df1 = df.select_dtypes(exclude=bool)
print(df1.columns[df1.isin(null_values).any()])
Index(['Name'], dtype='object')

EDIT: Another trick is to convert all values to strings first; then you need to compare against '0' for strings cast from integers and '0.0' for strings cast from floats (NaN becomes the string 'nan'):

null_values = ["nan", '0', '0.0', "missing"]

print(df.columns[df.astype(str).isin(null_values).any()])
Index(['Name'], dtype='object')
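Putting the string-cast trick together with the question's data, a self-contained sketch (the DataFrame is rebuilt from the dict shown in the question) looks like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Sick': [False, False, False, False, False, False, True],
    'Name': ['Liam', 'Chloe', 'Andy', 'Max', 'Anne', np.nan, 'Charlie'],
    'Age': [1.0, 2.0, 8.0, 4.0, 5.0, 2.0, 9.0],
})

# Compare string representations: booleans become 'False'/'True'
# (no accidental match with 0), and NaN becomes 'nan'.
null_values = ["nan", "0", "0.0", "missing"]
mask = df.astype(str).isin(null_values)

print(df.columns[mask.any()].tolist())  # ['Name']
```

If you then want the flagged cells as real missing values, `df.mask(mask)` replaces every matched cell with NaN.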