How to check in pandas that column is bool-like (includes either True, False or NaN)?

Time:10-01

I have a dataframe like so:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'date': "20220701",
        'a': [1, 2, np.nan],
        'b': ['a', 'b', 'c'],
        'c': [True, False, np.nan],
    }
)

Columns b and c therefore have dtype object. I'd like to efficiently identify the columns that could be boolean if they had no missing values.

The only solutions that came to mind are:

  1. check whether the unique values in a column are all in [True, False, NaN], but that would most likely be very inefficient.

  2. check whether (df.c.isnull() | (df.c == True) | (df.c == False)).all() is True
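
For reference, the two ideas above could be sketched roughly as follows. The helper names bool_like_by_unique and bool_like_by_mask are made up for illustration, and the frame is the one from the question. Note that in Python 1 == True and 0 == False, so both checks would also accept a column holding only 0/1 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "date": "20220701",
        "a": [1, 2, np.nan],
        "b": ["a", "b", "c"],
        "c": [True, False, np.nan],
    }
)


def bool_like_by_unique(s):
    # Idea 1: after dropping NaN, every unique value must be True or False.
    # Caveat: 1 == True and 0 == False, so a 0/1 column also passes.
    return set(s.dropna().unique()) <= {True, False}


def bool_like_by_mask(s):
    # Idea 2: every element is missing, equal to True, or equal to False.
    return bool((s.isnull() | (s == True) | (s == False)).all())


print([col for col in df.columns if bool_like_by_unique(df[col])])
print([col for col in df.columns if bool_like_by_mask(df[col])])
```

On the sample frame, both list comprehensions should select only column c.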

CodePudding user response:

Instead of calling unique(), you could take a small sample such as df.b[:10] and check those first 10 values to guess whether the column is boolean.

This can give the wrong answer when non-boolean values appear later in the column, but it will be faster than unique().
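
A minimal sketch of that sampling idea, assuming a hypothetical helper called looks_bool_in_sample and an arbitrary sample size of 10. It only inspects the head of the column, so it is a heuristic rather than a proof; the second example series is constructed to show the kind of false positive mentioned above.

```python
import numpy as np
import pandas as pd


def looks_bool_in_sample(s, n=10):
    # Hypothetical helper: look only at the first n rows, ignoring missing values.
    sample = s.iloc[:n].dropna()
    # A sample with no values gives no evidence; treat it as not bool-like.
    if sample.empty:
        return False
    return all(isinstance(v, (bool, np.bool_)) for v in sample)


s_ok = pd.Series([True, False, np.nan], dtype=object)
# The non-boolean value sits just outside the sampled window.
s_late = pd.Series([True] * 10 + ["oops"], dtype=object)

print(looks_bool_in_sample(s_ok))
print(looks_bool_in_sample(s_late))  # passes the check even though "oops" is present
```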

CodePudding user response:

Here is one way to do it using assign.

Since the column is created via assign, it is temporary and not part of df, so nothing is added to or removed from the original dataframe.

# create a temporary column by forward-filling NA values, then check its dtype
>>> df.assign(temp=df['c'].ffill())['temp'].dtype
dtype('bool')
>>> df.assign(temp=df['c'].ffill())['temp'].dtype == 'bool'
True

or

# list the dtypes of all columns; the newly created temp column is of type bool
df.assign(temp=df['c'].ffill()).dtypes

date     object
a       float64
b        object
c        object
temp       bool
dtype: object
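
One caveat with the ffill approach: a NaN in the first row has nothing before it to fill from, so the temporary column would stay object. It also relies on pandas downcasting the filled object column to bool, which, as far as I know, recent pandas releases warn about. As an alternative that is not part of the answer above, here is a sketch using pandas.api.types.infer_dtype with skipna=True, which reports the inferred type of the non-missing values. The helper name bool_like_columns is made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

df = pd.DataFrame(
    {
        "date": "20220701",
        "a": [1, 2, np.nan],
        "b": ["a", "b", "c"],
        "c": [True, False, np.nan],
    }
)


def bool_like_columns(frame):
    # Hypothetical helper: keep columns whose non-missing values are all booleans.
    return [
        col
        for col in frame.columns
        if infer_dtype(frame[col], skipna=True) == "boolean"
    ]


print(bool_like_columns(df))

# A leading NaN does not matter here, because missing values are skipped.
leading_nan = pd.DataFrame({"x": [np.nan, True, False]})
print(bool_like_columns(leading_nan))
```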