How to check if a pandas column has anything other than specified values?

Time:12-10

Say I have a pandas DataFrame with a column A, and I want every value in that column to be of type category, with either L or R as the value.

How can I raise an exception if this column contains any value other than L or R, including None/null/NaN?

CodePudding user response:

We can filter on L and R and then we get the opposite of that filter using the ~ operator like so:

df[~(df['A'].isin(['L', 'R']))]
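As a minimal sketch of what this filter returns, assuming a small made-up frame (the data below is illustrative and not from the question): isin treats missing values as non-matching, so None/NaN rows show up among the offending rows as well.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two valid values, one typo, one missing value.
df = pd.DataFrame({"A": ["L", "R", "X", np.nan]})

# Rows whose value in A is neither 'L' nor 'R'.
# NaN never matches inside isin here, so the missing row is selected too.
bad_rows = df[~df["A"].isin(["L", "R"])]
print(bad_rows)
```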

To get a boolean value indicating that no additional values are present in the Series, we can write:

len(df[~(df['A'].isin(['L', 'R']))]) == 0

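We can make this shorter and quicker by using the pandas.Series.any method, which returns True if at least one value falls outside the allowed set. Note that the ~ must be applied to the mask before calling .any(), hence the parentheses:

(~df['A'].isin(['L', 'R'])).any()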
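To connect this back to the question, here is a hedged sketch of raising an exception from the inverted mask. The sample frame and the error message are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data containing one missing value and one invalid value.
df = pd.DataFrame({"A": ["L", "R", None, "Q"]})

allowed = ["L", "R"]
# True where the value is not allowed (missing values count as not allowed).
invalid = ~df["A"].isin(allowed)

try:
    if invalid.any():
        # Include the offending values so the error is actionable.
        offenders = df.loc[invalid, "A"].tolist()
        raise ValueError(f"column A has unexpected values: {offenders}")
except ValueError as exc:
    # Caught here only so the sketch runs to completion.
    message = str(exc)
    print(message)
```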

CodePudding user response:

This should work:

all(df["A"].isin(["L", "R"]))

You could use this in a statement like so:

if all(df["A"].isin(["L", "R"])):
    print("it works!")

If you want to raise an exception when the check fails, you could write:

if not all(df["A"].isin(["L", "R"])):
    raise ValueError("there's a wrong value in col A!")

or, using assert, simply:

assert all(df["A"].isin(["L", "R"])), "there's a wrong value in col A!"
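Since assert statements are skipped when Python runs with the -O flag, an explicit raise may be safer for data validation. The sketch below wraps the check in a helper that also verifies the category dtype mentioned in the question. The function name validate_lr_column and its parameters are hypothetical, chosen only for this example.

```python
import pandas as pd


def validate_lr_column(df, column="A", allowed=("L", "R")):
    """Raise if `column` is not categorical or holds values outside `allowed`.

    Hypothetical helper for illustration; missing values are rejected too.
    """
    series = df[column]
    # The question asks for the column to be of type category.
    if not isinstance(series.dtype, pd.CategoricalDtype):
        raise TypeError(f"column {column!r} is not categorical (dtype: {series.dtype})")
    # isin returns False for NaN, so missing values are flagged as invalid.
    bad = ~series.isin(list(allowed))
    if bad.any():
        raise ValueError(f"column {column!r} has {int(bad.sum())} value(s) outside {list(allowed)}")


# Usage on a made-up, valid frame: no exception is raised.
good = pd.DataFrame({"A": pd.Series(["L", "R", "L"], dtype="category")})
validate_lr_column(good)
```

Calling the helper on a frame whose column contains a missing value or a third category would raise ValueError, and a non-categorical column would raise TypeError.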

Speed test:

%timeit all(df["A"].isin(["L", "R"]))
%timeit len(df[~(df['A'].isin(['L', 'R']))]) == 0

# 55.9 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 167 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)