Say I have a pandas dataframe with column A, and we want all the values in that column to be of type category with either L or R as the value.
How can I raise an exception if we detect that this column has any value other than L or R? That includes if it was None/null/NaN.
CodePudding user response:
We can filter on L and R and then get the opposite of that filter using the ~ operator, like so:
df[~(df['A'].isin(['L', 'R']))]
To get a boolean value that is True only when no additional values are present in the Series, we can write:
len(df[~(df['A'].isin(['L', 'R']))]) == 0
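We can make this even shorter and quicker by using the pandas.Series.any method, which also returns a boolean value. The ~ has to be applied to the mask before any() is called, and the result is True when at least one value other than L or R is present:
(~df['A'].isin(['L', 'R'])).any()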
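To tie this back to the question, here is a minimal sketch of turning that mask into an exception. The helper name check_only_l_or_r and the sample frames are made up for illustration. It relies on isin returning False for None/NaN when they are not in the allowed list, so missing values are treated as invalid too.

```python
import numpy as np
import pandas as pd


def check_only_l_or_r(df, column="A", allowed=("L", "R")):
    """Hypothetical helper: raise ValueError if `column` holds anything
    outside `allowed`. None/NaN are not in `allowed`, so they count as invalid."""
    invalid = ~df[column].isin(list(allowed))  # True where the value is not allowed
    if invalid.any():
        offending = df.loc[invalid, column].unique().tolist()
        raise ValueError(f"column {column!r} has unexpected values: {offending}")


# Made-up sample data for illustration
good = pd.DataFrame({"A": ["L", "R", "L"]})
bad = pd.DataFrame({"A": ["L", None, "X", np.nan]})

check_only_l_or_r(good)  # passes silently

try:
    check_only_l_or_r(bad)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
```

Putting the offending values in the message is optional, but it usually makes the failure easier to track down.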
CodePudding user response:
This should work:
all(df["A"].isin(["L", "R"]))
You could use this in a statement like so:
if all(df["A"].isin(["L", "R"])):
    print("it works!")
If you want to raise an exception if the statement returns False, you could write:
if not all(df["A"].isin(["L", "R"])):
    raise ValueError("there's a wrong value in col A!")
or, using assert, simply:
assert all(df["A"].isin(["L", "R"])), "there's a wrong value in col A!"
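Since the question also wants the column to end up as type category, the check can be combined with the conversion. This is only a sketch: the helper name to_lr_category and the sample data are made up, and it uses the vectorised Series.all() in place of the built-in all().

```python
import pandas as pd

ALLOWED = ["L", "R"]


def to_lr_category(series):
    """Hypothetical helper: validate that every value is L or R, then
    return the series with a category dtype limited to those two values."""
    if not series.isin(ALLOWED).all():  # None/NaN also fail this check
        raise ValueError("there's a wrong value in the column!")
    return series.astype(pd.CategoricalDtype(categories=ALLOWED))


# Made-up sample data for illustration
df = pd.DataFrame({"A": ["L", "R", "R"]})
df["A"] = to_lr_category(df["A"])

# A column containing a missing value is rejected
try:
    to_lr_category(pd.Series(["L", None]))
    rejected = False
except ValueError:
    rejected = True
```

An explicit raise is used here instead of assert because assert statements are skipped when Python runs with the -O flag, so the check would silently disappear.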
Speed test:
%timeit all(df["A"].isin(["L", "R"]))
%timeit len(df[~(df['A'].isin(['L', 'R']))]) == 0
# 55.9 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 167 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)