Let's say I have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [5, 6, 7, 8, 11],
                   "b": ["A", "B", "C", "B", "A"],
                   "c": [27, 6, 1, 8, 3],
                   "d": [31, 26, 17, 8, 3],
                   "e": [12, np.nan, 11, 8, 6],
                   "f": [5, np.nan, 5, np.nan, 7],
                   "g": [27, 5, 12, 4, 19],
                   "h": [6, 16, 11, 2, 9],
                   "i": ["One", "Two", "One", "Three", "One"]
                   })
df
I want to check whether there are rows that do not meet certain requirements. For example:
if ((df.d < df.c) & (df.a != 0) & (df.i == "X")).all() is False:
    raise ValueError(f"Incorrect! ")
else:
    print('Seems ok!')
This should raise a ValueError, because column d is not smaller than column c in every row, and there is no value 'X' in column i. However, I keep getting the result 'Seems ok!'. What is wrong with the if statement that it does not raise a ValueError?
CodePudding user response:
Use not or == False instead of is, because is tests object identity rather than truth value:
if not ((df.d < df.c) & (df.a != 0) & (df.i == "X")).all():
    raise ValueError(f"Incorrect! ")
else:
    print('Seems ok!')
Or:
if ((df.d < df.c) & (df.a != 0) & (df.i == "X")).all():
    print('Seems ok!')
else:
    raise ValueError(f"Incorrect! ")
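To see why the identity check in the question never fires, here is a minimal sketch. It assumes that Series.all() returns a NumPy boolean (numpy.bool_) rather than the built-in bool, which is the usual behavior but may differ between pandas versions. The Series named mask is a hypothetical stand-in for the combined row-wise condition.

```python
import pandas as pd

# Hypothetical stand-in for the combined row-wise condition.
mask = pd.Series([True, False, True])
result = mask.all()

# Identity test: true only if `result` is the built-in False singleton itself.
is_identical = result is False
# Truth-value test: works for any falsy boolean-like object.
is_falsy = not result
# Equality test: compares values, not object identity.
equals_false = bool(result == False)

print(type(result), is_identical, is_falsy, equals_false)
```

If the printed type is numpy.bool_, the identity test is False even though the value is falsy, which is why the else branch ran in the question.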
CodePudding user response:
Per De Morgan's law, you can invert each of your conditions, combine them with OR (|), and use any:
if ((df.d >= df.c) | (df.a == 0) | (df.i != "X")).any():
    raise ValueError(f"Incorrect! ")
else:
    print('Seems ok!')
output: ValueError: Incorrect!
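As a rough check that the inverted form matches the original condition, the sketch below compares the two masks on the columns used in the question and collects the offending rows. The variable names ok, bad and offending are illustrative and not part of the original code.

```python
import pandas as pd

# Only the columns that the conditions actually use.
df = pd.DataFrame({
    "a": [5, 6, 7, 8, 11],
    "c": [27, 6, 1, 8, 3],
    "d": [31, 26, 17, 8, 3],
    "i": ["One", "Two", "One", "Three", "One"],
})

# Original requirement: every row must satisfy all three conditions.
ok = (df.d < df.c) & (df.a != 0) & (df.i == "X")
# De Morgan: a row is bad if at least one inverted condition holds.
bad = (df.d >= df.c) | (df.a == 0) | (df.i != "X")

# On this data the inverted mask is exactly the negation of the original.
masks_agree = bool((bad == ~ok).all())
# "Not all rows ok" is the same statement as "any row bad".
checks_agree = (not ok.all()) == bool(bad.any())

# Rows that violate at least one requirement, useful for an error message.
offending = df[bad]
print(masks_agree, checks_agree, len(offending))
```

One caveat: if the compared columns contained NaN, both d < c and d >= c would be False for that row, so the hand-inverted mask would no longer equal ~ok. In that case, negating the original mask with ~ is the safer choice.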