Pandas - how to flag if a dataframe column has a non-permitted value in it?

Time:07-12

I have a dataframe that looks a bit like this:

offer | type
------|-----
123   | A
456   | B
789   | C

I want to set up an if statement which prints a warning message if any values other than A or B are included in the type column. The values can be in upper or lower case, but should only be A or B.

I've tried using the code below, but it doesn't work - it returns the message saying everything is ok regardless of whether there are other types in the type column:

if ~df["type"].isin(["A","B","a","b"]).any():
    print("WARNING - Not all offers are the correct types!")
else:
    print("OK - All offers are the correct types.") 

Does anyone know where I'm going wrong please?

CodePudding user response:

Try

import pandas as pd
# sample data
df = pd.DataFrame(list('ABC'), columns=['type'])
# logic using .all()
if df['type'].isin(list('ABab')).all():
    print("OK - All offers are the correct types.") 
else:
    print("WARNING - Not all offers are the correct types!")
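Since the values can be upper or lower case, another option is to normalise the case once and compare against just A and B. This is only a sketch: the extra row (offer 101, type a) and the names allowed, is_valid and bad_rows are made up for illustration and are not part of the question.

```python
import pandas as pd

# Sample data from the question, plus one lowercase row added for illustration
df = pd.DataFrame({"offer": [123, 456, 789, 101], "type": ["A", "b", "C", "a"]})

# Upper-case the column once, then test membership in the permitted set
allowed = {"A", "B"}
is_valid = df["type"].str.upper().isin(allowed)

# Keep the offending rows so the warning can say which offers are wrong
bad_rows = df[~is_valid]

if is_valid.all():
    print("OK - All offers are the correct types.")
else:
    print("WARNING - Not all offers are the correct types!")
    print(bad_rows)
```

Keeping the boolean mask in a variable also means the same check can be reused to filter or report the bad rows, rather than only printing a message.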

CodePudding user response:

Chris's answer is the better solution, but to show where your method went wrong:

if (~df["type"].isin(["A","B","a","b"])).any():
    print("WARNING - Not all offers are the correct types!")
else:
    print("OK - All offers are the correct types.") 

will work correctly.

Note the extra parentheses around ~df["type"].isin(["A","B","a","b"]). That negated mask is the expression you want to check for any True value, but in your version .any() applies to df["type"].isin(["A","B","a","b"]) on its own. So your statement is equivalent to

~(df["type"].isin(["A","B","a","b"]).any())

In your version, then, the negation happens after .any() is applied, so you are negating a single True/False result rather than the per-row mask. In the corrected version above, the negation happens before .any() is applied.

This is basically a case of operator precedence, i.e. what binds more tightly: the . of the method call binds more tightly here than the negation operator ~.
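If you want to see the precedence for yourself, here is a small sketch using the standard ast module and the sample data from the question. The variable names (mask, tree, any_then_negate, negate_then_any) are just for illustration. I've written the "negate afterwards" case with not rather than ~, because ~ on a plain Python bool is a bitwise operation (~True is -2, which is truthy), so not is the safer way to negate a single value.

```python
import ast
import pandas as pd

df = pd.DataFrame({"type": ["A", "B", "C"]})
mask = df["type"].isin(["A", "B", "a", "b"])  # per-row: True, True, False

# How Python parses the original condition: ~ is the outermost node,
# and the whole method-call chain ending in .any() is its operand.
tree = ast.parse('~df["type"].isin(["A","B","a","b"]).any()', mode="eval").body
print(type(tree).__name__, type(tree.op).__name__, type(tree.operand).__name__)
# prints: UnaryOp Invert Call

# The original condition negates one reduced value (written with `not` here).
any_then_negate = not mask.any()
# The parenthesised condition flips every row first, then reduces.
negate_then_any = bool((~mask).any())

print(any_then_negate, negate_then_any)
# prints: False True
```

Only the second value flags the C row, which is why the extra parentheses matter.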

CodePudding user response:

As I understand it, you want a warning if any one of the values in the type column is not in ['A', 'B', 'a', 'b']. In that case, use all() instead of any():

if not df['type'].isin(['A', 'B', 'a', 'b']).all():
    print('warning')
else:
    print('ok')
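For what it's worth, "not every value is permitted" and "at least one value is not permitted" are the same test, so the all() and any() approaches should agree as long as the negation is applied in the right place. A quick sketch to check that on a few made-up inputs (the lists and the names results, via_all and via_any are only for illustration):

```python
import pandas as pd

allowed = ["A", "B", "a", "b"]
results = []

for values in (["A", "B", "C"], ["A", "b"], ["C", "D"]):
    mask = pd.Series(values).isin(allowed)
    via_all = not mask.all()       # "not every value is permitted"
    via_any = bool((~mask).any())  # "at least one value is not permitted"
    results.append((values, via_all, via_any))
    print(values, via_all, via_any)
```

Both columns of the output match for every input, so which form to use is mostly a matter of readability.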