pandas AND of two negation conditions gives weird result

Time:11-14

I am trying to combine two negated conditions with AND on a DataFrame, but the result is unexpected:

import pandas as pd
df = pd.DataFrame([{'name':'John', 'gneder':None, 'gneder1':None, 'gneder2':'M3'},{'name':'Jack', 'gneder':'Unclassified', 'gneder1':'M2', 'gneder2':None},{'name':'Jessy', 'gneder':None, 'gneder1':'F2', 'gneder2':None}])
df

src_col = 'gneder'
target_col = 'GENDER'
df[target_col] = ""


df.loc[ df[src_col] != 'Unclassified' ]

    name    gneder  gneder1 gneder2 GENDER
0   John    None    None    M3  
2   Jessy   None    F2  None    

df.loc[ ~df[src_col].isnull()]

    name    gneder  gneder1 gneder2 GENDER
1   Jack    Unclassified    M2  None    

df.loc[  ((~df[src_col].isnull()) & df[src_col] != 'Unclassified')]


    name    gneder  gneder1 gneder2 GENDER
0   John    None    None    M3  
1   Jack    Unclassified    M2  None    
2   Jessy   None    F2  None    

I expected the final filter to return no records, but it returns all of them.

CodePudding user response:

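The parentheses are misplaced. The pair around ~df[src_col].isnull() is not needed, because that condition contains no comparison operator (==, !=, >, ...). The second condition does need its own parentheses because of operator precedence: & binds more tightly than !=, so without them the expression is evaluated as ((~df[src_col].isnull()) & df[src_col]) != 'Unclassified'. With the comparison parenthesized: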

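# Parenthesize the comparison so that & combines two boolean masks
df = df.loc[~df[src_col].isnull() & (df[src_col] != 'Unclassified')]
# Equivalent alternative using notna() instead of ~isnull():
# df = df.loc[df[src_col].notna() & (df[src_col] != 'Unclassified')]
print(df)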
Empty DataFrame
Columns: [name, gneder, gneder1, gneder2, GENDER]
Index: []
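
To see how Python groups the two versions of the filter, here is a minimal sketch using the standard-library ast module. The names isnull and col are hypothetical placeholders standing in for df[src_col].isnull() and df[src_col], and top_level_operation is an illustrative helper, not part of pandas:

```python
import ast


def top_level_operation(expression: str) -> str:
    """Return the node type of the outermost operation in a Python expression."""
    node = ast.parse(expression, mode="eval").body
    return type(node).__name__


# Placeholders (assumptions): isnull ~ df[src_col].isnull(), col ~ df[src_col]
original = "(~isnull) & col != 'Unclassified'"
fixed = "~isnull & (col != 'Unclassified')"

# In the original, the comparison is applied last, to the result of `&`.
print(top_level_operation(original))  # Compare

# In the fixed version, `&` is applied last, to two separate conditions.
print(top_level_operation(fixed))  # BinOp
```

In the original expression the != comparison is the outermost operation, so it compares the result of the & against the string 'Unclassified'. In the fixed expression the & is outermost and combines the two conditions, which is what the filter was meant to do.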