I am trying to apply an AND of two negated conditions to a DataFrame, but it gives a weird result:
import pandas as pd
df = pd.DataFrame([{'name':'John', 'gneder':None, 'gneder1':None, 'gneder2':'M3'},{'name':'Jack', 'gneder':'Unclassified', 'gneder1':'M2', 'gneder2':None},{'name':'Jessy', 'gneder':None, 'gneder1':'F2', 'gneder2':None}])
df
src_col = 'gneder'
target_col = 'GENDER'
df[target_col] = ""
df.loc[ df[src_col] != 'Unclassified' ]
    name gneder gneder1 gneder2 GENDER
0   John   None    None      M3
2  Jessy   None      F2    None
df.loc[ ~df[src_col].isnull()]
   name        gneder gneder1 gneder2 GENDER
1  Jack  Unclassified      M2    None
df.loc[ ((~df[src_col].isnull()) & df[src_col] != 'Unclassified')]
    name        gneder gneder1 gneder2 GENDER
0   John          None    None      M3
1   Jack  Unclassified      M2    None
2  Jessy          None      F2    None
I am expecting the final filter to return no records, but it returns all records.
CodePudding user response:
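The parentheses are in the wrong place. The first condition does not need them, because it contains no comparison operator such as ==, != or >.
The second condition does need its own parentheses, because & binds more tightly than !=. Without them the & is applied first and only its result is compared with 'Unclassified', so the comparison has to be wrapped explicitly: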
df = df.loc[ ~df[src_col].isnull() & (df[src_col] != 'Unclassified')]
#df = df.loc[ df[src_col].notna() & (df[src_col] != 'Unclassified')]
print (df)
Empty DataFrame
Columns: [name, gneder, gneder1, gneder2, GENDER]
Index: []
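To see the precedence issue without pandas, here is a small sketch using only the standard library. The names a, b and c are placeholders I made up for illustration; they stand in for the null check, the column and the string being compared. It shows how Python groups a & b != c, and that the two groupings can give different answers.

```python
import ast

# Parse the unparenthesized expression and look at the outermost node.
top = ast.parse("a & b != c", mode="eval").body

# The comparison is the outermost operation; its left operand is `a & b`.
is_compare_on_top = isinstance(top, ast.Compare)
left_is_bitand = isinstance(top.left, ast.BinOp) and isinstance(top.left.op, ast.BitAnd)

# Placeholder values (hypothetical, not taken from the DataFrame above).
a, b, c = False, False, True

implicit = a & b != c      # grouped as (a & b) != c
explicit = a & (b != c)    # comparison evaluated first, then the AND

print(is_compare_on_top, left_is_bitand, implicit, explicit)
```

With these values the unparenthesized form evaluates to True while the parenthesized form evaluates to False, which is the same kind of mismatch as in the filter from the question.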