I have a DataFrame like this:
import pandas as pd
d = {'label': ['A', 'B', 'G', 'O'],
     'label2': ['C', 'D', 'O', 'Z']}
df = pd.DataFrame(d)
print(df)
  label label2
0     A      C
1     B      D
2     G      O
3     O      Z
What I want is to get rid of the rows whose label value duplicates a value in label2 (keeping only the first occurrence). So from the df above I want to get something like this:
  label label2
0     A      C
1     B      D
2     G      O
I tried the following, but it doesn't do the trick:
df[~df[['label', 'label2']].apply(frozenset, axis=1).duplicated()]
Any idea on how to tackle this?
CodePudding user response:
Try this, using the Series .isin method:
mask = ~df['label'].isin(df['label2'])
df_output = df[mask]
print(df_output)
Output:
  label label2
0     A      C
1     B      D
2     G      O
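To see why the frozenset attempt from the question flags nothing while .isin does, here is a small self-contained sketch using the question's data. The names pair_dupes and in_label2 are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'label': ['A', 'B', 'G', 'O'],
                   'label2': ['C', 'D', 'O', 'Z']})

# The frozenset approach compares each row's *pair* of values as a set.
# No two rows share the same pair here, so nothing is marked as duplicated.
pair_dupes = df[['label', 'label2']].apply(frozenset, axis=1).duplicated()
print(pair_dupes.tolist())  # [False, False, False, False]

# isin instead checks each 'label' value against the whole 'label2' column.
in_label2 = df['label'].isin(df['label2'])
print(in_label2.tolist())   # [False, False, False, True]

# Keep only the rows whose 'label' does not occur anywhere in 'label2'.
df_output = df[~in_label2]
print(df_output)
```

Note that the mask looks at the whole label2 column, so a row is dropped regardless of whether the matching value sits above or below it.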
CodePudding user response:
You can use drop to remove the rows whose label also appears in label2:
df.drop(df[df['label'].isin(df['label2'])].index, inplace=True)
print(df)
# Output:
  label label2
0     A      C
1     B      D
2     G      O
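One caveat: isin ignores row order, so a row is also dropped when its label only shows up in label2 further down. If "keep only the first" is meant to be order-sensitive (that is an assumption about the question, not something it states), a rough sketch could track the values seen so far. The helper drop_seen_labels and the frame df2 are hypothetical names made up for this example.

```python
import pandas as pd

def drop_seen_labels(df, first='label', second='label2'):
    """Drop rows whose `first` value already appeared in an earlier row.

    Hypothetical helper: values from both columns of every earlier row
    count as "seen", including rows that were themselves dropped.
    """
    seen = set()
    keep = []
    for a, b in zip(df[first], df[second]):
        keep.append(a not in seen)  # keep the row only if `a` is new
        seen.update((a, b))         # remember both values for later rows
    return df[keep]

df = pd.DataFrame({'label': ['A', 'B', 'G', 'O'],
                   'label2': ['C', 'D', 'O', 'Z']})
result = drop_seen_labels(df)       # same rows as the isin answers: A, B, G

# Here 'O' appears in label2 only *after* the row where it is a label.
df2 = pd.DataFrame({'label': ['O', 'A'], 'label2': ['Z', 'O']})
by_isin = df2[~df2['label'].isin(df2['label2'])]  # drops the 'O' row
by_order = drop_seen_labels(df2)                  # keeps both rows
```

On the data from the question both approaches give the same result; they only differ when the repeated value comes later in the frame.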