remove duplicate rows where one column equals a different column

Time:10-13

I have a df like this:

import pandas as pd

d = {'label': ['A', 'B', 'G', 'O'],
     'label2': ['C', 'D', 'O', 'Z']}
df = pd.DataFrame(d)
print(df)

  label label2
0     A      C
1     B      D
2     G      O
3     O      Z

What I want to do is drop every row whose label value already appears in label2 of another row (keeping only the first occurrence). So from the df above I want to get something like this:

  label label2
0     A      C
1     B      D
2     G      O

I tried the following, but it doesn't do the trick:

df[~df[['label', 'label2']].apply(frozenset, axis=1).duplicated()]

Any idea on how to tackle this?

CodePudding user response:

Try this, using the .isin method of a Series:

mask = ~df['label'].isin(df['label2'])
df_output = df[mask]
print(df_output)

Output:

  label label2
0     A      C
1     B      D
2     G      O
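
For context on why the frozenset attempt from the question leaves the frame unchanged while isin works, here is a minimal sketch on the same data. The names pair_dupes, in_label2 and result are illustrative only and do not come from the original posts.

```python
import pandas as pd

df = pd.DataFrame({"label": ["A", "B", "G", "O"],
                   "label2": ["C", "D", "O", "Z"]})

# The frozenset approach builds one unordered {label, label2} pair per row
# and flags a row only if the same pair occurred in an earlier row.
# All four pairs are distinct here, so no row is flagged.
pair_dupes = df[["label", "label2"]].apply(frozenset, axis=1).duplicated()
print(pair_dupes.any())

# isin() instead tests each label against every value in the label2 column,
# so 'O' in row 3 is matched by the 'O' in label2 of row 2.
in_label2 = df["label"].isin(df["label2"])
print(in_label2.tolist())

# Keep only the rows whose label does not occur anywhere in label2.
result = df[~in_label2]
print(result)
```

Note that isin compares against the whole label2 column, so a row would also be dropped if its label matched its own label2 or the label2 of a later row.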

CodePudding user response:

You can use drop to remove the rows whose label also appears in label2:

df.drop(df[df['label'].isin(df['label2'])].index, inplace=True)
print(df)

# Output:
  label label2
0     A      C
1     B      D
2     G      O
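
Both answers drop a row whenever its label occurs anywhere in label2, regardless of row order. If "keep only the first" is meant to be order-sensitive (an assumption about the intent, not something the question states), a possible variant drops a row only when its label was already seen in label2 of an earlier row. The helper name drop_labels_seen_earlier is hypothetical and used here only for illustration.

```python
import pandas as pd


def drop_labels_seen_earlier(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose 'label' occurred in 'label2' of an earlier row.

    Hypothetical helper: assumes the columns are named 'label' and 'label2'
    and that row order is what defines "first".
    """
    seen = set()   # label2 values collected from the rows visited so far
    keep = []      # one boolean per row, True means the row is kept
    for label, label2 in zip(df["label"], df["label2"]):
        keep.append(label not in seen)
        seen.add(label2)
    # Align the mask with the frame's own index before filtering.
    return df[pd.Series(keep, index=df.index)]


df = pd.DataFrame({"label": ["A", "B", "G", "O"],
                   "label2": ["C", "D", "O", "Z"]})
ordered_result = drop_labels_seen_earlier(df)
print(ordered_result)

# With the rows in the opposite order, 'O' appears as a label before it
# appears in label2, so this variant keeps both rows, whereas the isin
# approach would drop the first one.
df_reversed = pd.DataFrame({"label": ["O", "G"], "label2": ["Z", "O"]})
reversed_result = drop_labels_seen_earlier(df_reversed)
print(reversed_result)
```

On the data from the question this gives the same three rows as the isin-based answers; the two approaches only differ when a label shows up before its match in label2.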