Remove one of duplicate value in two columns of dataframe

Time:11-22

I am working in Google Colaboratory and have a pandas DataFrame with two columns, where some rows hold the same value in both columns:

A   B
Syd Syd
Aus Del
Mir Ard
Dol Dol

I want the value in column B to be deleted wherever it duplicates the value in column A on the same row, like below:

A   B
Syd 
Aus Del
Mir Ard
Dol 

I tried drop_duplicates(), as in "Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C", but it deletes the entire column B. Is there a smarter way to solve this problem?

Thanks in advance!

CodePudding user response:

There is no need to use drop_duplicates. You can compare column A with column B row by row, then mask the values in B where they are equal to A:

df['B'] = df['B'].mask(df['A'].eq(df['B']))
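For reference, here is a minimal self-contained sketch of the mask approach. The DataFrame is rebuilt from the sample rows in the question, and the intermediate name `same` is only an illustrative assumption, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["Syd", "Aus", "Mir", "Dol"],
    "B": ["Syd", "Del", "Ard", "Dol"],
})

# True on rows where B holds the same value as A
same = df["A"].eq(df["B"])

# mask() replaces the values where the condition is True (with NaN by default)
df["B"] = df["B"].mask(same)

print(df)
```

Only the rows where the two columns match are touched; column A and the non-matching rows of B are left unchanged.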

Alternatively, you can use boolean indexing with loc to overwrite the duplicated values (this needs numpy for np.nan):

import numpy as np

df.loc[df['A'].eq(df['B']), 'B'] = np.nan

     A    B
0  Syd  NaN
1  Aus  Del
2  Mir  Ard
3  Dol  NaN
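The desired output in the question shows blank cells rather than NaN. If that is what is wanted, one possible variation is to assign an empty string instead. This is a sketch under the assumption that an empty string is an acceptable "blank" for later processing; note that isna() will not flag such cells as missing.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["Syd", "Aus", "Mir", "Dol"],
    "B": ["Syd", "Del", "Ard", "Dol"],
})

# Assumption: "" stands in for a blank cell instead of NaN
df.loc[df["A"].eq(df["B"]), "B"] = ""

print(df)
```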