I am working on google collaboratory and I have two column on panda dataframe which some of the rows has similar value
A B
Syd Syd
Aus Del
Mir Ard
Dol Dol
I wish that the value in column B which has duplicate value with column A to be deleted, like below :
A B
Syd
Aus Del
Mir Ard
Dol
I try to use drop_duplicates()
like this one Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C but it will delete the entire column B. Any suggestions smarter ways to solve this problem?
Thanks before!
CodePudding user response:
There is no need to use drop_duplicates
, you can simply compare the column A
with B
, then mask
the values in B
where they are equal to A
df['B'] = df['B'].mask(df['A'].eq(df['B']))
Alternatively you can also use boolean indexing with loc
to mask the duplicated values
df.loc[df['A'].eq(df['B']), 'B'] = np.nan
A B
0 Syd NaN
1 Aus Del
2 Mir Ard
3 Dol NaN