How to compare two columns and remove duplicates from the first one

Time:02-20

Sup guys!
I've run into a problem removing duplicates from one column by comparing it to another. I have an Excel file with two columns, like this:

first_column    second_column
string 1        string 2
string 3        string 4
string 5        string 6
string 7        string 3
...             ...
string N        NaN

So some values from the first column also appear in the second column, and I want to delete those values from the first one.

I tried drop_duplicates(keep="last"), but it doesn't do what I need.
I've already learned how to read the Excel file with pandas and print it, but removing the duplicates has me stuck. After that I'd like to take the "clean" first column and write it to a new file, but I think I can manage that part myself.

Here's my code:

import pandas as pd

file_location = r"PATH/file.xlsx"
file = pd.read_excel(file_location)
file = file.drop_duplicates(keep="last")

print(file)

I hope you can help me or point me in the right direction!

CodePudding user response:

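This should work, where df is the DataFrame you read from the file (it is called file in your code). drop_duplicates only drops rows that duplicate other rows, so it never compares one column against another; isin does: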

df.loc[~df['first_column'].isin(df['second_column'].tolist())]
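
To see what that filter does, here is a minimal sketch on made-up data. The small DataFrame below is an assumption standing in for the Excel file, and the names in_second and clean_first are only illustrative. It keeps just the first-column values that never occur in the second column:

```python
import pandas as pd

# Made-up sample data standing in for the Excel file from the question.
df = pd.DataFrame({
    "first_column": ["string 1", "string 3", "string 5", "string 7"],
    "second_column": ["string 2", "string 4", "string 6", "string 3"],
})

# True where a first_column value also occurs anywhere in second_column.
in_second = df["first_column"].isin(df["second_column"])

# Keep only the first_column values that never appear in second_column.
clean_first = df.loc[~in_second, "first_column"].reset_index(drop=True)

print(clean_first.tolist())  # ['string 1', 'string 5', 'string 7']
```

Note that the one-liner above drops whole rows, so the second-column value on a dropped row goes with it. If only the cleaned first column is needed, selecting it as in the sketch is probably enough, and it can then be written out with something like clean_first.to_excel(...), assuming an Excel writer such as openpyxl is installed.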