Remove duplicates using column value with some ignore condition

Time:02-05

I have two columns in my Excel file and I want to remove duplicates from column 'A', with an ignore condition. The columns are as follows:

A B
1 10
1 20
2 30
2 40
3 10
3 20

Now, I want it to turn into this:

A B
1 10
2 30
2 40
3 10

So, basically, I want to remove all duplicates in column 'A' except when the value is 2 (rows with 2 should be left alone). My current code is below, but it does not work for me: it also removes the duplicates with value 2.

import pandas as pd

df = pd.read_excel(save_filename)  # save_filename: path to the Excel file
df2 = df.drop_duplicates(subset=["A", "B"], keep='first')
df2.to_excel(save_filename, index=False)

CodePudding user response:

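You can combine two conditions with a boolean OR: keep a row if it is the first occurrence of its value in column 'A', or if its value in 'A' is 2: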

df[~df.duplicated(subset="A") | df["A"].eq(2)]

   A   B
0  1  10
2  2  30
3  2  40
4  3  10
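For anyone who wants to try this without an Excel file, here is a minimal self-contained sketch of the same idea. The DataFrame is built by hand from the sample data in the question, and the names `is_first`, `is_exempt` and `result` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (built in memory instead of read from Excel)
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3], "B": [10, 20, 30, 40, 10, 20]})

# True for the first occurrence of each value in column A
is_first = ~df.duplicated(subset="A", keep="first")

# True for rows that should be exempt from de-duplication (A == 2)
is_exempt = df["A"].eq(2)

# Keep a row if it is a first occurrence OR it is exempt
result = df[is_first | is_exempt]
print(result)
```

The original index labels (0, 2, 3, 4) are preserved; chain `.reset_index(drop=True)` if you want them renumbered. If more than one value should be ignored, `df["A"].isin([...])` can presumably take the place of `.eq(2)`.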