I have two columns in my Excel file and I want to remove duplicates from column 'A', with one exception. The columns are as follows:
A B
1 10
1 20
2 30
2 40
3 10
3 20
Now, I want it to turn into this:
A B
1 10
2 30
2 40
3 10
So, basically, I want to remove all duplicates in column 'A' except when 'A' has the value 2 (I want to ignore 2). My current code is below, but it does not work for me because it removes the duplicates with value 2 too.
import pandas as pd
df = pd.read_excel(save_filename)
df2 = df.drop_duplicates(subset=["A", "B"], keep='first')
df2.to_excel(save_filename, index=False)
Answer:
You can combine two conditions with | (OR): keep a row if it is not a duplicate in 'A', or if 'A' equals 2:
df[~df.duplicated(subset="A") | df["A"].eq(2)]
   A   B
0  1  10
2  2  30
3  2  40
4  3  10
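For reference, here is a minimal, self-contained sketch of the same idea using the sample data from the question. The helper name drop_duplicates_except and its parameters are made up for illustration (they are not part of pandas), and it uses isin instead of eq so that more than one value could be exempted:

```python
import pandas as pd


def drop_duplicates_except(df, column, exempt_values):
    """Keep the first row per value in `column`, plus every row whose value is exempt.

    Hypothetical helper for illustration; not a pandas function.
    """
    # True for the first occurrence of each value in `column`
    first_occurrence = ~df.duplicated(subset=column, keep="first")
    # True for rows that should never be dropped
    exempt = df[column].isin(exempt_values)
    return df[first_occurrence | exempt]


# Sample data copied from the question
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3], "B": [10, 20, 30, 40, 10, 20]})
df2 = drop_duplicates_except(df, "A", [2])
```

With the real file, df would come from pd.read_excel(save_filename) and the result could be written back with df2.to_excel(save_filename, index=False), as in the question.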