How to delete duplicated clients based on value in datetime column in Data Frame in Python Pandas?

Time:06-01

I have a DataFrame in Python Pandas like the one below:

date_col   | ID  | Phone
-----------|-----|--------
2020-05-17 | 111 | Apple
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony

As you can see, ID "111" is duplicated. When an ID is duplicated, I need to keep only the row with the latest date in column "date_col" (this column has dtype datetime64). So the result should look like the one below: ID "111" is duplicated, and 2020-06-11 is later than 2020-05-17, so the 2020-06-11 row is kept:

date_col   | ID  | Phone
-----------|-----|--------
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony

How can I do that in Python Pandas?
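To reproduce the question, the sample frame can be built as below. This is a sketch that assumes the column names and values shown in the tables above; `duplicated(..., keep=False)` is used here only to show which rows share an ID before any are dropped.

```python
import pandas as pd

# Sample data from the question; "date_col" is parsed to datetime64
df = pd.DataFrame(
    {
        "date_col": pd.to_datetime(["2020-05-17", "2020-06-11", "2021-12-28"]),
        "ID": [111, 111, 222],
        "Phone": ["Apple", "Sony", "Sony"],
    }
)

# keep=False flags every row whose ID occurs more than once
dupes = df[df.duplicated(subset="ID", keep=False)]
print(dupes)
```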

Answer:

Try:

# Sort oldest to newest, then keep the last (newest) row for each ID
df = df.sort_values(by="date_col").drop_duplicates(subset="ID", keep="last")
print(df)

Prints:

    date_col   ID Phone
1 2020-06-11  111  Sony
2 2021-12-28  222  Sony
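An alternative that does not sort the whole frame is to ask each ID group for the index label of its latest date with `groupby(...).idxmax()` and then select those rows. This is a sketch on the sample data from the question, not the answerer's method; it assumes `date_col` has no missing values.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date_col": pd.to_datetime(["2020-05-17", "2020-06-11", "2021-12-28"]),
        "ID": [111, 111, 222],
        "Phone": ["Apple", "Sony", "Sony"],
    }
)

# For each ID, idxmax returns the index label of the row with the latest date
latest_idx = df.groupby("ID")["date_col"].idxmax()

# Select those rows; .loc keeps the original index labels
result = df.loc[latest_idx]
print(result)
```

If two rows of the same ID share the latest date, `idxmax` returns the first of them, so the outcome on ties is deterministic.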