I have a DataFrame in Python Pandas like the one below:
date_col | ID | Phone
-----------|-----|--------
2020-05-17 | 111 | Apple
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony
As you can see, ID "111" is duplicated. When an ID is duplicated, I need to keep only the row with the latest date in column "date_col" (this column has dtype datetime64). So the result should look like the table below, because ID "111" is duplicated but the date 2020-06-11 is later than 2020-05-17:
date_col | ID | Phone
-----------|-----|--------
2020-06-11 | 111 | Sony
2021-12-28 | 222 | Sony
How can I do that in Python Pandas?
CodePudding user response:
Try:
# Sort by date so the latest row for each ID comes last, then keep only that last row per ID.
df = df.sort_values(by="date_col").drop_duplicates(subset="ID", keep="last")
print(df)
Prints:
    date_col   ID Phone
1 2020-06-11  111  Sony
2 2021-12-28  222  Sony
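
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same sort_values / drop_duplicates approach. The names `latest` and `latest_alt` are made up for illustration. The groupby / idxmax line is an alternative that is not part of the answer above; it assumes the index labels are unique and that "date_col" has no missing values.

```python
import pandas as pd

# Sample data from the question; date_col is converted to datetime64.
df = pd.DataFrame(
    {
        "date_col": pd.to_datetime(["2020-05-17", "2020-06-11", "2021-12-28"]),
        "ID": [111, 111, 222],
        "Phone": ["Apple", "Sony", "Sony"],
    }
)

# Approach from the answer: sort by date, then keep the last row for each ID.
latest = df.sort_values(by="date_col").drop_duplicates(subset="ID", keep="last")

# Alternative sketch: look up the index label of the latest date per ID.
# Assumes unique index labels and no missing dates.
latest_alt = df.loc[df.groupby("ID")["date_col"].idxmax()]

print(latest)
```

Both variants select the same two rows for this sample. The sort-based version returns rows ordered by date, while the idxmax version returns them ordered by ID.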