I would like to retain only the most recent episode for each id, based on the dates. For example, for s001 I would like to retain the record dated 2022-04-05, since it is more recent than the other one.
import pandas as pd
import datetime

record = ['s001', 's002', 's003', 's002', 's004', 's003',
          's004', 's001', 's004', 's003', 's002', 's005']
base = datetime.date.today()
date_list = [base - datetime.timedelta(days=x) for x in range(len(record))]
df = pd.DataFrame({
    "id": record,
    "date_visited": date_list
})
print(df.sort_values('id'))
CodePudding user response:
Sort by date so the newest row for each id comes last, then drop the earlier duplicates:
df.sort_values('date_visited').drop_duplicates(['id'], keep='last')
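To check the sort-then-drop_duplicates approach end to end, here is a minimal sketch. It assumes a fixed base date of 2022-04-05 instead of datetime.date.today(), chosen only so the output is reproducible and matches the s001 example in the question. The names latest, df_dt and latest_alt are made up for illustration. The second half shows one possible alternative using groupby with idxmax, after converting the column to datetime64. That conversion is not in the original post.

```python
import datetime

import pandas as pd

record = ['s001', 's002', 's003', 's002', 's004', 's003',
          's004', 's001', 's004', 's003', 's002', 's005']

# Assumption: a fixed base date replaces datetime.date.today() for reproducibility.
base = datetime.date(2022, 4, 5)
date_list = [base - datetime.timedelta(days=x) for x in range(len(record))]
df = pd.DataFrame({"id": record, "date_visited": date_list})

# Approach from the answer: sort ascending by date, keep the last row per id.
latest = (
    df.sort_values('date_visited')
      .drop_duplicates(['id'], keep='last')
      .sort_values('id')
      .reset_index(drop=True)
)
print(latest)

# Alternative sketch: convert to datetime64, then pick the row holding the
# maximum date within each id group.
df_dt = df.assign(date_visited=pd.to_datetime(df['date_visited']))
latest_alt = (
    df_dt.loc[df_dt.groupby('id')['date_visited'].idxmax()]
         .sort_values('id')
         .reset_index(drop=True)
)
print(latest_alt)
```

Both variants keep one row per id. If two rows for the same id share the same date, which of them survives depends on row order, so add a tie-breaking column to the sort if that matters.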