I need to remove duplicate rows from a dataframe, grouped by id and sub-id, when a row occurs within 5 days of the previous occurrence in the same group.
input:
output:
CodePudding user response:
Use DataFrameGroupBy.diff to compare each datetime with the previous one within each id and sub_id group, and remove the rows where the difference is less than 5 days:
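import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# diff() is missing (NaT) for the first row of each group, so lt(5) is False there and that row is kept
df1 = df[~df.groupby(['id','sub_id'])['Date'].diff().dt.days.lt(5)]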
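
Since the input table from the question is not shown here, the following is only a minimal sketch on made-up data to illustrate how the filter behaves. The sample values and the `gap_days` name are assumptions for illustration; the column names `id`, `sub_id` and `Date` are taken from the code above. It also sorts by date first, on the assumption that the real data may not already be in date order within each group.

```python
import pandas as pd

# Hypothetical sample data (not the asker's real input)
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "sub_id": ["a", "a", "a", "b", "a", "a"],
    "Date": ["2021-01-01", "2021-01-03", "2021-01-10",
             "2021-01-02", "2021-01-01", "2021-01-06"],
})
df["Date"] = pd.to_datetime(df["Date"])

# diff() compares each row with the previous row of the same group,
# so the rows need to be in date order within each group
df = df.sort_values(["id", "sub_id", "Date"])

# Whole days since the previous occurrence in the same (id, sub_id) group;
# missing for the first row of each group
gap_days = df.groupby(["id", "sub_id"])["Date"].diff().dt.days

# Drop rows that occurred less than 5 days after the previous occurrence
df1 = df[~gap_days.lt(5)]
print(df1)
```

Here only the row dated 2021-01-03 is dropped: it is 2 days after the previous row for id 1, sub_id a. The row dated 2021-01-06 is kept because a gap of exactly 5 days is not less than 5; if "within 5 days" should include a 5-day gap, `.le(5)` could be used instead. Note that each row is compared with the row immediately before it, not with the last row that was kept, so in a run of occurrences spaced 3 days apart every row except the first would be removed.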