How to remove duplicates from a DataFrame only if they occur within 5 days

Time:03-30

I need to remove duplicate rows from a DataFrame, grouped by id and sub_id, when a row occurs within 5 days of the previous occurrence of the same (id, sub_id) pair.

input: [image not available]

output: [image not available]

Answer:

Use DataFrameGroupBy.diff to compute the gap between consecutive dates within each (id, sub_id) group, then remove the rows where that gap is less than 5 days:

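import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['id', 'sub_id', 'Date'])  # diff() assumes chronological order within each group

df1 = df[~df.groupby(['id','sub_id'])['Date'].diff().dt.days.lt(5)]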
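Because the question's input was posted as an image, the sketch below runs the same approach on made-up sample data (the values are hypothetical). It shows that the first row of each (id, sub_id) group always survives: its diff is NaT, `.dt.days` turns that into NaN, and `NaN < 5` evaluates to False.

```python
import pandas as pd

# Hypothetical sample data; the original input table is not available
df = pd.DataFrame({
    'id':     [1, 1, 1, 1, 2],
    'sub_id': ['a', 'a', 'a', 'b', 'a'],
    'Date':   ['2022-01-01', '2022-01-03', '2022-01-10', '2022-01-02', '2022-01-04'],
})
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['id', 'sub_id', 'Date'])

# Whole days since the previous row of the same (id, sub_id); NaN for each group's first row
gap = df.groupby(['id', 'sub_id'])['Date'].diff().dt.days

# Drop rows less than 5 days after the previous one; NaN.lt(5) is False, so first rows are kept
df1 = df[~gap.lt(5)]
print(df1)
```

Here only the 2022-01-03 row of group (1, 'a') is removed, because it falls 2 days after 2022-01-01. The 2022-01-10 row is 7 days after its predecessor and is kept.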
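`diff()` compares each row with the immediately preceding row, even when that preceding row is itself removed. For dates 01, 04 and 07 in one group, both later rows are only 3 days after their predecessor, so all but the first are dropped, although 07 is 6 days after the last row that was kept. If "previous occurrence" should mean the last *kept* row, a loop is needed. The helper `drop_within` below is a hypothetical sketch of that interpretation, not part of the original answer.

```python
import pandas as pd

def drop_within(df, days=5):
    """Hypothetical helper: keep a row only if it is at least `days` whole days
    after the last *kept* row of the same (id, sub_id)."""
    df = df.sort_values(['id', 'sub_id', 'Date'])
    keep = []        # index labels of rows to keep
    last_kept = {}   # (id, sub_id) -> date of the most recently kept row
    for idx, key_id, key_sub, date in zip(df.index, df['id'], df['sub_id'], df['Date']):
        key = (key_id, key_sub)
        prev = last_kept.get(key)
        # First occurrence, or far enough from the last kept row: keep it
        if prev is None or (date - prev).days >= days:
            keep.append(idx)
            last_kept[key] = date
    return df.loc[keep]

# Hypothetical sample: one group with rows 3 days apart
sample = pd.DataFrame({
    'id': [1, 1, 1],
    'sub_id': ['a', 'a', 'a'],
    'Date': pd.to_datetime(['2022-01-01', '2022-01-04', '2022-01-07']),
})
print(drop_within(sample))  # keeps 2022-01-01 and 2022-01-07
```

The loop is slower than the vectorized `diff()` version, so it is only worth using when the "last kept row" semantics is actually required.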