Home > Blockchain >  Group by date range and check duplicate pandas
Group by date range and check duplicate pandas

Time:03-14

I am trying to see if I can find duplicate records happened in the year

I know how to find when it is just the year but the date range and find the duplicate and remove the duplicate seems a bit tough. I am beginner so pleas can anyone help ?

CodePudding user response:

You can use:

out = df[~df.loc[df['Date'].str.contains('2021')].duplicated(keep=False)
            .reindex(df.index, fill_value=False)]
print(out)

# Output
         Date Transaction
0  31/12/2020   PURCHASES
1  31/12/2020       Sales
2  31/12/2020       Sales

CodePudding user response:

Use GroupBy.cumcount grouping by year and Transaction

year_s = pd.to_datetime(df['Date']).dt.year
df[~(df.groupby([year_s, 'Transaction']).cumcount().gt(0) & year_s.eq(2021))]

         Date Transaction
0  31/12/2020   PURCHASES
1  31/12/2020       Sales
2  31/12/2020       Sales
3  04/01/2021   PURCHASES
4  04/01/2021       Sales

If you don't want check dups by year:

df[~(df['Transaction'].duplicated() & year_s.eq(2021))]
  • Related