I am trying to see if I can find duplicate records happened in the year
I know how to find when it is just the year but the date range and find the duplicate and remove the duplicate seems a bit tough. I am beginner so pleas can anyone help ?
CodePudding user response:
You can use:
out = df[~df.loc[df['Date'].str.contains('2021')].duplicated(keep=False)
.reindex(df.index, fill_value=False)]
print(out)
# Output
Date Transaction
0 31/12/2020 PURCHASES
1 31/12/2020 Sales
2 31/12/2020 Sales
CodePudding user response:
Use GroupBy.cumcount
grouping by year and Transaction
year_s = pd.to_datetime(df['Date']).dt.year
df[~(df.groupby([year_s, 'Transaction']).cumcount().gt(0) & year_s.eq(2021))]
Date Transaction
0 31/12/2020 PURCHASES
1 31/12/2020 Sales
2 31/12/2020 Sales
3 04/01/2021 PURCHASES
4 04/01/2021 Sales
If you don't want check dups by year:
df[~(df['Transaction'].duplicated() & year_s.eq(2021))]