I have 2 dataframes with the same columns, one of which is a date.
I am trying to concat the dataframes and, when the primary keys are the same, delete the row with the earlier date.
Input:
df1:
pk1 | pk2 | C | DATE
1 | 2 | 3 | 05-09-22
2 | 3 | 4 | 05-09-22
df2:
pk1 | pk2 | C | DATE
1 | 2 | 5 | 06-09-22
Output:
pk1 | pk2 | C | DATE
2 | 3 | 4 | 05-09-22
1 | 2 | 5 | 06-09-22
Answer:
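You need to sort by date and then use drop_duplicates, keeping the last row (the latest date) for each key pair. With an ascending sort, keep='first' would keep the earlier date, which is the opposite of what you want. Also convert DATE to a real datetime first; otherwise the strings are compared lexicographically, which gives the wrong order across months and years. The format below assumes dd-mm-yy, so change it if your dates are mm-dd-yy.
import pandas as pd

df = pd.concat([df1, df2], ignore_index=True)  # stack both frames
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%m-%y')  # parse so sorting is chronological
df = df.sort_values(by='DATE')  # oldest first, newest last
df = df.drop_duplicates(subset=['pk1', 'pk2'], keep='last')  # keep the most recent row per key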
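An alternative that does not depend on the sort order before deduplicating is to pick, for each (pk1, pk2) group, the row with the maximum date using groupby(...).idxmax(). Below is a minimal, self-contained sketch that rebuilds the question's frames. It assumes the DATE strings are dd-mm-yy; the names latest_idx and result are just illustrative.

```python
import pandas as pd

# Input frames from the question; DATE is assumed to be dd-mm-yy.
df1 = pd.DataFrame({'pk1': [1, 2], 'pk2': [2, 3], 'C': [3, 4],
                    'DATE': ['05-09-22', '05-09-22']})
df2 = pd.DataFrame({'pk1': [1], 'pk2': [2], 'C': [5], 'DATE': ['06-09-22']})

df = pd.concat([df1, df2], ignore_index=True)
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%m-%y')

# For each (pk1, pk2) group, idxmax gives the index label of the row with the latest DATE.
latest_idx = df.groupby(['pk1', 'pk2'])['DATE'].idxmax()

# Select those rows and order them by date, as in the expected output.
result = df.loc[latest_idx.to_numpy()].sort_values('DATE').reset_index(drop=True)
print(result)
```

If two rows with the same keys share the same latest date, idxmax returns the first of them, so ties are resolved by the original row order.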