Pandas dataframe: delete duplicates based on a date column

Time:09-07

I have 2 dataframes with the same columns, one of which is a date.

I'm trying to concat the dataframes and, when the primary keys are the same, delete the row with the earlier date.

Input (df1 & df2):

df1:

pk1 | pk2 |  C  |   DATE
 1  |  2  |  3  | 05-09-22
 2  |  3  |  4  | 05-09-22

df2:

pk1 | pk2 |  C  |   DATE
 1  |  2  |  5  | 06-09-22

Output:

pk1 | pk2 |  C  |   DATE  
 2  |  3  |  4  | 05-09-22
 1  |  2  |  5  | 06-09-22

Answer:

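Concatenate the dataframes, sort by date (oldest first), then use drop_duplicates on the primary keys with keep='last', so the row with the latest date is the one that survives. Keeping the first after an ascending sort would keep the earlier date instead.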

import pandas as pd

df = pd.concat([df1, df2], ignore_index=True)  # concatenating
df = df.sort_values(by='DATE', ascending=True)  # sorting by date, oldest first
df = df.drop_duplicates(subset=['pk1', 'pk2'], keep='last')  # keep the latest row per key
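For a self-contained version, here is a minimal sketch that rebuilds the sample frames from the question. It assumes DATE is a day-first DD-MM-YY string (the question doesn't say whether it is day-first or month-first), so it parses the column with pd.to_datetime before sorting. Comparing the raw strings only works for this sample; in general a string like '31-08-22' sorts after '01-09-22'. A stable sort is used so rows with equal dates keep their original order.

```python
import pandas as pd

# Sample data taken from the question
df1 = pd.DataFrame({'pk1': [1, 2], 'pk2': [2, 3], 'C': [3, 4],
                    'DATE': ['05-09-22', '05-09-22']})
df2 = pd.DataFrame({'pk1': [1], 'pk2': [2], 'C': [5], 'DATE': ['06-09-22']})

df = pd.concat([df1, df2], ignore_index=True)

# Assumption: DATE is day-first (DD-MM-YY); use '%m-%d-%y' if it is month-first.
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%m-%y')

# Sort oldest -> newest, then keep the last (latest) row for each (pk1, pk2) pair.
result = (df.sort_values('DATE', kind='stable')
            .drop_duplicates(subset=['pk1', 'pk2'], keep='last')
            .reset_index(drop=True))
print(result)
```

The result contains the (2, 3) row from 05-09-22 and the (1, 2) row from 06-09-22, matching the expected output.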