Home > Blockchain >  Need to update duplicate with existing rows
Need to update duplicate with existing rows

Time:02-16

I need to update(clean) my data by following conditions

  1. Check if PS-id has duplicate values.

  2. Check which has the latest Date

  3. Update the old row which has the latest Date and new Cam with the old Cam

    ID       Cam                PS-id             Date
    A1       Alto-Car           A1222             1.1.2022     
    B1       BMW-Car            A123              1.1.2022
    A5       Car-Alto           A1222             5.1.2022    
    

The ideal output should be like this

ID       Cam                PS-id             Date
A1       Car-Alto           A1222             5.1.2022     
B1       BMW-Car            A123              1.1.2022

I have tried few solutions but cant do based on my conditions

CodePudding user response:

Convert values to datetimes in column Date, then sorting per both columns, so first row is old rows created by DataFrame.drop_duplicates and converting to Series used for reassign ID by rows with maximal vaues per PS-id:

df['Date'] = pd.to_datetime(df['Date'], format='%m.%d.%Y')
df = df.sort_values(['PS-id','Date'])

s = df.drop_duplicates('PS-id').set_index('PS-id')['ID']
df = df.drop_duplicates('PS-id', keep='last').assign(ID = lambda x: x['PS-id'].map(s))
print (df)
   ID       Cam  PS-id       Date
2  A1  Car-Alto  A1222 2022-05-01
1  B1   BMW-Car   A123 2022-01-01

CodePudding user response:

IIUC, you want a combination of groupby idxmax and groupby first, that you merge:

# get indices of latest (max) dates
# here we keep the string format
# if you want to convert to datetime
# df['Date'] = pd.to_datetime(df['Date'])
idx = pd.to_datetime(df['Date']).groupby(df['PS-id']).idxmax().values

# slice the rows with max dates
(df.loc[idx]
   # and merge back first id
   .merge(df.groupby('PS-id')['ID'].first(),
          left_on='PS-id', right_index=True, suffixes=('_', ''))
   .drop('ID_', axis=1)[df.columns]
)

output:

   ID       Cam  PS-id      Date
2  A1  Car-Alto  A1222  5.1.2022
1  B1   BMW-Car   A123  1.1.2022
  • Related