Modify Rows With Duplicate Values in a Python Pandas Dataframe

Time:12-10

Right now, I am working with this dataframe:

Name DateSolved Points
Jimmy 12/3 100
Tim 12/4 50
Jo 12/5 25
Jonny 12/5 25
Jimmy 12/8 10
Tim 12/8 10

At the moment, if a name appears more than once in the dataset, I drop the older rows (by date) and keep only the most recent one, using df.sort_values('DateSolved').drop_duplicates('Name', keep='last'), which gives a dataset like this:

Name DateSolved Points
Jo 12/5 25
Jonny 12/5 25
Jimmy 12/8 10
Tim 12/8 10
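
For anyone who wants to reproduce this, here is a minimal sketch of the current approach. It assumes the frame is built by hand from the sample above and that DateSolved holds plain strings (the variable name latest is just for illustration).

```python
import pandas as pd

# Sample data from the question; DateSolved is assumed to be stored as strings
df = pd.DataFrame({
    "Name": ["Jimmy", "Tim", "Jo", "Jonny", "Jimmy", "Tim"],
    "DateSolved": ["12/3", "12/4", "12/5", "12/5", "12/8", "12/8"],
    "Points": [100, 50, 25, 25, 10, 10],
})

# Sort by date, then keep only the last (most recent) row for each Name
latest = df.sort_values("DateSolved").drop_duplicates("Name", keep="last")
print(latest)
```

Note that sorting strings is lexicographic, so this only works while the date strings happen to sort chronologically (a value such as "12/10" would sort before "12/3").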

However, instead of dropping the oldest one, I want to keep it but reduce its points by 50%. Something like this:

Name DateSolved Points
Jimmy 12/3 50 (-50%)
Tim 12/4 25 (-50%)
Jo 12/5 25
Jonny 12/5 25
Jimmy 12/8 10
Tim 12/8 10

How could I go about doing this? I cannot find a way to both find the duplicates based on "Name" and then change the value of the "Points" column in the same row.

Thank you!

CodePudding user response:

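IIUC, use DataFrame.duplicated with keep='last' to select every duplicate of a Name except the last one, then select the Points column and divide by 2. This relies on the rows already being in date order, as they are in the sample: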

df.loc[df.duplicated('Name', keep='last'), 'Points'] /= 2
print(df)
    Name DateSolved  Points
0  Jimmy       12/3    50.0
1    Tim       12/4    25.0
2     Jo       12/5    25.0
3  Jonny       12/5    25.0
4  Jimmy       12/8    10.0
5    Tim       12/8    10.0
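
If the rows are not guaranteed to be in date order, one possible variation is to work out the older rows on a date-sorted copy and then update the original frame by index label. This is only a sketch under some assumptions: the dates are month/day strings within a single year (hence the format "%m/%d"), and the helper names by_date, is_older and the temporary column _when are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jimmy", "Tim", "Jo", "Jonny", "Jimmy", "Tim"],
    "DateSolved": ["12/3", "12/4", "12/5", "12/5", "12/8", "12/8"],
    "Points": [100, 50, 25, 25, 10, 10],
})

# Assumption: all dates fall in the same year, so month/day is enough to order them
when = pd.to_datetime(df["DateSolved"], format="%m/%d")

# Date-sorted copy; the original index labels are preserved
by_date = df.assign(_when=when).sort_values("_when", kind="stable")

# True for every row of a Name except its most recent one
is_older = by_date.duplicated("Name", keep="last")

# Halving can produce fractions, so store Points as float before updating
df["Points"] = df["Points"].astype(float)

# Halve Points on the older rows only, addressed by their index labels
df.loc[is_older[is_older].index, "Points"] /= 2
print(df)
```

The original row order of df is left untouched; only the Points of the older duplicates change.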