Right now, I am working with this dataframe:
Name | DateSolved | Points |
---|---|---|
Jimmy | 12/3 | 100 |
Tim | 12/4 | 50 |
Jo | 12/5 | 25 |
Jonny | 12/5 | 25 |
Jimmy | 12/8 | 10 |
Tim | 12/8 | 10 |
At the moment, if there are duplicate names in the dataset, I just drop the oldest one (by date) from the dataframe using df.sort_values('DateSolved').drop_duplicates('Name', keep='last'),
leading to a dataset like this:
Name | DateSolved | Points |
---|---|---|
Jo | 12/5 | 25 |
Jonny | 12/5 | 25 |
Jimmy | 12/8 | 10 |
Tim | 12/8 | 10 |
However, instead of dropping the oldest one, I wish to keep it but give it a 50% points reduction. Something like this:
Name | DateSolved | Points |
---|---|---|
Jimmy | 12/3 | 50 (-50%) |
Tim | 12/4 | 25 (-50%) |
Jo | 12/5 | 25 |
Jonny | 12/5 | 25 |
Jimmy | 12/8 | 10 |
Tim | 12/8 | 10 |
How could I go about doing this? I cannot find a way to both find the duplicates based on "Name" and then change the value of the "Points" column in the same row.
Thank you!
CodePudding user response:
IIUC, use DataFrame.duplicated with keep='last' to select every duplicate except the last one per Name, then select the Points column and divide it by 2:
df.loc[df.duplicated('Name', keep='last'), 'Points'] /= 2
print(df)
Name DateSolved Points
0 Jimmy 12/3 50.0
1 Tim 12/4 25.0
2 Jo 12/5 25.0
3 Jonny 12/5 25.0
4 Jimmy 12/8 10.0
5 Tim 12/8 10.0
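For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and sorts by date first, because duplicated only looks at row order, not at the dates themselves. The year 2022 is an assumption made only so the month/day strings can be parsed into real dates, and the helper column name _solved is hypothetical. Casting Points to float up front is a precaution to keep the result dtype predictable across pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Jimmy", "Tim", "Jo", "Jonny", "Jimmy", "Tim"],
    "DateSolved": ["12/3", "12/4", "12/5", "12/5", "12/8", "12/8"],
    "Points": [100, 50, 25, 25, 10, 10],
})

# Assumption: every date falls in the same year (2022 is arbitrary), so the
# strings can be parsed and sorted chronologically rather than as text.
solved = pd.to_datetime(df["DateSolved"] + "/2022", format="%m/%d/%Y")

# Sort oldest to newest; a stable sort keeps same-day rows in their original order.
df = (
    df.assign(_solved=solved)  # _solved is a hypothetical helper column
      .sort_values("_solved", kind="stable")
      .drop(columns="_solved")
      .reset_index(drop=True)
)

# True for every row that has a later row with the same Name.
older = df.duplicated("Name", keep="last")

# Halve the points of the older entries only.
df["Points"] = df["Points"].astype(float)
df.loc[older, "Points"] /= 2

print(df)
```

Note that if a name appears three or more times, every entry except the most recent one is halved once. If the penalty should stack or apply only to the very first entry, the mask would need to be built differently.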