Given a dataframe, I want to set the isActive column to True only for the duplicated value, and add '_duplicate' to the Name column.
df =
Name isActive LoginDate
John False 2021
John False 2022
Fred False 2020
Desired output is:
df =
Name isActive LoginDate
John_duplicate True 2021
John False 2022
Fred False 2020
For now I was able to add numbers to each duplicate, but I want to skip the row with the nearest login date, add the text to the oldest one, and change the boolean value:
import numpy as np
df['Name'] = df.groupby('Name')['Name'].transform(lambda n: n + (np.arange(len(n)) + 1).astype(str))
Any suggestions?
Answer:
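Use Series.duplicated twice: ~df['Name'].duplicated() is True for the first row of each Name, and df['Name'].duplicated(keep=False) is True for every row whose Name appears more than once. Chaining them with & selects the first row of each duplicated Name. Use that mask to set the isActive column and to append the substring to Name: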
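import pandas as pd
df = pd.DataFrame({'Name': ['John', 'John', 'Fred'], 'isActive': False, 'LoginDate': [2021, 2022, 2020]})
# first occurrence of each Name that appears more than once
m = ~df['Name'].duplicated() & df['Name'].duplicated(keep=False)
df['isActive'] = m
df.loc[m, 'Name'] += '_duplicate'
print(df)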
Name isActive LoginDate
0 John_duplicate True 2021
1 John False 2022
2 Fred False 2020
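The mask above relies on row order. duplicated() treats the first row it sees as the original, and that row is the oldest login only because the sample data happens to list John 2021 before John 2022. A minimal sketch, assuming you want the row with the oldest LoginDate regardless of row order, replaces the first-occurrence test with GroupBy.idxmin on LoginDate. The shuffled sample frame is a hypothetical reordering of the question's data, chosen to show the difference.

```python
import pandas as pd

# Hypothetical reordering of the question's sample data: the newer John
# row comes first, so "first occurrence" is no longer the oldest login.
df = pd.DataFrame({'Name': ['John', 'Fred', 'John'],
                   'isActive': [False, False, False],
                   'LoginDate': [2022, 2020, 2021]})

# True for every row whose Name appears more than once.
dup = df['Name'].duplicated(keep=False)

# Index label of the row with the smallest (oldest) LoginDate per Name.
oldest = df.groupby('Name')['LoginDate'].idxmin()

# Keep only the oldest row of names that are actually duplicated
# (idxmin also returns a label for single-row groups such as Fred).
m = dup & df.index.isin(oldest)

df['isActive'] = m
df.loc[m, 'Name'] += '_duplicate'
print(df)
```

If a Name can appear three or more times and every login except the newest should be flagged (another reading of "skip the row with the nearest login date"), a mask such as df['LoginDate'] < df.groupby('Name')['LoginDate'].transform('max') could replace m; single-row groups are then False automatically.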