Change boolean value to True for duplicates with the more distant date in pandas

Given a dataframe, I want to set the isActive column to True only for the duplicated row and append '_duplicate' to its Name.

df = 

Name    isActive    LoginDate

John    False       2021      
John    False       2022 
Fred    False       2020

Desired output is:

df =

Name              isActive    LoginDate

John_duplicate    True        2021      
John              False       2022 
Fred              False       2020

So far I have been able to append a number to each duplicate, but I want to skip the row with the most recent login date, add the text to the oldest one, and change its boolean value:

df.LoginDate = ad.groupby('LoginDate').LoginDate.apply(lambda n: n + (np.arange(len(n)) + 1).astype(str))

Any suggestion?

CodePudding user response:

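Use Series.duplicated to find the first occurrence of each Name, and combine it with duplicated(keep=False) so that only names occurring more than once are kept. Assign the resulting mask to isActive and append the substring to Name for the matching rows. This relies on the rows being ordered so that the oldest login comes first within each Name, as in the sample data: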

m = ~df['Name'].duplicated() & df['Name'].duplicated(keep=False)
df['isActive'] = m
df.loc[m, 'Name'] += '_duplicate'

print (df)
             Name  isActive  LoginDate
0  John_duplicate      True       2021
1            John     False       2022
2            Fred     False       2020
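
If the rows are not already ordered by date, the mask above would flag whichever duplicate happens to come first. A minimal sketch of one way to handle that, assuming LoginDate is sortable and that only the single oldest row of each repeated Name should be flagged (the shuffled sample frame and the names ordered, first_seen and repeated are made up for illustration):

```python
import pandas as pd

# Sample data with the rows deliberately out of date order (illustrative only).
df = pd.DataFrame({
    "Name": ["John", "Fred", "John"],
    "isActive": [False, False, False],
    "LoginDate": [2022, 2020, 2021],
})

# Work on a copy sorted by date so the oldest login of each Name comes first.
ordered = df.sort_values("LoginDate", kind="stable")

# First occurrence per Name in date order, i.e. the oldest row of that Name.
first_seen = ~ordered["Name"].duplicated()
# Rows whose Name occurs more than once.
repeated = ordered["Name"].duplicated(keep=False)

# Combine the two and realign the mask to the original row order.
m = (first_seen & repeated).reindex(df.index)

df["isActive"] = m
df.loc[m, "Name"] += "_duplicate"

print(df)
```

If instead every row except the most recent one should be flagged, ordered["Name"].duplicated(keep="last") may be the mask to use in place of first_seen & repeated.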