I would like to replace the repeated comments with the word "same", keep the original ones, and change the ID as shown below.
import pandas as pd

data = {'Key': ['111', '111', '111', '222*1', '222*2', '333*1', '333*2', '333*3'],
        'id': ['', '', '', '1', '2', '1', '2', '3'],
        'comment': ['wrong sentence', 'wrong sentence', 'wrong sentence', 'M', 'M', 'F', 'F', 'F']}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
Input:
The desired output:
CodePudding user response:
The exact logic is unclear, but you can try:
# replace duplicated words per group
df.loc[df[['Key', 'comment']].duplicated(), 'comment'] = 'same'
# update id/Key
m = df['id'].eq('')
df.loc[m, 'id'] = df.groupby('Key').cumcount().add(1)
df.loc[m, 'Key'] += '*' + df['id'].astype(str)
Output:
     Key id         comment
0  111*1  1  wrong sentence
1  111*2  2            same
2  111*3  3            same
3  222*1  1               M
4  222*2  2               M
5  333*1  1               F
6  333*2  2               F
7  333*3  3               F
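For reuse, the same steps can be wrapped in a function that leaves the input frame untouched. This is only a sketch: the name mark_repeats is hypothetical, and it assumes that new ids should be stored as strings (to match the existing '1', '2', ... values) and that an empty string marks a row that still needs an id.

```python
import pandas as pd


def mark_repeats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flag repeated comments and number rows lacking an id."""
    out = df.copy()
    # Rows repeating an earlier (Key, comment) pair get the placeholder 'same'.
    out.loc[out[['Key', 'comment']].duplicated(), 'comment'] = 'same'
    # Rows with an empty id get a running number within their Key group.
    # Assumption: ids are kept as strings, like the ids already present.
    m = out['id'].eq('')
    counter = out.groupby('Key').cumcount().add(1).astype(str)
    out.loc[m, 'id'] = counter[m]
    # Append '*<id>' to the Key of the rows that were just numbered.
    out.loc[m, 'Key'] = out.loc[m, 'Key'] + '*' + counter[m]
    return out


df = pd.DataFrame({
    'Key': ['111', '111', '111', '222*1', '222*2', '333*1', '333*2', '333*3'],
    'id': ['', '', '', '1', '2', '1', '2', '3'],
    'comment': ['wrong sentence', 'wrong sentence', 'wrong sentence',
                'M', 'M', 'F', 'F', 'F'],
})
result = mark_repeats(df)
print(result)
```

Computing the counter once, before Key is modified, keeps the grouping based on the original keys.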