I have a dataframe like this:
import pandas as pd

df3 = pd.DataFrame({'ID': ['Stay home, T5006, T5006, Stay home', 'Go for walk, T5007, T5007, Go for walk'],
                    'Name': ['Stay home, Go for walk, Stay home', 'Go outside, Go outside, Go outside']
                    })
ID Name
0 Stay home, T5006, T5006, Stay home Stay home, Go for walk, Stay home
1 Go for walk, T5007, T5007, Go for walk Go outside, Go outside, Go outside
I want to remove the duplicates from the ID column. Expected outcome:
ID Name
0 Stay home,T5006 Stay home, Go for walk, Stay home
1 Go for walk,T5007 Go outside, Go outside, Go outside
Any ideas?
CodePudding user response:
Use the dict.fromkeys trick to remove duplicates from the split values, then join them back with ', ' in a lambda function:
df3['ID'] = df3['ID'].apply(lambda x: ', '.join(dict.fromkeys(x.split(', '))))
Or use a list comprehension:
df3['ID'] = [', '.join(dict.fromkeys(x.split(', '))) for x in df3['ID']]
print (df3)
ID Name
0 Stay home, T5006 Stay home, Go for walk, Stay home
1 Go for walk, T5007 Go outside, Go outside, Go outside
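As a side note, this works because dict keys preserve insertion order (Python 3.7+), so dict.fromkeys drops duplicates while keeping the first occurrence of each value. A minimal sketch on one value from the sample data:

parts = 'Stay home, T5006, T5006, Stay home'.split(', ')
# dict.fromkeys keeps only the first occurrence of each value, in order
print(list(dict.fromkeys(parts)))  # ['Stay home', 'T5006']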
Or, if the order of values is not important, use set:
df3['ID'] = df3['ID'].apply(lambda x: ', '.join(set(x.split(', '))))
df3['ID'] = [', '.join(set(x.split(', '))) for x in df3['ID']]
print (df3)
ID Name
0 Stay home, T5006 Stay home, Go for walk, Stay home
1 T5007, Go for walk Go outside, Go outside, Go outside
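For reference, the same split on one value shows why the set version may reorder items (set iteration order is arbitrary):

parts = 'Go for walk, T5007, T5007, Go for walk'.split(', ')
# a set removes duplicates but gives no ordering guarantee
print(set(parts))  # e.g. {'T5007', 'Go for walk'}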