Delete duplicate words from a column

Time:10-05

I have a dataframe like this:

df3 = pd.DataFrame({'ID': ['Stay home, T5006, T5006, Stay home', 'Go for walk, T5007, T5007, Go for walk'],
                    'Name': ['Stay home, Go for walk,  Stay home', 'Go outside, Go outside, Go outside']
                    })


    ID                                      Name
0   Stay home, T5006, T5006, Stay home      Stay home, Go for walk, Stay home
1   Go for walk, T5007, T5007, Go for walk  Go outside, Go outside, Go outside

I want to delete the duplicates within each value of the ID column. Expected outcome:

    ID                  Name
0   Stay home,T5006     Stay home,  Go for walk, Stay home
1   Go for walk,T5007   Go outside, Go outside,  Go outside

Any ideas?

CodePudding user response:

Use the dict.fromkeys trick to remove duplicates from the split values while keeping their original order, then join them back with ', ' in a lambda function:

df3['ID'] = df3['ID'].apply(lambda x: ', '.join(dict.fromkeys(x.split(', '))))

Or use a list comprehension:

df3['ID'] = [', '.join(dict.fromkeys(x.split(', '))) for x in df3['ID']]

print(df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  Go for walk, T5007  Go outside, Go outside, Go outside
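The one-liners above split on the exact string ', ', so a value with irregular spacing (such as the double space in the Name column) would leave stray whitespace on a token and hide a duplicate. A minimal sketch of a more tolerant variant follows; the helper name dedupe_tokens and its sep parameter are hypothetical, introduced only for illustration:

```python
import pandas as pd


def dedupe_tokens(text, sep=","):
    """Drop repeated tokens from a delimited string, keeping first-seen order."""
    # Strip each token so "a,  b" and "a, b" are treated alike.
    tokens = (t.strip() for t in text.split(sep))
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order.
    return ", ".join(dict.fromkeys(t for t in tokens if t))


df3 = pd.DataFrame({
    "ID": ["Stay home, T5006, T5006, Stay home",
           "Go for walk, T5007, T5007, Go for walk"],
    "Name": ["Stay home, Go for walk,  Stay home",
             "Go outside, Go outside, Go outside"],
})

# Only the ID column is rewritten; Name is left untouched, as in the question.
df3["ID"] = df3["ID"].apply(dedupe_tokens)
result_ids = df3["ID"].tolist()
```

This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.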

Or, if the order of the values is not important, use sets (the order of the joined values is then arbitrary):

# with apply
df3['ID'] = df3['ID'].apply(lambda x: ', '.join(set(x.split(', '))))
# or, equivalently, with a list comprehension
df3['ID'] = [', '.join(set(x.split(', '))) for x in df3['ID']]
print(df3)
                   ID                                Name
0    Stay home, T5006  Stay home, Go for walk,  Stay home
1  T5007, Go for walk  Go outside, Go outside, Go outside
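Because a set has no defined order, the output above can differ between runs. If a repeatable result matters more than the original order, one option is to sort the unique values. This is a sketch, not part of the original answer; dedupe_sorted is a hypothetical helper name:

```python
def dedupe_sorted(text, sep=", "):
    """Remove duplicate tokens and return them in sorted order."""
    # set() drops duplicates; sorted() makes the order deterministic.
    return sep.join(sorted(set(text.split(sep))))


ids = ["Stay home, T5006, T5006, Stay home",
       "Go for walk, T5007, T5007, Go for walk"]
deduped = [dedupe_sorted(x) for x in ids]
```

With a DataFrame, the same function could be passed to df3['ID'].apply(dedupe_sorted).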