Remove duplicate words in the same cell within a column in python

Time:09-30

I need some help. I have a DataFrame column of strings, and I want to remove the duplicated words inside each cell.

What I want to get is something like this:

words               expected
car apple car good  car apple good
good bad well good  good bad well
car apple bus food  car apple bus food

I've tried this, but it is not working:

from collections import OrderedDict


df['expected'] = (df['words'].str.split().apply(lambda x: OrderedDict.fromkeys(x).keys()).str.join(' '))

I'd be very grateful if somebody could help me.

CodePudding user response:

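If order is important, use dict.fromkeys in a list comprehension. Dicts preserve insertion order (guaranteed since Python 3.7), so the first occurrence of each word is kept: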

df['expected'] = [' '.join(dict.fromkeys(w.split())) for w in df['words']]

output:

                words            expected
0  car apple car good      car apple good
1  good bad well good       good bad well
2  car apple bus food  car apple bus food
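
For anyone who wants to check the logic without a DataFrame, here is a minimal sketch of the same idea in plain Python. The helper name `dedupe_words` and the list `rows` are made up for illustration; with pandas you would call the helper inside the list comprehension over df['words'] shown above.

```python
def dedupe_words(text):
    """Remove repeated words from a string, keeping the first occurrence of each."""
    # split() with no arguments splits on any run of whitespace.
    # dict.fromkeys keeps one key per distinct word, in first-seen order.
    return " ".join(dict.fromkeys(text.split()))


# Hypothetical sample data mirroring the 'words' column from the question.
rows = ["car apple car good", "good bad well good", "car apple bus food"]
expected = [dedupe_words(r) for r in rows]

for before, after in zip(rows, expected):
    print(f"{before!r} -> {after!r}")
```

Keeping the logic in a small named function also makes it easy to test on a few strings before applying it to the whole column.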

CodePudding user response:

If you don't need to retain the original order of the words, you can create an intermediate set, which removes duplicates. Note that the order of the words in the result is then arbitrary.

df["expected"] = df["words"].str.split().apply(set).str.join(" ")
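
Because set iteration order is arbitrary, the joined result can differ from one run to the next. If the original order doesn't matter but you want reproducible output, one option (not part of the answer above, just a sketch) is to sort the set before joining. The helper name `dedupe_sorted` and the list `rows` are made up for illustration.

```python
def dedupe_sorted(text):
    """Remove repeated words and return the remaining words in alphabetical order."""
    # set() drops duplicates; sorted() makes the output deterministic.
    return " ".join(sorted(set(text.split())))


# Hypothetical sample data mirroring the 'words' column from the question.
rows = ["car apple car good", "good bad well good", "car apple bus food"]
result = [dedupe_sorted(r) for r in rows]
print(result)
```

This gives alphabetical order, not the original word order, so it does not match the "expected" column in the question. Use dict.fromkeys if the original order has to be kept.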

CodePudding user response:

If each cell is a whitespace-separated string such as "word1 word2" (word order is not preserved):

df['expected'] = [" ".join(set(wrds.strip().split())) for wrds in df.words] 
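
All of the snippets above assume every cell is a string. If the column might contain missing values, calling .split() on them raises an error. A minimal sketch of one way to guard against that, assuming non-string cells should simply be passed through unchanged (the helper name `dedupe_cell` and the list `cells` are made up for illustration):

```python
def dedupe_cell(value):
    """Deduplicate words in a string cell; return non-string cells unchanged."""
    if not isinstance(value, str):
        # Assumption: missing or non-string cells (None, NaN, numbers) are left as-is.
        return value
    # split() already ignores leading and trailing whitespace,
    # so no separate strip() call is needed.
    return " ".join(dict.fromkeys(value.split()))


# Hypothetical cells: a normal string, a missing value, and a padded string.
cells = ["car apple car good", None, "  good bad  good "]
cleaned = [dedupe_cell(c) for c in cells]
print(cleaned)
```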