I need some help. I have a column of words, and I want to remove the duplicated words inside each cell.
What I want to get is something like this:
words | expected |
---|---|
car apple car good | car apple good |
good bad well good | good bad well |
car apple bus food | car apple bus food |
I've tried this, but it is not working:
from collections import OrderedDict
df['expected'] = (df['words'].str.split().apply(lambda x: OrderedDict.fromkeys(x).keys()).str.join(' '))
I'll be very grateful if somebody can help me
CodePudding user response:
If order is important, use dict.fromkeys in a list comprehension (plain dicts preserve insertion order as of Python 3.7):
df['expected'] = [' '.join(dict.fromkeys(w.split())) for w in df['words']]
output:
                words            expected
0  car apple car good      car apple good
1  good bad well good       good bad well
2  car apple bus food  car apple bus food
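For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the helper name `dedupe_words` is only illustrative; it assumes every cell is a whitespace-separated string and relies on dicts keeping insertion order (Python 3.7+).

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({"words": ["car apple car good",
                             "good bad well good",
                             "car apple bus food"]})

def dedupe_words(text):
    """Drop repeated words, keeping the first occurrence of each (illustrative helper)."""
    # dict.fromkeys keeps keys in first-seen order; joining a dict joins its keys.
    return " ".join(dict.fromkeys(text.split()))

df["expected"] = df["words"].map(dedupe_words)
print(df)
```

Wrapping the logic in a named function is just a readability choice; it does the same work as the one-line list comprehension above.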
CodePudding user response:
If you don't need to retain the original order of the words, you can create an intermediate set which will remove duplicates.
df["expected"] = df["words"].str.split().apply(set).str.join(" ")
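Because a set has no defined order, the joined words can come back in any order. If a stable result matters but the original order does not, one option (an addition here, not part of the answer above) is to sort the unique words before joining. A minimal sketch using the question's sample data:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({"words": ["car apple car good",
                             "good bad well good",
                             "car apple bus food"]})

# set() removes duplicates; sorted() makes the output order deterministic (alphabetical).
df["expected"] = df["words"].str.split().apply(lambda ws: " ".join(sorted(set(ws))))

print(df["expected"].tolist())
# ['apple car good', 'bad good well', 'apple bus car food']
```

Note that this gives alphabetical order, not the order from the question's expected column; for that, the dict.fromkeys approach is the better fit.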
CodePudding user response:
If each cell is a string like "word1 word2" (note that a set does not preserve the original word order):
df['expected'] = [" ".join(set(wrds.strip().split())) for wrds in df.words]