How to keep unique or most frequent value(s) per row for a pandas column?

Time:09-22

I have a dataframe with a column of word lists. I want to keep each word only once per row if it is mentioned multiple times, and keep all words that are mentioned only once.

My dataframe looks like this:

cars 
[honda, toyota]
[honda, none, honda, toyota, toyota]
[lexus, mazda]
[honda, mazda, lexus, mazda, honda]

I want my output to be:

cars
honda, toyota
honda, none, toyota
lexus, mazda
honda, mazda, lexus

Thank you in advance!

CodePudding user response:

Convert each list to a set; a set inherently holds only unique values, although it does not preserve the original order of the words.

Optionally, you can convert them back to lists again afterwards.

df['cars'] = df['cars'].apply(set)  # optionally chain .apply(list) to get lists back
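
For reference, here is a minimal self-contained sketch of the set approach. It assumes the column holds plain Python lists of strings; the sample data is rebuilt from the question, and the cars_unique column name is made up for illustration. Since sets are unordered, the sketch sorts each result alphabetically to get a predictable list. That sorting step is an addition, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question (assumes lists of plain strings).
df = pd.DataFrame({
    "cars": [
        ["honda", "toyota"],
        ["honda", "none", "honda", "toyota", "toyota"],
        ["lexus", "mazda"],
        ["honda", "mazda", "lexus", "mazda", "honda"],
    ]
})

# set() removes duplicates but loses the original order;
# sorted() turns the set back into an alphabetically ordered list.
# "cars_unique" is a hypothetical column name used only for this sketch.
df["cars_unique"] = df["cars"].apply(lambda words: sorted(set(words)))

print(df["cars_unique"].tolist())
```

Note that the last row comes out as honda, lexus, mazda here, not in the first-seen order the question asks for.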

CodePudding user response:

If order is important, use dict.fromkeys, which acts like an ordered set (guaranteed in Python ≥ 3.7; CPython 3.6 preserves insertion order as an implementation detail):

df['cars'] = df['cars'].apply(lambda x: list(dict.fromkeys(x)))

Variant with a list comprehension that is potentially more efficient:

df['cars'] = [list(dict.fromkeys(x)) for x in df['cars']]
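
To get the comma-separated strings shown in the question, the deduplicated lists can be joined afterwards. The following is a rough sketch under two assumptions: every entry is a string (including the literal "none"), and the sample data matches the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumes lists of plain strings).
df = pd.DataFrame({
    "cars": [
        ["honda", "toyota"],
        ["honda", "none", "honda", "toyota", "toyota"],
        ["lexus", "mazda"],
        ["honda", "mazda", "lexus", "mazda", "honda"],
    ]
})

# dict.fromkeys keeps the first occurrence of each word, in original order.
deduped = [list(dict.fromkeys(words)) for words in df["cars"]]

# Join each deduplicated list into a single comma-separated string.
df["cars"] = [", ".join(words) for words in deduped]

print(df)
```

If the lists may contain non-string values such as None, convert them first (for example with str) before joining.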