How can I find duplicate phrases in a list?

Time:06-29

There is a list like this:

my_list = ['beautiful moments','moments beautiful']

Ignore the grammar; the main idea is that those two strings are about the same thing.

The question is how to detect that those phrases are duplicates WITHOUT splitting and sorting each phrase.

CodePudding user response:

You can take advantage of frozensets here. They are hashable, so they can be stored in a set (membership testing on a set is O(1) on average), and they compare like sets: two sets are equal if they contain the same items, in any order.
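As a quick check of those two properties, here is a minimal sketch using the two phrases from the question. The names a, b and seen are only illustrative.

```python
# Two phrases with the same words in a different order
a = frozenset("beautiful moments".split())
b = frozenset("moments beautiful".split())

print(a == b)              # True: equality ignores order
print(hash(a) == hash(b))  # True: equal frozensets have equal hashes

# Because frozensets are hashable, they can be members of a set
seen = {a}
print(b in seen)           # True: b is found via a's entry
```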

Basically, we iterate through the items of the list, split each one, and build a frozenset from its words. We keep a set called unique holding the frozensets seen so far; if the current one is not in it yet, we add it and keep the phrase.

my_list = ["beautiful moments", "moments beautiful", "hi bye", "hi hi", "bye hi"]

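unique = set()   # frozensets of words seen so far
result = []      # first occurrence of each distinct phrase

for phrase in my_list:
    # Order-independent key: the set of words in the phrase
    key = frozenset(phrase.split())
    if key not in unique:
        unique.add(key)
        result.append(phrase)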

print(result)

Output:

['beautiful moments', 'hi bye', 'hi hi']
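One caveat: a frozenset discards repeated words, so 'hi hi' and 'hi' would be treated as the same phrase. If word counts should matter, one possible variation (not part of the original answer) is to build the key from (word, count) pairs with collections.Counter. The helper names phrase_key and dedupe_phrases are hypothetical, and the extra 'hi' item is added only to show the difference.

```python
from collections import Counter


def phrase_key(phrase):
    """Order-independent key that keeps word counts (hypothetical helper)."""
    # Counter(...).items() yields hashable (word, count) tuples
    return frozenset(Counter(phrase.split()).items())


def dedupe_phrases(phrases):
    """Keep the first occurrence of each phrase, ignoring word order."""
    seen = set()
    result = []
    for phrase in phrases:
        key = phrase_key(phrase)
        if key not in seen:
            seen.add(key)
            result.append(phrase)
    return result


my_list = ["beautiful moments", "moments beautiful", "hi bye", "hi hi", "bye hi", "hi"]
print(dedupe_phrases(my_list))
# ['beautiful moments', 'hi bye', 'hi hi', 'hi']
```

With the plain frozenset key, the final 'hi' would have been dropped as a duplicate of 'hi hi'.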