Remove similar duplicates from a list of strings

I'm trying to remove the similar duplicates from my list. Here is my code:

l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]

res = [*set(l)]
print(res)

This removes only the extra "shirt" entries, which are exact duplicates, but I also want to remove the similar entries such as "shirt len", "pant cotton", and "len pant".

Expected output: shirt, pant, watch

CodePudding user response：

It sounds like you want to check whether any single-word string is contained in one of the longer strings and, if so, drop that longer string as a duplicate. I would go about it this way:

Separate the list into single-word strings and any other string.
For each longer string, check if any of the single-word strings is contained in it.
- If so, remove it. Otherwise, add it to the result.
Finally, add all the single-word strings to the result.

l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]

single, longer = set(), set()
for s in l:
    if len(s.split()) == 1:
        single.add(s)
    else:
        longer.add(s)

res = set()
for s in longer:
    if not any(word in s for word in single):
        res.add(s)
res |= single

print(res)

This example will give (the order of elements in a set may vary):

{'shirt', 'watch', 'pant'}
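
Two caveats about the snippet above: `word in s` is a substring test, so a single word such as "pant" would also match inside "pantry shelf", and a set does not keep the original order. A minimal sketch that compares whole words and preserves first-seen order might look like this. The function name `remove_similar` is hypothetical and not part of the original answer.

```python
def remove_similar(items):
    """Drop multi-word strings that contain any single-word item as a whole word."""
    # Single-word entries act as the "base" items.
    single = {s for s in items if len(s.split()) == 1}

    result = []
    seen = set()
    for s in items:
        if s in seen:
            continue  # skip exact duplicates
        seen.add(s)
        words = s.split()
        # Keep single words, and longer strings that share no word with `single`.
        if len(words) == 1 or not any(w in single for w in words):
            result.append(s)
    return result


l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]
print(remove_similar(l))  # ['shirt', 'pant', 'watch']
```

With this version, an input like `["pant", "pantry shelf"]` keeps both entries, because "pantry" is not the same word as "pant".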

CodePudding user response：

You can try something like the following: select only the single-word elements from the list, then apply set.

set([ls for ls in l if ' ' not in ls])  # Output: {'pant', 'shirt', 'watch'}
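
Note that this shorter approach discards every multi-word string, even one that contains none of the single words. If that is acceptable and the original order matters, a sketch using `dict.fromkeys` (which keeps insertion order in Python 3.7+) could look like this. The variable name `res` is only illustrative.

```python
l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]

# Keep strings without a space, then de-duplicate while preserving order.
res = list(dict.fromkeys(s for s in l if " " not in s))
print(res)  # ['shirt', 'pant', 'watch']
```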