How to find a string that matches a substring in any order?

Time:05-18

Assuming a list as follows:

list_of_strings = ['foo', 'bar', 'soap', 'seo', 'paseo', 'oes']

and a substring

to_find = 'eos'

I would like to find the string(s) in list_of_strings that match the substring. The output should be ['seo', 'paseo', 'oes'], since each of these contains all the letters of to_find.

I tried a couple of things:

a = next((string for string in list_of_strings if to_find in string), None) # gives None as output

and

result = [string for string in list_of_strings if to_find in string] # gives [] as output

but neither attempt works.

Can someone please tell me what mistake I am making?

Thanks

CodePudding user response:

Logically, your problem is to compare the set of characters in the word to find against the set of characters in each word in the list. If a word in the list contains all the characters of the word to find, then it is a match. Here is one approach using a list comprehension along with set intersection:

list_of_strings = ['foo', 'bar', 'soap', 'seo', 'paseo', 'oes']
to_find = 'eos'
to_find_set = set(to_find)  # a string is already iterable, so list() is not needed
output = [x for x in list_of_strings if len(to_find_set.intersection(x)) == len(to_find_set)]
print(output)  # ['seo', 'paseo', 'oes']

If you want to retain an empty string placeholder for any input string which does not match, then use this version:

output = [x if len(to_find_set.intersection(x)) == len(to_find_set) else '' for x in list_of_strings]
print(output)  # ['', '', '', 'seo', 'paseo', 'oes']
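The same test can also be phrased as a subset check, which may be easier to read than comparing lengths. This is only a minimal sketch: the helper name contains_all_letters is made up for illustration and is not part of the answer above. Like the intersection version, it works on sets, so a letter repeated in to_find is only counted once.

```python
def contains_all_letters(word, letters):
    """Return True if every distinct character of `letters` occurs in `word`.

    Hypothetical helper for illustration; order and adjacency are ignored.
    """
    # set.issubset accepts any iterable, so the word can be passed directly.
    return set(letters).issubset(word)


list_of_strings = ['foo', 'bar', 'soap', 'seo', 'paseo', 'oes']
to_find = 'eos'

output = [x for x in list_of_strings if contains_all_letters(x, to_find)]
print(output)  # ['seo', 'paseo', 'oes']
```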

CodePudding user response:

Do the letters of to_find need to be next to each other, or is it enough that all of them appear somewhere in the word? Basically: does seabco match or not?

[Your question does not include this detail and you use "substring" a lot but also "since it has all the letters in the to_find", so I'm not sure how to interpret it.]

If seabco matches, then @Tim Biegeleisen's answer is the correct one. If the letters need to be next to each other (but in any order, of course), then look below:


If to_find is relatively short, you can just generate all permutations of its letters (n! of them, so here 3! = 6: eos, eso, oes, ose, seo, soe) and check whether any of them occurs in each string using in.

import itertools
list_of_strings = ['foo', 'bar', 'soap', 'seo', 'paseo', 'oes']
to_find = 'eos'

result = [string for string in list_of_strings if any("".join(perm) in string for perm in itertools.permutations(to_find))]

https://docs.python.org/3/library/itertools.html#itertools.permutations

We do "".join(perm) because perm is a tuple and we need a string.
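To make the tuple point concrete, here is a small sketch of what itertools.permutations yields for this input and what the joined strings look like. The variable names perms and candidates are only illustrative.

```python
import itertools

to_find = 'eos'

# Each permutation is a tuple of single characters, not a string.
perms = list(itertools.permutations(to_find))
print(perms[0])  # ('e', 'o', 's')

# Joining each tuple gives the candidate substrings to look for.
candidates = ["".join(p) for p in perms]
print(candidates)  # ['eos', 'eso', 'oes', 'ose', 'seo', 'soe']
```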

>>> result = [string for string in list_of_strings if any("".join(perm) in string for perm in itertools.permutations(to_find))]
>>> result
['seo', 'paseo', 'oes']
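If to_find is longer, generating n! permutations becomes expensive. One possible alternative, sketched here under the "letters must be adjacent" interpretation, is to slide a window of len(to_find) characters over each string and compare letter counts with collections.Counter. This is a different technique from the permutation approach above, and the helper name has_adjacent_anagram is hypothetical.

```python
from collections import Counter


def has_adjacent_anagram(word, letters):
    """Return True if some run of len(letters) adjacent characters in `word`
    is a rearrangement of `letters`. Hypothetical helper for illustration."""
    n = len(letters)
    target = Counter(letters)
    # Compare the letter counts of every window of length n against the target.
    return any(Counter(word[i:i + n]) == target
               for i in range(len(word) - n + 1))


list_of_strings = ['foo', 'bar', 'soap', 'seo', 'paseo', 'oes']
to_find = 'eos'

result = [s for s in list_of_strings if has_adjacent_anagram(s, to_find)]
print(result)  # ['seo', 'paseo', 'oes']

# seabco has all three letters, but never next to each other.
print(has_adjacent_anagram('seabco', to_find))  # False
```

For this list both interpretations give the same result; they only differ on inputs such as seabco, which the set-based answer accepts and this check rejects.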