Home > Software engineering >  Check if a string is contained within the string elements of an array
Check if a string is contained within the string elements of an array

Time:04-24

We can replace:

if "ab" == "ab" or "ab" == "ac" or "ab" == "ad": # ...

With this:

if "ab" in ("ab", "ac", "ad"): # ...

All's well and good.

But now, if we change the equality (==) operator with the membership (in) operator, we get:

if "ab" in "aba" or "ab" in "ac" or "ab" in "ad": #...

Is there a better solution to achieve this, without using a for loop (and this many or operators)? I know I could do something like this:

if any(search_string in x for x in ("aba", "ac", "ad")): # ...

But is there a simpler way to accomplish this objective, without using a for clause?

CodePudding user response:

There is no built-in type in Python that lets you search through a collection to test for substrings, no. What you have is arguably the simplest implementation of a substring search. There is nothing wrong with using any() and a generator expression here.

Don't get hung up by the fact that you are using the same in operator here. It is the container on the right-hand side of the in operator that implements the actual search, not in itself (apart from a fallback for legacy objects that don't have a __contains__ or __iter__ implementation). Python's tuple, list, dict, set and frozenset types all define a __contains__ implementation that lets you search for something inside the collection that is equal to the left-hand side operand, which is what makes if "ab" in ("ab", "ac", "ad"): possible.

If you want to have the same 'clean' expression for finding matches for substring containment, implement your own:

from collections import Container

class SubstringContainer(Container):
    def __init__(self, *values):
        self._values = values

    def __contains__(self, needle):
        return any(needle in value for value in self.values)

and you could then use:

if "ab" in SubstringContainer("aba", "ac", "ad"):

You might even be able to optimise the __contains__ implementation where you pre-compute some kind of index, who knows. Personally, I'd not bother, unless there was a significant performance boost from a more involved algorithm encapsulated by such a class.

  • Related