What are more efficient ways of doing an "if anything else but this set of items" statement?

For instance, if we're given the string "abcd":

If there's anything but "a" and "c" in a string, I want to reject it, so I'd do:

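s = "abcd"
i_only_want = ["a", "c"]
for letter in s:
    if letter not in i_only_want:
        reject(s)  # reject() stands for whatever rejection logic is needed
        break      # stop at the first disallowed letter so s is rejected only once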

Is there a better/more efficient way than doing a nested loop?

Answer:

The standard implementation of Python (CPython) is written in C, so most basic operations run as efficient compiled code. Using built-in operations where they are available, instead of writing your own logic in Python, is therefore almost always faster, unless the overhead of the function call outweighs the gain.
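A rough way to check this on your own machine is a `timeit` comparison. The function names and the test string below are made up for illustration, and the timings depend on your Python version and hardware, so no particular result is claimed here. The disallowed character is placed at the end, which is the worst case for the loop.

```python
import timeit

# Hypothetical helper names, used only for this comparison.
def loop_check(s, allowed):
    # Pure-Python loop: every iteration is executed by the interpreter.
    for letter in s:
        if letter not in allowed:
            return False
    return True

def set_check(s, allowed):
    # Building the set and testing the subset relation both happen in C in CPython.
    return set(s).issubset(allowed)

s = "ac" * 10_000 + "b"  # disallowed character at the very end
allowed = {"a", "c"}

for func in (loop_check, set_check):
    seconds = timeit.timeit(lambda: func(s, allowed), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```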

Since there is no point in checking the same character twice, and you're only interested in knowing whether s contains any characters that are not in i_only_want, you're really checking whether the set of characters in s is a subset of i_only_want.
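A small worked example of that reasoning: converting the string to a set keeps each character only once, so repeats don't matter, and the subset test answers the question directly. For two sets, `<=` is the operator form of `issubset`; the method version additionally accepts any iterable as its argument.

```python
allowed = {"a", "c"}

# Repeated characters collapse into a single set element.
print(set("aacca") == {"a", "c"})  # True

# Every character of "aacca" is allowed, so it is a subset.
print(set("aacca") <= allowed)     # True

# "abcd" contains 'b' and 'd', which are not allowed.
print(set("abcd") <= allowed)      # False
```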

So this is likely close to optimal:

s = 'abcd'
i_only_want = {'a', 'c'}
if not set(s).issubset(i_only_want):
    reject(s)

Whether it is really more efficient may depend on the size of the string and of the collection of allowed characters, as well as several other factors. But as others have pointed out, the problem as you present it is so small that optimisation is hardly a concern.
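One such factor is where the first disallowed character appears. As an alternative not proposed in this answer, `all()` with a generator expression stops at the first character that fails, whereas `set(s)` always consumes the whole string before the subset test runs. The function names are hypothetical, and which version is faster depends on your data, because the generator performs a Python-level step per character.

```python
allowed = {"a", "c"}

def contains_only_all(s, allowed):
    # all() short-circuits: it returns False at the first disallowed character.
    return all(ch in allowed for ch in s)

def contains_only_set(s, allowed):
    # set(s) iterates over every character before issubset() is called.
    return set(s).issubset(allowed)

long_bad = "b" + "a" * 1_000_000  # disallowed character first

print(contains_only_all(long_bad, allowed))  # False, after looking at one character
print(contains_only_set(long_bad, allowed))  # False, after scanning all characters
```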

If you're doing this billions upon billions of times, sure, but then you should probably tell us more about how these values arrive, what they typically look like and how they are distributed.

By the way, I'd prefer set(s).issubset(i_only_want) over set(s).difference(i_only_want). The difference approach relies on the truthiness of a non-empty set, but it has to build the complete difference, even though you're only interested in knowing whether any difference exists. So it's likely less efficient, since issubset can stop as soon as it finds a character of set(s) that is missing from i_only_want.
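A small comparison of the two checks: they give the same verdict, but `difference()` produces the full set of offending characters, which is more than the yes/no question needs. That extra information can still be handy if you want to report which characters caused the rejection.

```python
allowed = {"a", "c"}
chars = set("abcd")

# difference() builds a new set containing every disallowed character.
extra = chars.difference(allowed)
print(extra == {"b", "d"})       # True
print(bool(extra))               # True: non-empty, so the string should be rejected

# issubset() only returns True or False.
print(chars.issubset(allowed))   # False: same verdict, no intermediate set kept
```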