Python - efficiently check a list exists AND element exists in list

I have a variable foo, which points to a string, "bar"

foo = "bar"

I have a list called whitelist. If whitelist is not empty, only the strings it contains are permitted. If whitelist is empty, the if statement should permit any string.

I have implemented this as follows:

whitelist = ["bar", "baz", "x", "y"]

if whitelist and foo in whitelist:
    print("bar is whitelisted")
    # do something with whitelisted element

if whitelist, by my understanding, checks whether whitelist evaluates to True. An empty whitelist evaluates to False; a whitelist that contains elements evaluates to True.
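One thing worth checking against the description above: with "whitelist and foo in whitelist", an empty whitelist makes the whole condition falsy, so nothing is permitted. If an empty whitelist is meant to permit any string, the condition would need to be "not whitelist or foo in whitelist". A minimal sketch of the difference, where is_allowed and is_listed are hypothetical helper names used only for illustration:

```python
def is_allowed(value, whitelist):
    """An empty whitelist permits everything; otherwise value must be listed."""
    # `not whitelist` is True for an empty list, so the membership test is skipped.
    return not whitelist or value in whitelist


def is_listed(value, whitelist):
    """Same logic as the question's condition: an empty whitelist permits nothing."""
    return bool(whitelist) and value in whitelist


print(is_allowed("bar", ["bar", "baz"]))  # True
print(is_allowed("qux", ["bar", "baz"]))  # False
print(is_allowed("qux", []))              # True
print(is_listed("qux", []))               # False
```

Which of the two is right depends on what an empty whitelist should mean in the real implementation.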

However, the real implementation of this contains:

lots of strings to check, e.g. "bar", "baz", "x", "y", "a", "b"
lots of whitelists to check against

Therefore, I was wondering if there is a more computationally efficient way of writing the if statement. It seems like checking the existence of whitelist each time is inefficient, and could be simplified.

CodePudding user response：

These are some ways to check whether an element is in a list or not.

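from timeit import timeit
import numpy as np

whitelist1 = {"bar", "baz", "x", "y"}
# np.searchsorted (func4) requires a sorted array; this one is already in sorted order
whitelist2 = np.array(["bar", "baz", "x", "y"])

def func1():
    # set intersection: returns a set, which is empty (falsy) when "foo" is absent
    return {"foo"}.intersection(whitelist1)

def func2():
    # hash-based membership test on a set
    return "foo" in whitelist1

def func3():
    # np.isin needs a list or array; a Python set is not treated as a collection of values
    return np.isin('foo', whitelist2)

def func4():
    # binary search; raises IndexError if the word sorts after every element
    return whitelist2[np.searchsorted(whitelist2, 'foo')] == 'foo'

print("func1=",timeit(func1,number=100000))
print("func2=",timeit(func2,number=100000))
print("func3",timeit(func3,number=100000))
print("func4=",timeit(func4,number=100000))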

Time taken by each function (total seconds for 100000 calls)

func1= 0.01365450001321733
func2= 0.005112499929964542
func3 0.5342871999600902
func4= 0.17057700001168996

For a randomly generated list

from timeit import timeit
import numpy as np
import random as rn
from string import ascii_letters

# build a list of 1000 distinct random 5-letter words
randomLst = []
while len(randomLst) != 1000:
    randomWord = ''.join(rn.choices(ascii_letters, k=5))
    if randomWord not in randomLst:
        randomLst.append(randomWord)

whitelist1 = {*randomLst}
# sorted copy, because np.searchsorted (func4) requires a sorted array
whitelist2 = np.sort(np.array(randomLst))
randomWord = rn.choice(randomLst)
# rn.choices samples with replacement, so this set holds at most 100 words
randomWords = set(rn.choices(randomLst, k=100))

def func1():
    return {randomWord}.intersection(whitelist1)

def func2():
    return randomWord in whitelist1

def func3():
    # np.isin needs a list or array, not a set
    return np.isin(randomWord, whitelist2)

def func4():
    return whitelist2[np.searchsorted(whitelist2, randomWord)] == randomWord

def func5():
    # checks the whole batch of words with a single set intersection
    return randomWords & whitelist1

print("func1=",timeit(func1,number=100000))
print("func2=",timeit(func2,number=100000))
print("func3",timeit(func3,number=100000))
print("func4=",timeit(func4,number=100000))
# number is 1000 here because each call checks about 100 words at once: 100000 / 100 = 1000
print("func5=",timeit(func5,number=1000))

Time taken

func1= 0.012835499946959317
func2= 0.005004600039683282
func3 0.5219665999757126
func4= 0.19900090002920479
func5= 0.0019264000002294779

Conclusion

If you want to check only one word, the 'in' operator on a set (func2) is the fastest.
But if you have a whole collection of words to check, the '&' set intersection (func5) is the fastest.

Note: func5 returns a set containing the words that are in the whitelist.
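The question mentions many strings and many whitelists, so the two conclusions can be combined: convert each whitelist to a set once, then run one intersection per whitelist. This is only a sketch under assumed data; the names whitelists, candidates, and matches, and the sample values, are made up for illustration.

```python
# Hypothetical data: two named whitelists and a batch of strings to check.
whitelists = {
    "letters": ["x", "y", "a", "b"],
    "words": ["bar", "baz"],
}
candidates = ["bar", "baz", "x", "y", "a", "b", "qux"]

# Convert once up front so later checks use hashing instead of a linear scan.
whitelist_sets = {name: frozenset(items) for name, items in whitelists.items()}
candidate_set = set(candidates)

# One intersection per whitelist checks every candidate at once.
matches = {name: candidate_set & allowed for name, allowed in whitelist_sets.items()}

for name, found in matches.items():
    print(name, sorted(found))
```

The conversion cost is paid once per whitelist, so it only pays off when each whitelist is checked repeatedly or against many strings.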

CodePudding user response：

whitelist will normally exist, but if it could be None, coerce it to an empty list with:

whitelist = whitelist or []

As shown above, you can then just use foo in whitelist to find out whether it's in the list. For a list this is an O(len(whitelist)) operation. In practice, linear scans over a list are surprisingly fast, at least for lists of up to roughly 1,000 elements.

If you need it to be faster, use a set. Optionally, if you need to do n lookups, collect your foos into a set and then use an intersection, which is O(n):

foos = { 'bar', 'none' }
whitelist = { 'bar' }
for foo in foos & whitelist:
   print(foo)
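To make the None coercion concrete, here is a small sketch. The function name in_whitelist and its default argument are assumptions for illustration, not part of the original code.

```python
def in_whitelist(foo, whitelist=None):
    # `None or []` evaluates to [], so the membership test below never
    # raises a TypeError for a missing whitelist.
    whitelist = whitelist or []
    return foo in whitelist


print(in_whitelist("bar", None))            # False
print(in_whitelist("bar", ["bar", "baz"]))  # True
print(in_whitelist("bar", {"bar"}))         # True (sets work the same way)
```

Note that with this coercion a None whitelist rejects everything, the same as an empty one.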

CodePudding user response：

Here is a simplified solution. You can do it in two ways:

whitelist = ["bar", "baz", "x", "y"]
foo = "bar"
# method 1
def WhiteListExists(foo, whitelist):
    if whitelist and foo in whitelist:
        return True
    else:
        return False

exists = WhiteListExists(foo,whitelist)

# method 2
exists = True if whitelist and foo in whitelist else False

Both methods give the same result, but the second one is faster because it avoids the overhead of a function call.
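As a side note on the original concern about checking whitelist every time: for a list (or set), a membership test against an empty container is already False, so the extra emptiness check does not change the result. A short sketch using the same sample values as above:

```python
whitelist = ["bar", "baz", "x", "y"]
foo = "bar"

# Membership alone already gives a bool.
exists = foo in whitelist

# `x in []` is False, so no separate emptiness check is needed for a list.
exists_if_empty = foo in []

# `[] and ...` short-circuits and returns the empty list itself, not False;
# bool() normalizes it when a real bool is required.
raw = [] and foo in []

print(exists, exists_if_empty, raw, bool(raw))  # True False [] False
```

The emptiness check only matters when whitelist might be None, since None does not support the in operator.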