Home > OS >  fastest way to check if atleast one element in set/list is in each element in a collection of lists/
fastest way to check if atleast one element in set/list is in each element in a collection of lists/

Time:04-14

I have the following:

list1 = {"a", "b", "c"}

list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
]

i now want to check that elements in list1 are somewhere in list2. each element in list2 MUST contain one of the elements in list1.

what i currently do is the following (also found it on stackoverflow a while ago):

all(list1 & l for l in list2)

this is actually reasonably fast. however I am now running into the issue that I have billions of different list1, so I have to come up with a faster solution. I also tried numba, but I am struggling with nested lists, and sets are not supported.

I have a bunch of items (like the sets in list2) that can represent that sets. for example, the first set in list2 consists of "a", "s", "d" and "f". all of those characters "desribe" the first set in list2.

what I now want to do is find the shortest combination to describe list2. for example:

list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
    {"v", "l", "p", "o"},
]

here the shortest combination to describe list2 is a,q,v (a describes the first element, q the second and v elements 3 and 4)

the way i construct list1 would be to take

U = set.union(*list2)

for list1 in itertools.combinations(U,3): #i loop over the combinations to find the minimum one, so combinations(U,2), combinations(U,3) ....
     ...

this works really well, even for very large numbers (100s of millions of combinations), however it is still somewhat limited. I would like to reduce it as much as I can. edit: the datastructure for list2 is as desribed above, a collection of sets containing strings (in my case its 3 character combinations), and so list1 is also a set of strings.

thanks

CodePudding user response:

There is a simple optimization you can make,

not any(map(list1.isdisjoint, list2))

isdisjoint avoids needing to calculate the full result, and map is faster than a comprehension when you are just calling a single method.

However, if you want a more optimal result you have to give more detail about what you are trying to do. Particularly, what are the sizes of all of the data structures, and what are the elements they contain?

CodePudding user response:

what I now want to do is find the shortest combination to describe list2

This is the Hitting Set Problem, which is well studied and for which there exist multiple solvers, like this one.

  • Related