Intersection of two Python lists based on condition

Time:10-19

I want to mark which elements of the first Python list also appear in the second:

list_1_begin = ["i", "love", "to", "eat", "fresh", "apples", "yeah", "eat", "fresh"]
list_2_find = ["eat", "fresh"]

And my expected result should look like this:

expected_result = ["0", "0", "0", "1", "1", "0", "0", "1", "1"]

This can be done with two nested for loops, but what if the first list has 10000 elements and the second has 100? The phrase can also repeat multiple times. Is there a Pythonic way?

Important: the words must appear next to each other. For example, with:

list_1_begin = ["i", "love", "to", "eat", "the", "fresh", "apples", "yeah", "eat", "fresh"]

list_2_find = ["eat", "fresh"]

The solution should look like this:

expected_result = ["0", "0", "0", "0", "0", "0", "0", "0", "1", "1"]

So positions should be marked only where all elements of list_2_find appear in list_1_begin consecutively and in the exact same order.

CodePudding user response:

To keep it Pythonic and efficient, convert list_2_find to a set (membership tests on a set take constant time on average) and use a list comprehension:

list_1_begin = ["i", "love", "to", "eat", "fresh", "apples", "yeah", "eat", "fresh"]
list_2_find = ["eat", "fresh"]

set_2_find = set(list_2_find)
result = [str(int(e in set_2_find)) for e in list_1_begin]
print(result)

Output

['0', '0', '0', '1', '1', '0', '0', '1', '1']

As an alternative for formatting a bool as an int, one approach is to use an f-string as follows:

result = [f"{(e in set_2_find):d}" for e in list_1_begin]

Output

['0', '0', '0', '1', '1', '0', '0', '1', '1']

This works because bool is a subclass of int, so the d format code renders True as 1 and False as 0.
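Note that a plain membership test ignores word order. As a quick check, here is the same comprehension run on the second example from the question; the name membership_result is only illustrative:

```python
list_1_begin = ["i", "love", "to", "eat", "the", "fresh", "apples", "yeah", "eat", "fresh"]
list_2_find = ["eat", "fresh"]

set_2_find = set(list_2_find)

# Membership only: every "eat" or "fresh" is marked, even when they are not adjacent.
membership_result = [f"{(e in set_2_find):d}" for e in list_1_begin]
print(membership_result)
# ['0', '0', '0', '1', '0', '1', '0', '0', '1', '1']
```

The isolated "eat" at index 3 and "fresh" at index 5 are flagged, which is not what the question asks for when the phrase must be contiguous.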

UPDATE

If the matches must be sequential, use:

from itertools import chain

list_1_begin = ["i", "love", "to", "eat", "the", "fresh", "apples", "yeah", "eat", "fresh"]
list_2_find = ["eat", "fresh"]

len_1 = len(list_1_begin)
len_2 = len(list_2_find)

# For every start index e where the slice equals list_2_find, collect the indices it covers.
pos = chain.from_iterable([range(e, e + len_2) for e in range(len_1) if list_1_begin[e:e + len_2] == list_2_find])
positions_set = set(pos)

result = [f"{(i in positions_set):d}" for i in range(len_1)]
print(result)

Output

['0', '0', '0', '0', '0', '0', '0', '0', '1', '1']
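The same idea can be wrapped in a small reusable function. This is only a sketch: the name mark_phrase and its parameters are hypothetical, and it assumes an empty phrase should mark nothing. It compares a slice at each start position, so the cost is roughly len(words) * len(phrase) comparisons in the worst case, which is about a million for the 10000 and 100 element sizes mentioned in the question.

```python
def mark_phrase(words, phrase):
    """Return "0"/"1" strings marking every position of `words` that is
    covered by an exact, contiguous occurrence of `phrase`."""
    n, m = len(words), len(phrase)
    flags = ["0"] * n
    if m == 0:
        # Assumption: an empty phrase matches nothing.
        return flags
    # Only start positions that leave room for the whole phrase can match.
    for start in range(n - m + 1):
        if words[start:start + m] == phrase:
            flags[start:start + m] = ["1"] * m
    return flags


list_1_begin = ["i", "love", "to", "eat", "the", "fresh", "apples", "yeah", "eat", "fresh"]
list_2_find = ["eat", "fresh"]
print(mark_phrase(list_1_begin, list_2_find))
# ['0', '0', '0', '0', '0', '0', '0', '0', '1', '1']
```

Overlapping occurrences are all marked, so mark_phrase(["a", "a", "a"], ["a", "a"]) flags all three positions.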