I have two lists:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
I want to count how many times the lookup_list appears in my_list, with the following logic:
- The order should be 1 -> 2 -> 3
- In my_list, the lookup_list items don't have to be next to each other: 1,4,2,1,5,3 -> should generate a match, since there is a 2 that comes after a 1 and a 3 that comes after the 2.
The matches based on the logic:
1st match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
2nd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
3rd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
4th match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
The lookup_list is dynamic; it could be defined as [1,2] or [1,2,3,4], etc. How can I solve it? All the answers I've found are about finding matches where 1,2,3 appear next to each other in order, like this one: Find matching sequence of items in a list
I can find the count of consecutive sequences with the code below, but it doesn't count the nonconsecutive sequences:
from collections import Counter
from nltk import ngrams
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
all_counts = Counter(ngrams(my_list, len(lookup_list)))
counts = {k: all_counts[k] for k in [tuple(lookup_list)]}
counts
>>> {(1, 2, 3): 2}
I tried using pandas rolling window functions but they don't have a custom reset option.
CodePudding user response:
The function find_matches() returns the indices where the matches from lookup_list are:
def find_matches(lookup_list, lst):
    # each bucket holds the indices of one partial match
    buckets = []

    def _find_bucket(i, v):
        for b in buckets:
            # advance the first bucket that is waiting for value v
            if lst[b[-1]] == lookup_list[len(b) - 1] and v == lookup_list[len(b)]:
                b.append(i)
                if len(b) == len(lookup_list):
                    # the match is complete
                    buckets.remove(b)
                    return b
                break
        else:
            # no bucket took v: start a new one if v opens the sequence
            if v == lookup_list[0]:
                buckets.append([i])

    rv = []
    for i, v in enumerate(lst):
        b = _find_bucket(i, v)
        if b:
            rv.append(b)
    return rv

lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(find_matches(lookup_list, my_list))
Prints:
[[0, 1, 2], [6, 7, 11], [9, 10, 15], [14, 16, 17]]
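If only the number of matches is needed, as the question asks, the bucket idea above can probably be reduced to one counter per position in lookup_list. The following is a minimal sketch, not part of the answer above: count_matches is a hypothetical name, and the sketch assumes the items of lookup_list are distinct and hashable.

```python
def count_matches(lookup_list, lst):
    """Count ordered, non-adjacent matches of lookup_list in lst.

    Hypothetical helper; assumes the lookup items are distinct and hashable.
    """
    n = len(lookup_list)
    if n == 0:
        return 0
    if len(set(lookup_list)) != n:
        raise ValueError("this sketch assumes distinct lookup items")

    # value -> its position in lookup_list
    position = {value: k for k, value in enumerate(lookup_list)}
    # waiting[k]: partial matches whose next expected item is lookup_list[k]
    waiting = [0] * n
    count = 0
    for value in lst:
        k = position.get(value)
        if k is None:
            continue  # value is not part of the sequence
        if k > 0:
            if waiting[k] == 0:
                continue  # no partial match is waiting for this value
            waiting[k] -= 1
        # k == 0 always opens a new partial match
        if k == n - 1:
            count += 1  # a partial match just completed
        else:
            waiting[k + 1] += 1
    return count


lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(count_matches(lookup_list, my_list))  # 4
```

This runs in a single pass and keeps no index lists, so it fits when the positions of the matches are not needed. If lookup_list may contain repeated values, the bucket version is the safer choice.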
CodePudding user response:
def find_all_sequences(source, sequence):
    def find_sequence(source, sequence, index):
        # find each remaining item after the previously found index
        for i in sequence:
            index = source.index(i, index + 1)
            yield index, i

    first, *rest = sequence
    index = -1
    while True:
        try:
            # next occurrence of the first item, then the rest of the sequence
            index = source.index(first, index + 1)
            yield (index, first), *find_sequence(source, rest, index)
        except ValueError:
            break

Usage:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
print(*find_all_sequences(my_list, lookup_list), sep="\n")
Output:
((0, 1), (1, 2), (2, 3))
((6, 1), (7, 2), (11, 3))
((9, 1), (10, 2), (11, 3))
((14, 1), (16, 2), (17, 3))
The generator function find_all_sequences() yields tuples of index-value pairs. It runs a loop that stops when a list.index() call raises ValueError. The inner generator function find_sequence() yields pairs of index and sequence item.
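The stopping condition relies on how list.index() behaves with a start argument. A small sketch of that behavior on its own, with a made-up list and variable names that are only for illustration:

```python
source = [1, 2, 3, 1]

# list.index(value, start) searches from position `start` onwards
first = source.index(1)               # index of the first 1
second = source.index(1, first + 1)   # index of the next 1 after it

# when no further occurrence exists, list.index() raises ValueError
try:
    source.index(1, second + 1)
    exhausted = False
except ValueError:
    exhausted = True

print(first, second, exhausted)  # 0 3 True
```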
According to this benchmark, my method is about twice as fast as the one from Andrej Kesely's answer.
CodePudding user response:
Here is a recursive solution:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

def find(my_list, continue_from_index):
    if continue_from_index > (len(my_list) - 1):
        return 0

    last_found_index = 0
    found_indizes = []
    first_occuring_index = 0
    found = False
    for l in lookup_list:
        for m_index in range(continue_from_index, len(my_list)):
            # compare values with ==, not identity with `is`
            if my_list[m_index] == l and m_index >= last_found_index:
                if not found:
                    found = True
                    first_occuring_index = m_index
                last_found_index = m_index
                found_indizes.append(str(m_index))
                break
    if len(found_indizes) == len(lookup_list):
        # one match found: search again after the start of this match
        return find(my_list, first_occuring_index + 1) + 1
    return 0

print(find(my_list, 0))
CodePudding user response:
my_list = [5, 6, 3, 8, 2, 1, 7, 1]
lookup_list = [8, 2, 7]
counter = 0
result = False
for i in my_list:
    if i in lookup_list:
        counter += 1
if counter == len(lookup_list):
    result = True
print(result)