I have different lists of boolean values in different lengths which I will call blists
, which I want to compare to a fixed boolean list over time, which I will call blist
.
The goal is to find the longest matching series in a list (of blists
) in blist
.
I want to see if a part or even a whole list of blists
can be found somewhere in the list of blist
. I would set a minimum of matching values to not overfill my output.
For example (True = T, False = F, example shorter than in real life):
List 1 of blists: (T,T,T,F,T,T,F,F,F,T)
blist: (F,F,T,T,F,F,F,F)
I want to see if a part of list 1 (F,T,T,F,F,F)
equals to some part of the list blist
.
So for an example blist of (F,F,T,T,F,F,F,F)
the output should be, that a part of list 1 can be found in blist
.
So output for example:
blist
has a similarity with list 1 ((starting point in list 1) 3, (starting point in blist
) 1, (length) 6, (part of list that matches): (F,T,T,F,F,F)
I tried .corr()
etc., for
-loops and if
-conditions and nothing wanted to work properly.
This problem probably has a simple solution, but I can't figure it out.
CodePudding user response:
If you want an implementation that determines where exactly the common list starts at each list, here is a simple "for loop" implementation (I used "0,1" instead of "True and False" for sake of readability. you can convert it easily):
a = [0,1,0,1,1,0,0,1,0,1,1,1,0,1,1,1,0,0]
b = [1,1,1,1,1,0,1,1,1,0,0,0,1,1]
list1_start = 0
list2_start = 0
common_list = []
for i in range(len(a)):
for j in range(len(a)-i):
for i2 in range(len(b)):
for j2 in range(len(b)-i2):
if a[i:i j] == b[i2:i2 j2]:
if len(common_list)<len(a[i:i j]):
common_list = a[i:i j]
list1_start = i 1
list2_start = i2 1
print("list1 start: ", list1_start)
print("list2 start: ", list2_start)
print("common_list: ",common_list)
the answer:
list1 start: 10
list2 start: 3
common_list: [1, 1, 1, 0, 1, 1, 1, 0]
CodePudding user response:
I would suggest using something like the KMP algorithm to match subset lists in the whole list. It would utilize O(n) complexity. Making a full implementation use worst of O(n^3) but most likely O(n^2). Solution could look like this:
longest_match = []
for i in range(len(blists)):
if(i len(longest_match) >= len(blists):
break
for j in range(i len(longest_match) 1, len(blists)):
if(KMPSearch(blists[i,j], blist)):
longest_match = blists[i,j]
else:
break
Where you loop through each position in blists and check a subset list starting at i and going one past the length of the longest match already found.
CodePudding user response:
one way would be to join the lists into strings and use the "in" and "find" string methods and loops:
l1 = ['T','T','T','F','T','T','F','F','F','T']
blist = ['F','F','T','T','F','F','F','F']
w1, bw = ''.join(l1), ''.join(blist)
all_options = sum([[w1[i:ii] for i in range(len(w1))] for ii in range(len(w1))], [])
all_valid_options = [x for x in all_options if (len(x)>1 and x in bw)]
longest_option = sorted(all_valid_options, key=len)[-1]
w1_start = w1.find(longest_option)
w1_end = w1_start len(longest_option)
bw_start = bw.find(longest_option)
bw_end = bw_start len(longest_option)
print(longest_option, len(longest_option), (w1_start, w1_end), (bw_start, bw_end))
FTTFFF 6 (3, 9) (1, 7)