Do two boolean colums/lists match? Comparison of two different sized colums: Does a part of one list-CodePudding

I have different lists of boolean values in different lengths which I will call blists, which I want to compare to a fixed boolean list over time, which I will call blist.

The goal is to find the longest matching series in a list (of blists) in blist.

I want to see if a part or even a whole list of blists can be found somewhere in the list of blist. I would set a minimum of matching values to not overfill my output.

For example (True = T, False = F, example shorter than in real life):

List 1 of blists: (T,T,T,F,T,T,F,F,F,T)

blist: (F,F,T,T,F,F,F,F)

I want to see if a part of list 1 (F,T,T,F,F,F) equals to some part of the list blist. So for an example blist of (F,F,T,T,F,F,F,F) the output should be, that a part of list 1 can be found in blist.

So output for example:

blist has a similarity with list 1 ((starting point in list 1) 3, (starting point in blist) 1, (length) 6, (part of list that matches): (F,T,T,F,F,F)

I tried .corr() etc., for-loops and if-conditions and nothing wanted to work properly.

This problem probably has a simple solution, but I can't figure it out.

CodePudding user response：

If you want an implementation that determines where exactly the common list starts at each list, here is a simple "for loop" implementation (I used "0,1" instead of "True and False" for sake of readability. you can convert it easily):

a = [0,1,0,1,1,0,0,1,0,1,1,1,0,1,1,1,0,0]
b = [1,1,1,1,1,0,1,1,1,0,0,0,1,1]

list1_start = 0
list2_start = 0
common_list = []

for i in range(len(a)):
    for j in range(len(a)-i):
        for i2 in range(len(b)):
            for j2 in range(len(b)-i2):
                if a[i:i j] == b[i2:i2 j2]:
                    if len(common_list)<len(a[i:i j]):
                        common_list = a[i:i j]
                        list1_start = i 1
                        list2_start = i2 1
                    
print("list1 start: ", list1_start)
print("list2 start: ", list2_start)
print("common_list: ",common_list)

the answer:

list1 start:  10
list2 start:  3
common_list:  [1, 1, 1, 0, 1, 1, 1, 0]

CodePudding user response：

I would suggest using something like the KMP algorithm to match subset lists in the whole list. It would utilize O(n) complexity. Making a full implementation use worst of O(n^3) but most likely O(n^2). Solution could look like this:

longest_match = []

for i in range(len(blists)):
  if(i len(longest_match) >= len(blists):
    break
  for j in range(i len(longest_match) 1, len(blists)):
    if(KMPSearch(blists[i,j], blist)):
      longest_match = blists[i,j]
    else:
      break

Where you loop through each position in blists and check a subset list starting at i and going one past the length of the longest match already found.

CodePudding user response：

one way would be to join the lists into strings and use the "in" and "find" string methods and loops:

l1 = ['T','T','T','F','T','T','F','F','F','T']
blist = ['F','F','T','T','F','F','F','F']

w1, bw = ''.join(l1), ''.join(blist)

all_options = sum([[w1[i:ii] for i in range(len(w1))] for ii in range(len(w1))], [])
all_valid_options = [x for x in all_options if (len(x)>1 and x in bw)]
longest_option = sorted(all_valid_options, key=len)[-1]

w1_start = w1.find(longest_option)
w1_end = w1_start   len(longest_option)
bw_start = bw.find(longest_option)
bw_end = bw_start   len(longest_option)
print(longest_option, len(longest_option), (w1_start, w1_end), (bw_start, bw_end))

FTTFFF 6 (3, 9) (1, 7)