Home > Blockchain >  Do two boolean colums/lists match? Comparison of two different sized colums: Does a part of one list
Do two boolean colums/lists match? Comparison of two different sized colums: Does a part of one list

Time:12-13

I have different lists of boolean values in different lengths which I will call blists, which I want to compare to a fixed boolean list over time, which I will call blist.

The goal is to find the longest matching series in a list (of blists) in blist.

I want to see if a part or even a whole list of blists can be found somewhere in the list of blist. I would set a minimum of matching values to not overfill my output.

For example (True = T, False = F, example shorter than in real life):

List 1 of blists: (T,T,T,F,T,T,F,F,F,T)

blist: (F,F,T,T,F,F,F,F)

I want to see if a part of list 1 (F,T,T,F,F,F) equals to some part of the list blist. So for an example blist of (F,F,T,T,F,F,F,F) the output should be, that a part of list 1 can be found in blist.

So output for example:

blist has a similarity with list 1 ((starting point in list 1) 3, (starting point in blist) 1, (length) 6, (part of list that matches): (F,T,T,F,F,F)

I tried .corr() etc., for-loops and if-conditions and nothing wanted to work properly.

This problem probably has a simple solution, but I can't figure it out.

CodePudding user response:

If you want an implementation that determines where exactly the common list starts at each list, here is a simple "for loop" implementation (I used "0,1" instead of "True and False" for sake of readability. you can convert it easily):

a = [0,1,0,1,1,0,0,1,0,1,1,1,0,1,1,1,0,0]
b = [1,1,1,1,1,0,1,1,1,0,0,0,1,1]

list1_start = 0
list2_start = 0
common_list = []

for i in range(len(a)):
    for j in range(len(a)-i):
        for i2 in range(len(b)):
            for j2 in range(len(b)-i2):
                if a[i:i j] == b[i2:i2 j2]:
                    if len(common_list)<len(a[i:i j]):
                        common_list = a[i:i j]
                        list1_start = i 1
                        list2_start = i2 1
                    
print("list1 start: ", list1_start)
print("list2 start: ", list2_start)
print("common_list: ",common_list)

the answer:

list1 start:  10
list2 start:  3
common_list:  [1, 1, 1, 0, 1, 1, 1, 0]

CodePudding user response:

I would suggest using something like the KMP algorithm to match subset lists in the whole list. It would utilize O(n) complexity. Making a full implementation use worst of O(n^3) but most likely O(n^2). Solution could look like this:

longest_match = []

for i in range(len(blists)):
  if(i len(longest_match) >= len(blists):
    break
  for j in range(i len(longest_match) 1, len(blists)):
    if(KMPSearch(blists[i,j], blist)):
      longest_match = blists[i,j]
    else:
      break

Where you loop through each position in blists and check a subset list starting at i and going one past the length of the longest match already found.

CodePudding user response:

one way would be to join the lists into strings and use the "in" and "find" string methods and loops:

l1 = ['T','T','T','F','T','T','F','F','F','T']
blist = ['F','F','T','T','F','F','F','F']

w1, bw = ''.join(l1), ''.join(blist)

all_options = sum([[w1[i:ii] for i in range(len(w1))] for ii in range(len(w1))], [])
all_valid_options = [x for x in all_options if (len(x)>1 and x in bw)]
longest_option = sorted(all_valid_options, key=len)[-1]

w1_start = w1.find(longest_option)
w1_end = w1_start   len(longest_option)
bw_start = bw.find(longest_option)
bw_end = bw_start   len(longest_option)
print(longest_option, len(longest_option), (w1_start, w1_end), (bw_start, bw_end))

FTTFFF 6 (3, 9) (1, 7)

  • Related