Home > other >  Complex list comparisons in python
Complex list comparisons in python

Time:07-26

I want to do a complex list comparison with python. I want to see if listB contains all of the items from listA and if they are in the same order. But I do not care if listB has extra items or interleaved items.

Examples:

listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']

A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have an item in between them and items at the start and end of listB.

Extra complicated:

listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']

A, B, C, D, and E all appear in listB and are presented in the same order even though A and B have two items in between them and one of those items happens to be something we are searching for. But since we are looking if A -> B is sequential and B -> C is sequential the fact that we also have C -> B -> C shouldn't matter.

So,

listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']

Would be True

listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']

Would be True

But something like:

listA = ['A','B','C','D','E']
listB = ['A','B','C','D','F']

or even

listB = ['A','B','C','D']

Would be False

If it get a False answer, I'd ideally like to be able to point to where the break in sequence happened -- i.e. E is missing.

CodePudding user response:

Simple solution using a nested loop. Walk over listA and search the elements in listB in order. Should you fail at any point -> this is not a substring:


def check(listA, listB):
    start = 0
    for a in listA:
        for i in range(start, len(listB)):
            if a == listB[i]:
                start = i 1
                break
        else:  # triggered only if no break
            # print(f'{a} not found after position {start}')
            return False
    return True

check('ABCDE', 'A*CBCDE')
# True

check('ABCDEF', 'A*CBCDE')
# False

check ('', '')
# True

check('ABA', 'ABBA')
# True

NB. Using strings here for clarity, but this works for any iterable.

To get information on the non-found item, you can uncomment the print.

Example:

check('ABCA', 'ABBA')

C not found after position 2
# False

CodePudding user response:

Recursive Solution

def is_contained(list_a, list_b):
    # Base case
    if not list_a:
        return True
    if not list_b:
        return False
    
    # Check if first items of each match
    if list_a[0] == list_b[0]:
        # 1. can match in remaining items of each or 
        # 2. list_a can find a match in the remaining items of list_b
        return (is_contained(list_a[1:], list_b[1:]) or 
                is_contained(list_a, list_b[1:]))
    else:
        # No match in first items, so can drop first item of list_b
        return is_contained(list_a, list_b[1:]

Test

print(is_contained(['A','B','C','D','E'], ['A','*','C','B','C','D','E']))  # True
print(is_contained(['A','B','C','D','E'], ['A','B','C','D','F']))          # False

CodePudding user response:

As always Python comes with batteries included, use SequenceMatcher.get_matching_blocks for a one-line solution:

from difflib import SequenceMatcher


def check(lstA, lstB):
    sm = SequenceMatcher(isjunk=lambda x: x not in set(lstA), a=lstA, b=lstB)
    return sum(block.size for block in sm.get_matching_blocks()) == len(lstA) 

print(check(['A', 'B', 'C', 'D', 'E'], [':', 'A', '*', 'B', 'C', 'D', 'E', '`']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))

Output

True
True
False

Notice that this also works for any sequence, like strings:

print(check('ABCDE', 'A*CBCDE'))
print(check('ABCDEF', 'A*CBCDE'))
print(check('', ''))
print(check('ABA', 'ABBA'))

Output (for strings)

True
False
True
True

SequenceMatcher will even give information on how to transform one sequence into the other:

listA = ['A', 'B', 'C', 'D', 'E']
listB = [':', 'A', '*', 'B', 'C', 'D', 'E', '`']
s = SequenceMatcher(None, listA, listB)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(tag, i1, i2, j1, j2, listA[i1:i2], listB[j1:j2]))

Output

insert    a[0:0] --> b[0:1]       [] --> [':']
equal     a[0:1] --> b[1:2]    ['A'] --> ['A']
insert    a[1:1] --> b[2:3]       [] --> ['*']
equal     a[1:5] --> b[3:7] ['B', 'C', 'D', 'E'] --> ['B', 'C', 'D', 'E']
insert    a[5:5] --> b[7:8]       [] --> ['`']
  • Related