I want to do a complex list comparison with python. I want to see if listB
contains all of the items from listA
and if they are in the same order. But I do not care if listB
has extra items or interleaved items.
Examples:
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
A
, B
, C
, D
, and E
all appear in listB
and are presented in the same order even though A
and B
have an item in between them and items at the start and end of listB
.
Extra complicated:
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
A
, B
, C
, D
, and E
all appear in listB
and are presented in the same order even though A
and B
have two items in between them and one of those items happens to be something we are searching for. But since we are looking if A
-> B
is sequential and B
-> C
is sequential the fact that we also have C
-> B
-> C
shouldn't matter.
So,
listA = ['A','B','C','D','E']
listB = [':','A','*','B','C','D','E','`']
Would be True
listA = ['A','B','C','D','E']
listB = ['A','*','C','B','C','D','E']
Would be True
But something like:
listA = ['A','B','C','D','E']
listB = ['A','B','C','D','F']
or even
listB = ['A','B','C','D']
Would be False
If it get a False answer, I'd ideally like to be able to point to where the break in sequence happened -- i.e. E is missing
.
CodePudding user response:
Simple solution using a nested loop. Walk over listA and search the elements in listB in order. Should you fail at any point -> this is not a substring:
def check(listA, listB):
start = 0
for a in listA:
for i in range(start, len(listB)):
if a == listB[i]:
start = i 1
break
else: # triggered only if no break
# print(f'{a} not found after position {start}')
return False
return True
check('ABCDE', 'A*CBCDE')
# True
check('ABCDEF', 'A*CBCDE')
# False
check ('', '')
# True
check('ABA', 'ABBA')
# True
NB. Using strings here for clarity, but this works for any iterable.
To get information on the non-found item, you can uncomment the print
.
Example:
check('ABCA', 'ABBA')
C not found after position 2
# False
CodePudding user response:
Recursive Solution
def is_contained(list_a, list_b):
# Base case
if not list_a:
return True
if not list_b:
return False
# Check if first items of each match
if list_a[0] == list_b[0]:
# 1. can match in remaining items of each or
# 2. list_a can find a match in the remaining items of list_b
return (is_contained(list_a[1:], list_b[1:]) or
is_contained(list_a, list_b[1:]))
else:
# No match in first items, so can drop first item of list_b
return is_contained(list_a, list_b[1:]
Test
print(is_contained(['A','B','C','D','E'], ['A','*','C','B','C','D','E'])) # True
print(is_contained(['A','B','C','D','E'], ['A','B','C','D','F'])) # False
CodePudding user response:
As always Python comes with batteries included, use SequenceMatcher.get_matching_blocks
for a one-line solution:
from difflib import SequenceMatcher
def check(lstA, lstB):
sm = SequenceMatcher(isjunk=lambda x: x not in set(lstA), a=lstA, b=lstB)
return sum(block.size for block in sm.get_matching_blocks()) == len(lstA)
print(check(['A', 'B', 'C', 'D', 'E'], [':', 'A', '*', 'B', 'C', 'D', 'E', '`']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', '*', 'C', 'B', 'C', 'D', 'E']))
print(check(['A', 'B', 'C', 'D', 'E'], ['A', 'B', 'C', 'D', 'F']))
Output
True
True
False
Notice that this also works for any sequence, like strings:
print(check('ABCDE', 'A*CBCDE'))
print(check('ABCDEF', 'A*CBCDE'))
print(check('', ''))
print(check('ABA', 'ABBA'))
Output (for strings)
True
False
True
True
SequenceMatcher
will even give information on how to transform one sequence into the other:
listA = ['A', 'B', 'C', 'D', 'E']
listB = [':', 'A', '*', 'B', 'C', 'D', 'E', '`']
s = SequenceMatcher(None, listA, listB)
for tag, i1, i2, j1, j2 in s.get_opcodes():
print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(tag, i1, i2, j1, j2, listA[i1:i2], listB[j1:j2]))
Output
insert a[0:0] --> b[0:1] [] --> [':']
equal a[0:1] --> b[1:2] ['A'] --> ['A']
insert a[1:1] --> b[2:3] [] --> ['*']
equal a[1:5] --> b[3:7] ['B', 'C', 'D', 'E'] --> ['B', 'C', 'D', 'E']
insert a[5:5] --> b[7:8] [] --> ['`']