Comparing lists in Python, with a twist

So I have two lists I want to compare, listA and listB. If an item from listA appears in listB, I want to remove it from listB. I can do this with:

listA = ["config", "\n", "config checkpoint"]
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]
    
listB = [line for line in listB if not any(line in item for item in listA)]
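Note the direction of the test in that comprehension: `line in item` asks whether the listB line is a substring of a listA item, not the other way round. A quick check with the same data shows which lines survive:

```python
listA = ["config", "\n", "config checkpoint"]
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]

# `line in item` is True when the listB line occurs inside some listA item,
# so "config" and "\n" are removed, while longer lines such as
# "config checkpoint test" are kept because they are not inside any listA item.
kept = [line for line in listB if not any(line in item for item in listA)]
print(kept)  # ['config exclusive', 'config checkpoint test', 'hello']
```

Because it is a substring test, a listB line such as "check" would also be removed, since it occurs inside "config checkpoint".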

Where things get more complex, though, is that some lines should be removed only if the list item matches exactly (as the current code does), while others should be removed if the item from listB contains the item from listA, i.e. a partial match.
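One way to express both kinds of rule in a single pass is to tag each listA entry with how it should match. This is only a sketch that assumes you can mark each entry by hand; the names `rules`, `"exact"`, `"partial"` and `should_remove` are hypothetical and not part of the question.

```python
# Hypothetical rule format: (text, mode), where mode is "exact" or "partial".
rules = [
    ("config", "exact"),               # remove only lines equal to "config"
    ("\n", "exact"),                   # remove only lines equal to "\n"
    ("config checkpoint", "partial"),  # remove any line containing this text
]

listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]


def should_remove(line, rules):
    """Return True if any rule matches the line under that rule's mode."""
    for text, mode in rules:
        if mode == "exact" and line == text:
            return True
        if mode == "partial" and text in line:
            return True
    return False


listB = [line for line in listB if not should_remove(line, rules)]
print(listB)  # ['config exclusive', 'hello']
```

Keeping the mode next to each string means a new rule is a one-line change, and each line of listB is still checked in a single pass.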

I'm not sure whether this can be done succinctly within the same list comprehension. I've explored using .startswith, raw strings to add ^ and $ around the complete lines, and re.match (but I couldn't get it to iterate within the given code).
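On the .startswith idea: `str.startswith` accepts a tuple of prefixes, so all prefix rules can be checked in one call, while the exact rules fit naturally in a set. This sketch assumes the partial entries are prefixes (as with "config checkpoint test"); `exact` and `prefixes` are illustrative names, not from the question.

```python
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]

# Illustrative split of listA into whole-line matches and prefix matches.
exact = {"config", "\n"}           # set membership compares the whole line
prefixes = ("config checkpoint",)  # str.startswith accepts a tuple of prefixes

listB = [line for line in listB if line not in exact and not line.startswith(prefixes)]
print(listB)  # ['config exclusive', 'hello']
```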

I think it might just be a beautiful dream, but can anyone think of an elegant way of doing this within the same pass?

Answer:

You could try using difflib.get_close_matches, which ships with Python's standard library.

Example:


import difflib

listA = ["config", "\n", "config checkpoint"]
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]

new_listB = [line for line in listB if len(difflib.get_close_matches(line, listA, n=len(listA), cutoff=0.4)) == 0]
print(new_listB)
# Prints:
#
# ['hello']

Notes

The function difflib.get_close_matches has a parameter named cutoff. It accepts a float between 0 and 1 and defaults to 0.6; candidates whose similarity score to the word falls below cutoff are ignored. The idea here is that the lower you set cutoff, the less strict the function is when looking for elements of listA that match line. Here's an example:


import difflib

difflib.get_close_matches('John', ['John', 'Joe', 'Jane', 'Janet'], cutoff=0.2, n=100)
# Returns:
#
# ['John', 'Joe', 'Jane', 'Janet']

difflib.get_close_matches('John', ['John', 'Joe', 'Jane', 'Janet'], cutoff=0.6, n=100)
# Returns:
#
# ['John']
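To see why the two cutoffs give different results, you can print the similarity scores yourself. In CPython, get_close_matches compares each candidate's difflib.SequenceMatcher ratio() against cutoff, so this sketch computes those ratios directly for the same names:

```python
import difflib

word = "John"
candidates = ["John", "Joe", "Jane", "Janet"]

# ratio() = 2 * (matching characters) / (total characters of both strings), from 0 to 1
scores = {c: round(difflib.SequenceMatcher(None, c, word).ratio(), 2) for c in candidates}

for candidate, score in scores.items():
    print(candidate, score)

# John 1.0
# Joe 0.57
# Jane 0.5
# Janet 0.44
```

Every score is at least 0.2, so all four names pass that cutoff, while only "John" reaches 0.6.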

Answer:

If listA is a list of regex patterns (as you wrote in the comments), you can do:

import re

listA = ["^config$", "^\n$", "^config checkpoint"]
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]

listB = [line for line in listB if not any(re.match(item, line) for item in listA)]
print(listB)

Output

['config exclusive', 'hello']
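If the pattern list grows, the patterns can also be joined into one alternation and compiled once, so each line is tested with a single match() call instead of looping over listA. This sketch assumes the patterns don't depend on being matched separately (for example, no inline flags such as (?i) inside individual patterns); `combined` is an illustrative name.

```python
import re

listA = ["^config$", "^\n$", "^config checkpoint"]
listB = ["config exclusive", "config checkpoint test", "config", "config", "config", "\n", "hello"]

# Wrap each pattern in a non-capturing group so joining them with "|"
# does not change how each individual pattern is anchored.
combined = re.compile("|".join(f"(?:{p})" for p in listA))

listB = [line for line in listB if not combined.match(line)]
print(listB)  # ['config exclusive', 'hello']
```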