How to Compare Substrings of Two Lists in Python

Time:06-01

I have two lists, shortened for this example:

l1 = ['Chase Bank', 'Bank of America']

l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

I am trying to generate a list of the items in l1 that have no match in l2. In this case, 'Bank of America' would be the only item returned.

'Chase Bank' (from l1) and 'Chase Mobile: Bank & Invest' (from l2) count as the same because they both contain the keyword 'Chase', so 'Chase Bank' should not go into the result list. But 'Bank of America' should go into it, even though 'Bank' appears in both 'Bank of America' and 'Bank & Invest'.

I have tried using sets, a plain for loop with if/in, and any() with a list comprehension. I have also tried regex, but matching substrings from one list against the other is proving very difficult for me.

Is this possible with Python or should I broaden my approach?
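For illustration, the matching rule described above (two names match when they share a distinctive word such as 'Chase', but not a generic word such as 'Bank') can be sketched as a set intersection of each name's words after dropping stop words. The `STOP_WORDS` set and the `keywords` and `unmatched` helpers are hypothetical: the stop words would have to be chosen by hand for the real data, and splitting on runs of ASCII letters is an assumption that happens to suit these example names.

```python
import re

# Hypothetical stop words: generic words that should not count as a match.
STOP_WORDS = {"bank", "of", "the", "mobile", "credit", "union", "invest"}

def keywords(name):
    """Lower-cased words of `name` (runs of ASCII letters), minus stop words."""
    words = re.findall(r"[a-z]+", name.lower())
    return {w for w in words if w not in STOP_WORDS}

def unmatched(l1, l2):
    """Items of l1 that share no distinctive word with any item of l2."""
    l2_keywords = [keywords(s) for s in l2]
    return [s for s in l1 if not any(keywords(s) & kw for kw in l2_keywords)]

l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
print(unmatched(l1, l2))  # ['Bank of America']
```

Here 'Chase Bank' reduces to {'chase'}, which intersects the keywords of 'Chase Mobile: Bank & Invest', while 'Bank of America' reduces to {'america'}, which intersects nothing in l2.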

Answer:

Use a list comprehension with re.sub to strip the undesired substrings (stop words) from the elements of your first list. Here, I remove bank, case-insensitively, together with any whitespace before and after it. Then use a second list comprehension to drop every element whose shortened form appears as a substring of some element of the second list; enumerate supplies both the index (to look up the shortened form) and the original element. Optionally, convert the second list to a set first: this removes duplicate entries, which saves work when that list is long and repetitive.

import re

lst1 = ['Chase Bank', 'Chase bank', 'Bank of America']
lst2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
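# strip 'bank' (any case) plus its surrounding whitespace from each name
lst1_short = [re.sub(r'(?i)\s*\bbank\b\s*', '', s) for s in lst1]
print(lst1_short)
# ['Chase', 'Chase', 'of America']

lst2_set = set(lst2)  # deduplicate lst2 once, not on every iteration
lst1 = [s for i, s in enumerate(lst1)
        if not any(lst1_short[i] in x for x in lst2_set)]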
print(lst1)
# ['Bank of America']
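One detail worth noting: the stop-word removal ignores case, but the later `in` test does not, so 'chase bank' would not match 'Chase Mobile: Bank & Invest'. A possible sketch casefolds both sides before comparing; it uses `zip` in place of `enumerate`, which is a stylistic choice and not part of the answer above.

```python
import re

lst1 = ['chase bank', 'Bank of America']
lst2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

# Same stop-word removal as above, then casefold so that
# 'chase' and 'Chase' compare equal in the substring test.
lst1_short = [re.sub(r'(?i)\s*\bbank\b\s*', '', s).casefold() for s in lst1]
lst2_folded = {s.casefold() for s in lst2}

# Pair each original name with its shortened, casefolded form.
result = [s for s, short in zip(lst1, lst1_short)
          if not any(short in x for x in lst2_folded)]
print(result)  # ['Bank of America']
```

A name made up only of stop words (for example 'Bank') shortens to the empty string, and '' is a substring of every string, so such a name is always treated as matched; it may need a separate guard.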

Note: you can extend the list of stop words (here, only bank) with regex alternation. For example:

re.sub(r'(?i)\s*\b(bank|credit union|institution for savings)\b\s*', '', s)
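A sketch of building that alternation from a Python list instead of editing the pattern by hand, assuming a hypothetical `stop_words` list and `strip_stop_words` helper. `re.escape` keeps any regex metacharacters in the phrases literal, and sorting longest-first means a phrase such as 'credit union' is tried before any shorter entry it starts with. Each match is replaced with a single space and the whitespace is then normalized, so the words on both sides of a removed phrase stay separate ('Elevations Credit Union Mobile' becomes 'Elevations Mobile' rather than 'ElevationsMobile', which is what replacing with '' would give).

```python
import re

# Hypothetical stop-word list; extend it to suit the real data.
stop_words = ['bank', 'credit union', 'institution for savings']

# Escape each phrase and try longer phrases first, then wrap the
# alternation in word boundaries plus optional surrounding whitespace.
alternation = '|'.join(
    re.escape(w) for w in sorted(stop_words, key=len, reverse=True)
)
pattern = re.compile(r'\s*\b(?:' + alternation + r')\b\s*', re.IGNORECASE)

def strip_stop_words(s):
    """Remove stop words from s, keeping single spaces between remaining words."""
    return ' '.join(pattern.sub(' ', s).split())

names = ['Chase Bank', 'Elevations Credit Union Mobile', 'Bank of America']
print([strip_stop_words(s) for s in names])
# ['Chase', 'Elevations Mobile', 'of America']
```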

Answer:

If the keyword is known in advance (here, 'Chase'), you can do it with a list comprehension:

# True if any entry of l2 mentions 'Chase'
l2_chase = any('Chase' in j for j in l2)
# keep the l1 entries that do not share 'Chase' with l2
[i for i in l1 if not ('Chase' in i and l2_chase)]

Output:

['Bank of America']
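The comprehension above hardcodes 'Chase'. A sketch that generalizes it to several keywords, assuming a hypothetical caller-supplied `keywords` list and a hypothetical `exclude_by_keywords` helper, keeps the same rule: an l1 entry is dropped when it contains a keyword that also occurs somewhere in l2. Like the original, the `in` test is a case-sensitive substring check.

```python
def exclude_by_keywords(l1, l2, keywords):
    """Keep the items of l1 that share none of `keywords` with any item of l2.

    `keywords` is a hypothetical, caller-supplied list of distinctive words.
    """
    # Keywords that actually occur somewhere in l2.
    present = {k for k in keywords if any(k in s for s in l2)}
    # Drop l1 items containing any of those keywords.
    return [s for s in l1 if not any(k in s for k in present)]

l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
print(exclude_by_keywords(l1, l2, ['Chase', 'Elevations']))
# ['Bank of America']
```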

Answer:

You should try something like this:

l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

def similar_substrings(l1, l2):
    """Return the strings in l1 whose words all appear in some string of l2."""
    words_in = []

    for string in l1:
        words = string.split()  # iterate over words, not characters
        for string2 in l2:
            if all(word in string2 for word in words):
                words_in.append(string)
                break  # one match is enough; avoids duplicate entries

    return words_in

matched = similar_substrings(l1, l2)
print(matched)
# ['Chase Bank']
print([s for s in l1 if s not in matched])
# ['Bank of America']

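This only checks whether every word of a string from l1 appears in some string from l2, but you can modify it fairly easily to check the reverse inclusion as well. It returns the matching strings, so the list asked for in the question is the complement of its result.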
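One possible sketch of checking both inclusions compares whole words with set subset tests rather than substring tests. This is a deliberate swap: whole-word comparison is stricter, so 'Bank' no longer matches inside a longer word. The helpers `words`, `matches_either_way` and `unmatched_both_ways` are hypothetical, and the plain `split()` leaves punctuation attached (e.g. 'Mobile:').

```python
def words(s):
    """Set of whitespace-separated words in s (punctuation stays attached)."""
    return set(s.split())

def matches_either_way(a, b):
    """True if all words of a appear in b, or all words of b appear in a."""
    wa, wb = words(a), words(b)
    return wa <= wb or wb <= wa

def unmatched_both_ways(l1, l2):
    """Items of l1 that match no item of l2 in either direction."""
    return [a for a in l1 if not any(matches_either_way(a, b) for b in l2)]

l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
print(unmatched_both_ways(l1, l2))  # ['Bank of America']
```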
