Index of a sentence in a paragraph

I have two strings, a and b, and I can use a.index(b) to find the index of string b in string a.

a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How do I solve this"

idx = a.index(b)

But when string b is not an exact substring of string a, this does not work. For example, when string b is:

b = "How fo I solve rhis"
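To make the failure concrete, here is a minimal sketch of what "does not work" means. The shortened string a and the names b_exact and b_typo are made up for illustration; str.index raises a ValueError when the substring is absent, while str.find returns -1.

```python
# Shortened, hypothetical stand-in for the paragraph in the question.
a = "Hello world.\nHow do I solve this?\n"
b_exact = "How do I solve this"
b_typo = "How fo I solve rhis"

print(a.index(b_exact))  # exact substring: prints its start index
print(a.find(b_typo))    # find() signals "not found" with -1

try:
    a.index(b_typo)      # index() raises instead of returning -1
except ValueError as exc:
    print("ValueError:", exc)
```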

I want a way to find the index of b in a when at most 5 characters are "mismatched".

CodePudding user response:

The straightforward approach is to iterate over the possible start indices, count the mismatches between b and the substring of a starting at each index, and return the first index where the number of mismatches does not exceed the threshold:

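def fuzzy_index(a, b, max_mismatches=5):
    """Return the first index in a where b fits with few enough mismatches."""
    n_overall = len(a)
    n_to_match = len(b)
    if n_overall < n_to_match:
        return None  # b cannot fit inside a
    if n_to_match <= max_mismatches:
        return 0  # every window of this length is within the threshold

    # Slide a window of len(b) over a and count position-wise mismatches.
    for i in range(n_overall - n_to_match + 1):
        window = a[i : i + n_to_match]
        if sum(c_a != c_b for c_a, c_b in zip(window, b)) <= max_mismatches:
            return i
    return None  # no window is close enough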

        
a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"

print(fuzzy_index(a, b))  # -> 110
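
Note that fuzzy_index returns the first window within the threshold, which is not necessarily the closest one. If the closest match is wanted instead, one possible variant tracks the minimum mismatch count. This is only a sketch: the name best_fuzzy_index and the shortened test string are assumptions for illustration, not part of the original answer.

```python
def best_fuzzy_index(a, b):
    """Return (index, mismatches) for the window of a closest to b, or None."""
    n = len(b)
    if n == 0 or len(a) < n:
        return None
    best = None
    for i in range(len(a) - n + 1):
        # Position-wise (Hamming) mismatch count for the window starting at i.
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i:i + n], b))
        if best is None or mismatches < best[1]:
            best = (i, mismatches)  # ties keep the earliest index
    return best


# Shortened, hypothetical stand-in for the paragraph in the question.
a = "Hello world.\nHow do I solve this?\nPlease help!\n"
b = "How fo I solve rhis"
print(best_fuzzy_index(a, b))  # -> (13, 2)
```

The caller can then compare the returned mismatch count against its own threshold.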

There are also packages for fuzzy string matching that you may want to use, e.g. fuzzywuzzy.
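
If an extra dependency is not wanted, the standard-library difflib module offers a similarity ratio that could serve a similar purpose. The sketch below is a stand-in using difflib.SequenceMatcher, not fuzzywuzzy's API; the function name best_ratio_index and the shortened test string are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_ratio_index(a, b):
    """Return (index, ratio) of the len(b)-sized window of a most similar to b."""
    n = len(b)
    if n == 0 or len(a) < n:
        return None
    best_i, best_ratio = 0, -1.0
    for i in range(len(a) - n + 1):
        # ratio() is 2 * matches / total length, a value between 0 and 1.
        ratio = SequenceMatcher(None, a[i:i + n], b).ratio()
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    return best_i, best_ratio


# Shortened, hypothetical stand-in for the paragraph in the question.
a = "Hello world.\nHow do I solve this?\nPlease help!\n"
b = "How fo I solve rhis"
idx, ratio = best_ratio_index(a, b)
print(idx, round(ratio, 2))
```

Unlike a plain mismatch count, this score is a similarity rather than a number of differing characters, so a cutoff would have to be chosen as a ratio instead of "at most 5".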