Index of a sentence in a paragraph

Time:05-27

I have two strings, a and b, and I can use a.index(b) to find the index of string b in string a.

a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How do I solve this"

idx = a.index(b)

But when string b is not exactly a slice of string a, this does not work. For example when string b is:

b = "How fo I solve rhis"

I want a way to find the index of b in a when the number of "mismatched" characters is at most 5.

CodePudding user response:

The straightforward approach is to iterate over the possible start indices, count the mismatches between b and the substring of a starting at each index, and return the first index where the number of mismatches does not exceed the threshold:

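def fuzzy_index(a, b, max_mismatches=5):
    """Return the first index where b fits a within max_mismatches, else None."""
    n_overall = len(a)
    n_to_match = len(b)
    if n_overall < n_to_match:
        return None
    # b is so short that any window of a is already within the threshold
    if n_to_match <= max_mismatches:
        return 0

    for i in range(n_overall - n_to_match + 1):
        # Count the positions where the window of a and b differ
        if sum(c_a != c_b for c_a, c_b in zip(a[i : i + n_to_match], b)
                ) <= max_mismatches:
            return i
    return None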

        
a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"

print(fuzzy_index(a, b))  # -> 110
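
Because the function above returns the first window within the threshold, a generous threshold can stop on an earlier, worse match. A minimal sketch of a variant that scans every window and keeps the one with the fewest mismatches is shown here; the name best_fuzzy_index and the (index, mismatches) return value are assumptions made for illustration, not part of the original answer:

```python
def best_fuzzy_index(a, b, max_mismatches=5):
    """Return (index, mismatches) for the window of a closest to b, or None.

    Hypothetical helper: same mismatch count as fuzzy_index, but it keeps
    the best window instead of stopping at the first acceptable one.
    """
    n_to_match = len(b)
    best = None
    for i in range(len(a) - n_to_match + 1):
        # Count the positions where the window of a and b differ
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i : i + n_to_match], b))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (i, mismatches)
            if mismatches == 0:
                break  # an exact match cannot be improved on
    return best


text = "abc How do I solve this? xyz"
print(best_fuzzy_index(text, "How fo I solve rhis"))  # -> (4, 2)
```

Like the original, this only counts substituted characters; an inserted or deleted character shifts the rest of the window and is counted as many mismatches.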

There are also packages for fuzzy string matching that you may want to use, e.g. fuzzywuzzy.
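
If installing a package is not an option, a similar idea can be sketched with difflib from the standard library. This swaps fuzzywuzzy for difflib.SequenceMatcher and scores each window with a similarity ratio between 0 and 1 instead of a mismatch count, so it is a different measure from the function above; best_ratio_index is a hypothetical name used only for this example:

```python
from difflib import SequenceMatcher


def best_ratio_index(a, b):
    """Return (index, ratio) of the window of a most similar to b.

    Hypothetical helper: slides a window of len(b) over a and scores
    each window with SequenceMatcher.ratio().
    """
    n_to_match = len(b)
    best_index, best_ratio = None, 0.0
    # SequenceMatcher caches details about the second sequence,
    # so b is set once and only the window changes.
    matcher = SequenceMatcher(None, autojunk=False)
    matcher.set_seq2(b)
    for i in range(len(a) - n_to_match + 1):
        matcher.set_seq1(a[i : i + n_to_match])
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index, best_ratio


text = "abc How do I solve this? xyz"
index, ratio = best_ratio_index(text, "How fo I solve rhis")
print(index, round(ratio, 2))
```

This is slower than counting mismatches, and a cutoff on the ratio would have to be chosen by experiment.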
