I have two strings, a and b, and I can use a.index(b) to find the index of string b in string a.
a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How do I solve this"
idx = a.index(b)
But when string b is not exactly a slice of string a, this does not work: a.index(b) raises a ValueError. For example, when string b is:
b = "How fo I solve rhis"
I want a way to find the index of b in a when the number of "mismatched" characters is at most 5.
CodePudding user response:
The straightforward approach is to iterate over the possible start indices and count the mismatches between b and the substring of a starting at each index, returning the first index where the number of mismatches does not exceed the threshold:
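def fuzzy_index(a, b, max_mismatches=5):
    n_overall = len(a)
    n_to_match = len(b)
    if n_overall < n_to_match:
        return None
    # Any window is within budget when b is no longer than the budget.
    if n_to_match <= max_mismatches:
        return 0
    for i in range(n_overall - n_to_match + 1):
        # Count the positions where this window of a differs from b.
        if sum(c_a != c_b for c_a, c_b in zip(a[i : i + n_to_match], b)) <= max_mismatches:
            return i
    return None  # no window is within the mismatch budget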
a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"
print(fuzzy_index(a, b)) # -> 110
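Note that fuzzy_index returns the first window within the budget, which is not necessarily the closest one. If you would rather have the window with the fewest mismatches, a variant along these lines might work. This is only a sketch: the name best_fuzzy_index and the (index, mismatches) return shape are my own choices, not anything from a library.

```python
def best_fuzzy_index(a, b, max_mismatches=5):
    """Return (index, mismatches) for the closest window of a, or None."""
    n_to_match = len(b)
    best = None
    # The range is empty when a is shorter than b, so None is returned.
    for i in range(len(a) - n_to_match + 1):
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i : i + n_to_match], b))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (i, mismatches)
            if mismatches == 0:
                break  # an exact match cannot be improved on
    return best


a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"
print(best_fuzzy_index(a, b))  # -> (110, 2)
```

Ties go to the earliest window, since a later window only replaces the current best when it has strictly fewer mismatches.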
There are also packages for fuzzy string matching that you may want to use, e.g. fuzzywuzzy.
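One limitation worth knowing: fuzzy_index compares characters position by position, so it only tolerates substituted characters. If b has a dropped or extra character, everything after it shifts and the mismatch count blows up. If you need to handle that without a third-party package, a different technique is approximate substring matching by edit distance, done with dynamic programming. The sketch below is my own illustration of that idea (the name fuzzy_index_edits and its return shape are assumptions); it runs in time proportional to len(a) * len(b).

```python
def fuzzy_index_edits(a, b, max_edits=5):
    """Return (start, edits) for the closest substring of a, or None.

    edits counts substitutions, insertions and deletions.
    """
    m = len(b)
    # prev[j] = (cost, start): the cheapest alignment of b[:j] with a
    # substring of a ending at the current position, and where it starts.
    prev = [(j, 0) for j in range(m + 1)]
    best = prev[m]
    for i, ch in enumerate(a, start=1):
        cur = [(0, i)]  # an empty match may start at any position
        for j in range(1, m + 1):
            substitute = (prev[j - 1][0] + (ch != b[j - 1]), prev[j - 1][1])
            skip_in_a = (prev[j][0] + 1, prev[j][1])  # extra character in a
            skip_in_b = (cur[j - 1][0] + 1, cur[j - 1][1])  # extra character in b
            cur.append(min(substitute, skip_in_a, skip_in_b))
        best = min(best, cur[m])
        prev = cur
    cost, start = best
    return (start, cost) if cost <= max_edits else None


a = """
Hello! This is a string which I am using to present a quesion to stackoverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
print(fuzzy_index_edits(a, "How fo I solve rhis"))  # -> (110, 2)
print(fuzzy_index_edits(a, "How do I slve this"))   # -> (110, 1)
```

The second call has a missing "o", which the position-by-position count would treat as a long run of mismatches, while here it costs a single edit.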