I'm trying to make a fuzzy autocomplete suggestion box that highlights the searched characters with HTML <b></b> tags.
For example, if the user types 'ldi' and one of the suggestions is "Leonardo DiCaprio", then the desired outcome is "<b>L</b>eonar<b>d</b>o D<b>i</b>Caprio". The first occurrence of each character is highlighted, in order of appearance.
What I'm doing right now is:
    import re

    def prototype_finding_chars_in_string():
        test_string_list = ["Leonardo DiCaprio", "Brad Pitt", "Claire Danes", "Tobey Maguire"]
        comp_string = "ldi"  # chars to highlight
        regex = ".*?" + ".*?".join([f"({x})" for x in comp_string]) + ".*?"  # results in .*?(l).*?(d).*?(i).*?
        regex_compiled = re.compile(regex, re.IGNORECASE)
        for x in test_string_list:
            # keeps only the entries that contain the search chars in order
            re_search_result = re.search(regex_compiled, x)
            if re_search_result:
                print(f"char combination {comp_string} are in {x} result group: {re_search_result.groups()}")
results in
char combination ldi are in Leonardo DiCaprio result group: ('L', 'D', 'i')
Now I want to replace each occurrence in the result groups with <b>[whatever in the result]</b> and I'm not sure how to do it.
What I'm currently doing is looping over the result and using the built-in str.replace method to replace the occurrences:
    def replace_with_bold(result_groups, original_string):
        output_string: str = original_string
        for result in result_groups:
            output_string = output_string.replace(result, f"<b>{result}</b>", 1)
        return output_string
This results in:
Highlighted string: <b>L</b>eonar<b>d</b>o D<b>i</b>Caprio
But I think looping over the results like this, when I already have the match groups, is wasteful. Furthermore, it's not even correct, because each iteration searches the string from the beginning again. So for the input 'ooo' this is the result:
char combination ooo are in Leonardo DiCaprio result group: ('o', 'o', 'o')
Highlighted string: Le<b><b><b>o</b></b></b>nardo DiCaprio
When it should be Le<b>o</b>nard<b>o</b> DiCapri<b>o</b>
Is there a way to simplify this? Maybe regex here is overkill?
CodePudding user response:
This should work:
    for result in result_groups:
        output_string = re.sub(fr'(.*?(?!<b>))({result})((?!</b>).*)',
                               r'\1<b>\2</b>\3',
                               output_string,
                               flags=re.IGNORECASE)
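On each iteration, the first occurrence of result that is not already followed by a closing tag is replaced by <b>result</b>. The ? makes .* lazy, which is what picks the first occurrence; the negative lookaheads (?!<b>) and (?!</b>) are meant to skip characters that are already enclosed in tags; and \1, \2 and \3 refer to the first, second and third groups. The IGNORECASE flag makes the match case-insensitive. Note that a search character of b can also match the b inside an already inserted <b> tag, so this approach assumes the search characters do not collide with the tag text.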
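If the match object from the original search is still available, the positions of its groups can be reused instead of searching the string again for each character. The following is a minimal sketch of that idea and is not part of the answer above: the function name highlight_by_spans is made up for illustration, and re.escape is added on the assumption that the typed characters may include regex metacharacters.

```python
import re


def highlight_by_spans(text, chars):
    """Wrap the first in-order occurrence of each char in <b> tags.

    Returns None when the chars do not occur in order.
    (Hypothetical helper, shown only as an illustration.)
    """
    # One capturing group per typed character, separated by lazy gaps.
    pattern = ".*?".join(f"({re.escape(c)})" for c in chars)
    match = re.search(pattern, text, re.IGNORECASE)
    if match is None:
        return None

    pieces = []
    cursor = 0  # end of the last piece copied from text
    for i in range(1, len(chars) + 1):
        start, end = match.span(i)  # position of the i-th captured char
        pieces.append(text[cursor:start])
        pieces.append(f"<b>{text[start:end]}</b>")
        cursor = end
    pieces.append(text[cursor:])  # untouched tail after the last match
    return "".join(pieces)


print(highlight_by_spans("Leonardo DiCaprio", "ooo"))
# Le<b>o</b>nard<b>o</b> DiCapri<b>o</b>
```

Because the tags are placed using positions in the original string, text that has already been inserted is never searched again, so repeated characters and a search character of b should not interfere with each other.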
CodePudding user response:
A way using re.split:
    import re

    test_string_list = ["Leonardo DiCaprio", "Brad Pitt", "Claire Danes", "Tobey Maguire"]

    def filter_and_highlight(strings, letters):
        # for 'lir' this builds (l)(.*?)(i)(.*?)(r)
        pat = re.compile('(' + ')(.*?)('.join(letters) + ')', re.I)
        results = []
        for s in strings:
            parts = pat.split(s, 1)
            if len(parts) == 1: continue  # no match: skip this string
            res = ''
            for i, p in enumerate(parts):
                if i & 1:  # odd indexes hold the searched letters
                    p = '<b>' + p + '</b>'
                res += p
            results.append(res)
        return results

    filter_and_highlight(test_string_list, 'lir')
A particularity of re.split is that captures are included by default as parts of the result. Also, even if the first capture matches at the start of the string, an empty part is returned before it. This means that the searched letters are always at odd indexes in the list of substrings.
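To make the odd-index property concrete, here is a small sketch that splits three of the test strings with the pattern built for 'lir'. The variable names are made up for illustration.

```python
import re

# The pattern that filter_and_highlight builds for the letters 'lir'.
pat = re.compile("(l)(.*?)(i)(.*?)(r)", re.I)

# Match in the middle of the string: text before it is part 0.
parts_mid = pat.split("Claire Danes", 1)
print(parts_mid)    # ['C', 'l', 'a', 'i', '', 'r', 'e Danes']

# Match at the very start: an empty string still occupies part 0.
parts_start = pat.split("Leonardo DiCaprio", 1)
print(parts_start)  # ['', 'L', 'eonardo D', 'i', 'Cap', 'r', 'io']

# No match: the list has a single element, the original string.
parts_none = pat.split("Brad Pitt", 1)
print(parts_none)   # ['Brad Pitt']
```

In both matching cases the letters sit at indexes 1, 3 and 5, which is why the check i & 1 is enough to decide which parts to wrap.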