How to exactly surround words with HTML tags in Python - CodePudding

I have the following variables and values:

s = '''
vinyl I had to go to Miami. The size of the ball is huge also the vinyl cutters.
I have a computer and it is only 1.
Another vinyl

vinylDiesel
'''

data =[
"vinyl",
"size",
"vinyl cutters",
"computer",
"1",
"vinyl",
"5"
]

For every entry in the data variable, I want each occurrence in the "s" variable to be surrounded with specific HTML tags. The actual tags depend on what I need, but for this example let's just use <tag></tag> and <sub></sub> for ease.

Originally I just wanted an output like this (see the image).

But before we can achieve what's in the image, those words need to be surrounded with the correct HTML tags. That's because I am displaying the result in a PyQt5 QTextEdit widget, and HTML is the way to apply styling there.

To get the result in the image, I need help writing a program that generates output like this.

Expected Output:

(<tag>vinyl</tag>)<sub>1</sub> I had to go to Miami. The (<tag>size</tag>)<sub>2</sub> of the ball is huge also the (<tag>(<tag>vinyl</tag>)<sub>1</sub> cutters</tag>)<sub>3</sub>.
I have a (<tag>computer</tag>)<sub>4</sub> and it is only (<tag>1</tag>)<sub>5</sub>.
Another (<tag>vinyl</tag>)<sub>1</sub>

(<tag>vinyl</tag>)<sub>1</sub>Diesel

Once this is done, I can simply set the HTML of the QTextEdit widget to the expected output, and that gives the result from the image.
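
One detail that may matter at this step: HTML renderers generally treat a raw newline as ordinary whitespace, so the line structure of s could be lost once the string is interpreted as HTML. A minimal sketch, assuming that behaviour applies to the widget here, converts newlines to <br> first. The helper name with_line_breaks is hypothetical and not part of the original post.

```python
def with_line_breaks(html_text: str) -> str:
    """Replace raw newlines with <br> so line structure survives HTML rendering."""
    # Hypothetical helper: normalise Windows line endings first, then convert.
    return html_text.replace("\r\n", "\n").replace("\n", "<br>")


tagged = "(<tag>vinyl</tag>)<sub>1</sub> I had to go to Miami.\nAnother line"
print(with_line_breaks(tagged))
```

The resulting string could then be passed to the QTextEdit widget's setHtml method.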

What I have tried so far.

import re
s = '''
vinyl I had to go to Miami. The size of the ball is huge also the vinyl cutters.
I have a computer and it is only 1.
Another vinyl

vinylDiesel
'''


data =[
"vinyl",
"size",
"vinyl cutters",
"computer",
"1",
"vinyl",
"5"
]

for i,p in enumerate(data):
    name = p
    html_element_name ="span"
    color = "blue"

    html_attrs={"style": f"color:{color};font-weight: bold;"}

    sub_num = f"<sub style='font-weight:bold;font-size:15px;'>{i + 1}</sub>"

    html_start_tag = '(' + "<" + html_element_name + " " + " ".join(["%s='%s'" % (k, html_attrs[k]) for k in html_attrs]) + ">"
    html_end_tag = "</" + html_element_name + ">" + ')' + sub_num

    to_replace_with = '  ' + html_start_tag + f"{name}" + html_end_tag + '  '

    s = re.sub(fr"{name}",to_replace_with, s)


print(s)
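
A likely reason this attempt does not give the expected output: each re.sub call runs over text that earlier iterations have already modified, so later patterns can match inside markup that was inserted before (for example "size" inside font-size, or "1" inside 15px). The reduced example below uses shortened, hypothetical markup to illustrate the effect. It is a sketch of the problem, not a fix.

```python
import re

text = "size of the vinyl"

# First pass: wrap "vinyl" in markup whose style attribute happens to contain "size".
text = re.sub("vinyl", "<b style='font-size:15px'>vinyl</b>", text)

# Second pass: "size" now matches twice, once in the sentence and once
# inside the style attribute inserted by the first pass.
text = re.sub("size", "<i>size</i>", text)

print(text)
```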

CodePudding user response:

You can use recursion:

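def to_tags(s, data, p = []):
   # p holds the entries already wrapped on this branch, so they are not wrapped again
   new_s = ''
   while s:
      # collect every entry (with its 1-based index) that matches at the current position
      if (k:=[(i, a) for i, a in enumerate(data, 1) if s.startswith(a) and a not in p]):
         # prefer the longest match, then tag shorter entries inside it recursively
         i, sb = max(k, key=lambda x:len(x[-1]))
         new_s += f'(<tag>{to_tags(sb, data, p + [sb])}</tag>)<sub>{i}</sub>'
         s = s[len(sb):]
      else:
         # no entry starts here: copy one character and move on
         new_s, s = new_s + s[0], s[1:]
   return new_s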

print(to_tags(s, data))

Output:

'\n(<tag>vinyl</tag>)<sub>1</sub> I had to go to Miami. The (<tag>size</tag>)<sub>2</sub> of the ball is huge also the (<tag>(<tag>vinyl</tag>)<sub>1</sub> cutters</tag>)<sub>3</sub>.\nI have a (<tag>computer</tag>)<sub>4</sub> and it is only (<tag>1</tag>)<sub>5</sub>.\nAnother (<tag>vinyl</tag>)<sub>1</sub>\n\n(<tag>vinyl</tag>)<sub>1</sub>Diesel\n'
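
To get the styled span and sub markup from the question rather than the placeholder <tag>, the same idea can take the wrapping as a parameter. The sketch below is a variation on the answer, not the answerer's code. The names tag_terms, wrap, plain and styled are hypothetical, it scans by index instead of slicing, and it skips empty entries in data as an added assumption so the loop always advances.

```python
def tag_terms(s, data, wrap, seen=()):
    """Wrap every occurrence of an entry of data in s, longest match first."""
    out = []
    pos = 0
    while pos < len(s):
        # Entries (with 1-based index) that start at pos and are not already
        # being wrapped on this recursion branch. Empty entries are skipped.
        candidates = [(i, term) for i, term in enumerate(data, 1)
                      if term and term not in seen and s.startswith(term, pos)]
        if candidates:
            # max() keeps the first of equally long entries, so a duplicate
            # entry reuses the index of its first appearance in data.
            i, term = max(candidates, key=lambda c: len(c[1]))
            inner = tag_terms(term, data, wrap, seen + (term,))
            out.append(wrap(inner, i))
            pos += len(term)
        else:
            out.append(s[pos])
            pos += 1
    return "".join(out)


def plain(inner, index):
    # Placeholder markup used in the question's expected output.
    return f"(<tag>{inner}</tag>)<sub>{index}</sub>"


def styled(inner, index):
    # Styling values copied from the question's own attempt.
    return (f"(<span style='color:blue;font-weight:bold;'>{inner}</span>)"
            f"<sub style='font-weight:bold;font-size:15px;'>{index}</sub>")


terms = ["vinyl", "size", "vinyl cutters"]
print(tag_terms("The vinyl cutters", terms, plain))
print(tag_terms("The vinyl cutters", terms, styled))
```

Because matching is done against the original text only, markup produced by wrap is never scanned again, so a term such as "size" cannot match inside a style attribute.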