How to add "\n" so each HTML tag is on its own line in a Python string?

I have a single line of HTML formatted like this:

<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>

The values of the class and id attributes are not known in advance. I need to add "\n" after the tags so that each element ends up on its own line. My Python code is:

TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']

html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''

for tag in TAGS:
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))

for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '<{}>\n'.format(tag))
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))

html = html.replace(' />', ' />\n')

print(html)

But the result is:

<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>

Why isn't the output this instead:

<ol class="X5LH0c">
<li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>

I'd rather not use regex. Can anyone help me fix the code? Thanks for your support!

Answer:

A quick fix is the following.

TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']

html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''

# Start a new line before each opening tag (matches tags with or without attributes)
for tag in TAGS:
    html = html.replace('<{}'.format(tag), '\n<{}'.format(tag))

# Break before bare container tags such as '<ul>' and before their closing tags
for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '\n<{}>'.format(tag))
    html = html.replace('</{}>'.format(tag), '\n</{}>'.format(tag))

print(html)

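Your original code searched for the exact opening tag <ol>, which never matches <ol class="X5LH0c">, so no newline was added after it. I replaced the search for the closing tag </{}> with a search for the start of the opening tag, <{} (without the closing >), and moved the newline so it is inserted before the tag instead of after it. Because the pattern stops before the >, it also matches opening tags that carry attributes, such as <li class=... or <ol class=....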
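To see the difference concretely, here is a small check on a shortened snippet (the variable name snippet is just for illustration): an exact search for '<ol>' fails once the tag has attributes, while the prefix '<ol' still matches.

```python
# Shortened version of the question's HTML, with attributes on the tags
snippet = '<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Item</li></ol>'

print('<ol>' in snippet)  # False: the exact tag text never appears
print('<ol' in snippet)   # True: the prefix ignores whatever follows the tag name

# Inserting the newline before each '<li' prefix moves the item onto its own line;
# '</li' is not affected because it starts with '</', not '<li'.
print(snippet.replace('<li', '\n<li'))
```

One caveat of the prefix search: '<p' also matches longer tag names that start with p, such as <pre> or <param>, so check that no name in TAGS is a prefix of another tag that appears in your input.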

This produces an extra newline at the start of the string, which you can remove with html = html[1:].
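If you need this in more than one place, the same replacements can be wrapped in a function. This is a sketch, not part of the original answer: the name split_tags and its parameters are hypothetical, it applies the opening-tag prefix search to ul as well as to the names in TAGS, and it strips the first character only when it is a newline, so an input that does not start with a listed tag keeps its first character.

```python
# Hypothetical helper wrapping the answer's string replacements
def split_tags(html, tags=('p', 'h1', 'h2', 'h3', 'h4', 'li', 'img', 'ol'),
               container_tags=('ul', 'ol')):
    # dict.fromkeys removes the duplicate 'ol' (keeping order), so each
    # opening-tag prefix gets exactly one newline in front of it.
    for tag in dict.fromkeys(tags + container_tags):
        # '<li' (no closing '>') also matches '<li class="...">'
        html = html.replace('<{}'.format(tag), '\n<{}'.format(tag))
    # Closing tags of list containers go on their own line.
    for tag in container_tags:
        html = html.replace('</{}>'.format(tag), '\n</{}>'.format(tag))
    # Remove the leading newline only if one is there.
    if html.startswith('\n'):
        html = html[1:]
    return html


html = '<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Plan</li><li class="TrT0Xe">Location</li></ol>'
print(split_tags(html))
```

Checking html.startswith('\n') before slicing is safer than an unconditional html[1:], which would drop a real character whenever the string does not begin with one of the listed tags.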

A more elegant solution would use regex replacements, but the right approach depends on the exact output you want.
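For reference, here is a minimal sketch of the regex version using Python's re module, in case regex becomes an option later. It assumes the same tag names as TAGS plus ul; the \b word boundary after the tag name keeps p from matching <pre> or <param>, which the plain prefix search cannot do.

```python
import re

html = '<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Plan</li><li class="TrT0Xe">Location</li></ol>'

# Newline before each listed opening tag, with or without attributes.
# \g<0> re-inserts the whole match after the newline.
html = re.sub(r'<(?:p|h1|h2|h3|h4|li|img|ol|ul)\b', r'\n\g<0>', html)
# Newline before the closing tag of each list container.
html = re.sub(r'</(?:ol|ul)>', r'\n\g<0>', html)
# Drop the newline added in front of the first tag.
html = html.lstrip('\n')
print(html)
```

Like the string-replace version, this treats the HTML as plain text, so tag-like text inside attribute values or comments would also be split.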