I have a single line of HTML formatted like this:
<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>
The class and id attributes in the HTML are not known in advance. I need to add a "\n" after each closing tag so that every element ends up on its own line. My Python code is this:
TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']
html = '''<ol ><li id="hello">Create A Bakery Business Plan. ... </li><li id="hello">Choose A Location For Your Bakery Business. ... </li><li >Get All Licenses Required To Open A Bakery Business In India. ... </li><li >Get Manpower Required To Open A Bakery. ... </li><li >Buy Equipment Needed To Start A Bakery Business.</li></ol>'''
for tag in TAGS:
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))
for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '<{}>\n'.format(tag))
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))
html = html.replace(' />', ' />\n')
print(html)
But the result is:
<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>
Why isn't it this:
<ol class="X5LH0c">
<li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>
I don't use regex. Can anyone help me fix the code? Thanks for your support!
CodePudding user response:
A quick fix is the following.
TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']
html = '''<ol ><li id="hello">Create A Bakery Business Plan. ... </li><li id="hello">Choose A Location For Your Bakery Business. ... </li><li >Get All Licenses Required To Open A Bakery Business In India. ... </li><li >Get Manpower Required To Open A Bakery. ... </li><li >Buy Equipment Needed To Start A Bakery Business.</li></ol>'''
# Start each tag on a new line
for tag in TAGS:
    html = html.replace('<{}'.format(tag), '\n<{}'.format(tag))
for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '\n<{}>'.format(tag))
    html = html.replace('</{}>'.format(tag), '\n</{}>'.format(tag))
print(html)
So, I have replaced the search for the closing tag </{}> with a search for the start of the opening tag <{}, and moved the newline character so that it comes before the opening bracket. Because the search text no longer includes the closing >, tags that carry attributes (such as <li class ...>) are matched as well.
This produces an extra empty line at the start of the output, which you can get rid of using html = html[1:].
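To see why the original attempt left the opening <ol ...> and the first <li> on the same line, it can help to check what str.replace actually matches. The following is a minimal sketch on a shortened version of the sample markup; the variable names are illustrative only, and lstrip('\n') is shown as one possible alternative to html[1:].

```python
# Shortened sample; the attribute on <ol> is what matters here.
html = '<ol class="X5LH0c"><li class="TrT0Xe">Create A Bakery Business Plan.</li></ol>'

# str.replace matches literal text only. '<ol>' never occurs in the string,
# because the real opening tag is '<ol class="X5LH0c">', so nothing changes.
unchanged = html.replace('<ol>', '<ol>\n')

# Searching for the prefix '<ol' (without the closing bracket) does find it.
prefixed = html.replace('<ol', '\n<ol')

# lstrip('\n') removes the leading newline and is harmless if there is none.
cleaned = prefixed.lstrip('\n')

print(unchanged == html)
print(repr(prefixed))
```

The same reasoning applies to the sample in the question, where the tag is written as <ol > with a trailing space.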
A more elegant solution would be to use regex replacements, but it all depends on the exact output you desire.
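As a rough illustration of what such a regex-based variant might look like, here is a sketch using only the standard re module. The function name break_before_tags and the list BLOCK_TAGS are hypothetical names introduced for this example, and the exact set of tags would need adjusting to the output you want.

```python
import re

# Hypothetical tag list for this sketch; adjust as needed.
BLOCK_TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img', 'ol', 'ul']

def break_before_tags(html, tags=BLOCK_TAGS):
    """Put a newline before each listed opening tag and before </ol> and </ul>."""
    names = '|'.join(re.escape(t) for t in tags)
    # \b after the tag name stops '<p' from also matching '<pre'.
    opening = re.compile(r'<(?:{})\b'.format(names))
    html = opening.sub(lambda m: '\n' + m.group(0), html)
    # Closing list tags get their own line as well.
    html = re.sub(r'</(?:ol|ul)>', lambda m: '\n' + m.group(0), html)
    # Drop the newline that was inserted before the very first tag.
    return html.lstrip('\n')

sample = ('<ol class="X5LH0c"><li class="TrT0Xe" id="hello">A</li>'
          '<li class="TrT0Xe">B</li></ol>')
print(break_before_tags(sample))
```

Compared with the plain str.replace version, the word boundary avoids accidental matches on longer tag names, at the cost of bringing in regex. For anything beyond simple, well-formed snippets, an HTML parser is usually a safer choice than either approach.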