Given the HTML below:
given = """<html>
<body>
Free Text: Above
<ul>
<li> data 1 </li>
<li>
<ul>
<li>
<ol start = "321">
<li> sub-sub list 1
<ol>
<li> sub sub sub list </li>
</ol>
</li>
<li> sub-sub list 2 </li>
</ol>
</li>
<li> sub list 2 </li>
<li> sub list 3 </li>
</ul>
</li>
<li> <p> list type paragraph </p> data 3 </li>
</ul>
Free Text: Middle
<ul>
<li> Second UL list </li>
<li> Second List part 2 </li>
</ul>
Free Text : Below
</body>
</html>"""
Now my question:
How can I change every child <li> tag that has ANY <li> among its ancestors into <SOME>?
(Please don't ask why I would want to, or point out that it won't render; I have my reasons.)
In a nutshell, I want the HTML above to become:
result = """<html>
<body>
Free Text: Above
<ul>
<li> data 1 </li>
<li>
<ul>
<SOME>
<ol start = "321">
<SOME> sub-sub list 1
<ol>
<SOME> sub sub sub list </SOME>
</ol>
</SOME>
<SOME> sub-sub list 2 </SOME>
</ol>
</SOME>
<SOME> sub list 2 </SOME>
<SOME> sub list 3 </SOME>
</ul>
</li>
<li> <p> list type paragraph </p> data 3 </li>
</ul>
Free Text: Middle
<ul>
<li> Second UL list </li>
<li> Second List part 2 </li>
</ul>
Free Text : Below
</body>
</html>"""
I tried the following (with and without tag.decompose()):
from bs4 import BeautifulSoup

soup = BeautifulSoup(given, 'html.parser')
for tag in soup.find_all(['li']):
    if tag.find_parents("li"):
        new_tag = soup.new_tag("SOME")
        new_tag.string = tag.text
        tag.replace_with(new_tag)
result = str(soup)
but it doesn't seem to work for nesting depth > 1, i.e. for inner tags such as sub-sub list 1 and anything deeper.
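A minimal reproduction on a smaller, hypothetical input shows what goes wrong: tag.text flattens the first nested <li> into plain text, and the deeper <li> objects that find_all() already collected stay attached to the detached original instead of to soup, so replacing them later changes nothing visible.

```python
from bs4 import BeautifulSoup

# Hypothetical three-level input, smaller than the one in the question.
html = "<ul><li>a<ul><li>b<ol><li>c</li></ol></li></ul></li></ul>"
soup = BeautifulSoup(html, "html.parser")

tags = soup.find_all("li")  # all three <li> tags, collected before any change
outer, inner = tags[1], tags[2]

# Same steps as the loop above, applied to the first nested <li>.
replacement = soup.new_tag("SOME")
replacement.string = outer.text  # .text joins "b" and "c" into the plain string "bc"
outer.replace_with(replacement)

print(soup)                              # the nested <ol> and <li>c</li> are gone
print(outer.parent is None)              # the old <li> is now detached from soup
print(inner.find_parent("li") is outer)  # inner still has an <li> ancestor, but off-tree
```

When the loop later reaches inner, find_parents("li") is still non-empty, so it gets replaced too, but only inside the detached fragment.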
Answer:
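Instead of using .replace_with(), you can simply rename each tag through its .name attribute, which keeps the structure intact. The original loop fails because replace_with() swaps the first nested <li> for a new tag holding only its flattened .text, which detaches the deeper <li> elements before the loop reaches them; renaming in place detaches nothing. The CSS selector li li matches every <li> that has an <li> ancestor:
for tag in soup.select('li li'):
    tag.name = 'SOME'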
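The same in-place rename also fixes the question's own find_parents() loop. A minimal sketch on a smaller, hypothetical input:

```python
from bs4 import BeautifulSoup

# Hypothetical input: "a" and "d" are top-level items, "b" and "c" are nested.
html = "<ul><li>a<ul><li>b<ol><li>c</li></ol></li></ul></li><li>d</li></ul>"
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("li"):
    # find_parents() returns an empty (falsy) ResultSet for top-level items.
    if tag.find_parents("li"):
        # Renaming keeps the tag's children and position, so deeper items stay in soup.
        tag.name = "SOME"

result = str(soup)
print(result)
```

Top-level items are never renamed, so every deeper <li> still finds one of them as an ancestor even after the tags in between have become <SOME>.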
Example
from bs4 import BeautifulSoup
html = '''<html>
<body>
Free Text: Above
<ul>
<li> data 1 </li>
<li>
<ul>
<li>
<ol start = "321">
<li> sub-sub list 1
<ol>
<li> sub sub sub list </li>
</ol>
</li>
<li> sub-sub list 2 </li>
</ol>
</li>
<li> sub list 2 </li>
<li> sub list 3 </li>
</ul>
</li>
<li> <p> list type paragraph </p> data 3 </li>
</ul>
Free Text: Middle
<ul>
<li> Second UL list </li>
<li> Second List part 2 </li>
</ul>
Free Text : Below
</body>
</html>'''
soup = BeautifulSoup(html, 'html.parser')
for tag in soup.select('li li'):
    tag.name = 'SOME'
print(soup)
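The space in 'li li' is the CSS descendant combinator, so it matches an <li> at any depth below another <li>. A child combinator ('li > li') would match nothing here, because every nested <li> sits directly inside a <ul> or <ol>, not inside an <li>. A quick check on a smaller, hypothetical input (select() relies on the soupsieve package, which recent bs4 releases install as a dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical input: "b" and "c" are nested <li> items, "a" and "d" are top-level.
html = "<ul><li>a<ul><li>b<ol><li>c</li></ol></li></ul></li><li>d</li></ul>"
soup = BeautifulSoup(html, "html.parser")

descendants = soup.select("li li")  # any <li> with an <li> ancestor
children = soup.select("li > li")   # only an <li> whose direct parent is an <li>

print(len(descendants), len(children))
# get_text() on "b" also includes the text of its nested "c" item.
print([t.get_text() for t in descendants])
```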