How to change EVERY child tag (of a specific nature) to a different one using BeautifulSoup

Time:02-02

Given the HTML below:

given = """<html>
    <body>
        Free Text: Above
        <ul>
            <li> data 1 </li>
            <li>
                <ul>
                    <li> 
                        <ol start = "321">
                            <li> sub-sub list 1 
                                <ol>
                                    <li> sub sub sub list </li>
                                </ol>
                            </li>
                            <li> sub-sub list 2 </li>
                        </ol>
                    </li>
                    <li> sub list 2 </li>
                    <li> sub list 3 </li>
                </ul>
            </li>
            <li> <p> list type paragraph </p> data 3 </li>
        </ul>

        Free Text: Middle
        
        <ul>
            <li> Second UL list </li>
            <li> Second List part 2 </li>
        </ul>

        Free Text : Below
    </body>
</html>"""

Now I want to ask:

How can I change the child <li> tags that have ANY <li> ancestor to something else, say <SOME>? (Please don't ask why I would want to, or point out that I won't be able to render it. I have reasons.)

In a nutshell, I want the HTML above to end up looking like this:

    result = """<html>
        <body>
            Free Text: Above
            <ul>
                <li> data 1 </li>
                <li>
                    <ul>
                        <SOME> 
                            <ol start = "321">
                                <SOME> sub-sub list 1 
                                    <ol>
                                        <SOME> sub sub sub list </SOME>
                                    </ol>
                                </SOME>
                                <SOME> sub-sub list 2 </SOME>
                            </ol>
                        </SOME>
                        <SOME> sub list 2 </SOME>
                        <SOME> sub list 3 </SOME>
                    </ul>
                </li>
                <li> <p> list type paragraph </p> data 3 </li>
            </ul>
    
            Free Text: Middle
            
            <ul>
                <li> Second UL list </li>
                <li> Second List part 2 </li>
            </ul>
    
            Free Text : Below
        </body>
    </html>"""
    

I tried the following (with and without tag.decompose()):

    
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(given, 'html.parser')
    
    for tag in soup.find_all(['li']):
        if tag.find_parents("li"):
            new_tag = soup.new_tag("SOME")
            new_tag.string = tag.text
            tag.replace_with(new_tag)
    
    result = str(soup)
    

but it doesn't seem to work at depth > 1, i.e. for inner tags such as the sub-sub list items.

  • CodePudding user response:

    Instead of using .replace_with(), you can simply rename the tag by assigning to .name, which keeps the nested structure intact:

    for tag in soup.select('li li'):
        tag.name = 'SOME'
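
To see why renaming behaves differently from the attempt in the question, here is a minimal sketch on a shortened, made-up snippet (the `snippet` string and the variable names are assumptions for illustration, not part of the original post). Setting `new_tag.string = tag.text` copies only the text, so the nested `<ol>`/`<li>` elements are dropped from the tree, and the inner `<li>` tags that are visited later live in a subtree that has already been detached.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical input: three levels of nested <li>.
snippet = "<ul><li>a<ul><li>b<ol><li>c</li></ol></li></ul></li></ul>"

# Approach from the question: build a fresh tag from the text only.
soup_replace = BeautifulSoup(snippet, "html.parser")
for tag in soup_replace.find_all("li"):
    if tag.find_parents("li"):
        new_tag = soup_replace.new_tag("SOME")
        new_tag.string = tag.text  # keeps the text, loses the nested elements
        tag.replace_with(new_tag)
replaced = str(soup_replace)

# Approach from this answer: rename in place, children stay where they are.
soup_rename = BeautifulSoup(snippet, "html.parser")
for tag in soup_rename.select("li li"):
    tag.name = "SOME"
renamed = str(soup_rename)

print(replaced)
print(renamed)
```

In the first result the inner `<ol>` is gone and its text has been merged into a single replacement tag, while the second result keeps all three levels.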
    

    Example

    from bs4 import BeautifulSoup
    
    html = '''<html>
        <body>
            Free Text: Above
            <ul>
                <li> data 1 </li>
                <li>
                    <ul>
                        <li> 
                            <ol start = "321">
                                <li> sub-sub list 1 
                                    <ol>
                                        <li> sub sub sub list </li>
                                    </ol>
                                </li>
                                <li> sub-sub list 2 </li>
                            </ol>
                        </li>
                        <li> sub list 2 </li>
                        <li> sub list 3 </li>
                    </ul>
                </li>
                <li> <p> list type paragraph </p> data 3 </li>
            </ul>
    
            Free Text: Middle
            
            <ul>
                <li> Second UL list </li>
                <li> Second List part 2 </li>
            </ul>
    
            Free Text : Below
        </body>
    </html>'''
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
    soup = BeautifulSoup(html, 'html.parser')
    
    for tag in soup.select('li li'):
        tag.name = 'SOME'
    
    print(soup)
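
If a CSS selector is not wanted, roughly the same result can probably be reached with `find_all` and `find_parent`. The sketch below is an alternative under that assumption, run on a shortened, made-up document (`html_small` and `nested` are hypothetical names). The nested items are collected first and renamed afterwards, so the tree is not modified while it is being searched.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical input: two top-level lists, one with nesting.
html_small = (
    "<ul><li>one</li><li><ul><li>two<ol><li>three</li></ol></li></ul></li></ul>"
    "<ul><li>four</li></ul>"
)
soup = BeautifulSoup(html_small, "html.parser")

# Collect every <li> that has at least one <li> ancestor.
nested = [li for li in soup.find_all("li") if li.find_parent("li") is not None]

# Rename after collecting; attributes and children are left untouched.
for li in nested:
    li.name = "SOME"

# Only the top-level items are still <li> now.
remaining = soup.find_all("li")
print(len(nested), len(remaining))
print(soup)
```

The top-level `<li>` tags of both lists are left alone because they have no `<li>` ancestor.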
    