Selecting BeautifulSoup tag based on attributes value

Time:10-24

Suppose we have an .htm page with an index followed by some content. Each index entry links to the related section of the document. Suppose our starting point is a tag with an href, e.g. <a href="#001">SECTION 1</a>. I want to search all tags for the one this href refers to, that is, any tag that has this value set on some attribute. I have looked at several such documents, and these are some examples of the referring tags:

  1. <a id="#001">SECTION 1</a>
  2. <a name="#001">SECTION 1</a>
  3. <div name="#001">SECTION 1</div>
  4. <div id="#001">SECTION 1</div>

Since I cannot predict the tag name or the name of the attribute that holds the href value, how can I make this search based on the value alone? Is there a BeautifulSoup method for this? Can I avoid looping over all attributes myself?

CodePudding user response:

You can pass a lambda function to soup.find_all(); it is called on every tag, and the tags for which it returns True are kept. For example:

from bs4 import BeautifulSoup

html_doc = """\
    <a id="#001">SECTION 1</a>

    <a>something other</a>

    <a name="#001">SECTION 1</a>
    <div name="#001">SECTION 1</div>

    <div name="#002">something other</div>

    <div id="#001">SECTION 1</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Keep every tag that has at least one attribute whose value is exactly "#001".
for tag in soup.find_all(lambda tag: any(v == "#001" for v in tag.attrs.values())):
    print(tag)

Prints:

<a id="#001">SECTION 1</a>
<a name="#001">SECTION 1</a>
<div name="#001">SECTION 1</div>
<div id="#001">SECTION 1</div>
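
In a real page the lambda above would also match the index link itself, since its href holds the same value, and it would miss a value stored in a multi-valued attribute such as class, which html.parser returns as a list. A minimal sketch of a helper that covers both cases follows; the name find_by_attr_value and the skip_attrs parameter are made up for illustration and are not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def find_by_attr_value(soup, value, skip_attrs=("href",)):
    """Return all tags carrying `value` in any attribute not listed in skip_attrs.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """

    def matches(tag):
        for name, attr_value in tag.attrs.items():
            if name in skip_attrs:
                continue
            # Multi-valued attributes (e.g. class) are parsed into lists,
            # so normalise everything to a list before comparing.
            values = attr_value if isinstance(attr_value, list) else [attr_value]
            if value in values:
                return True
        return False

    return soup.find_all(matches)


html_doc = """
<a href="#001">SECTION 1</a>
<a name="#001">SECTION 1</a>
<div id="#001" class="section first">SECTION 1</div>
<div id="#002">something other</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
targets = find_by_attr_value(soup, "#001")
for tag in targets:
    print(tag.name, tag.attrs)
```

The loop over attributes is still there, but it is hidden inside the helper and runs once per tag during a single find_all() pass. Pass skip_attrs=() if the link that carries the href should be returned as well.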