Suppose that we have an .htm page with an index and some content below. Each element of the index links to the related section of the document.
Suppose that our starting point is a tag with an href (<a href="#001">SECTION 1</a>). I want to look through all tags to find the one this href refers to, so I want to find any tag that has this value set for some attribute. I have looked into some of those documents, and these are some examples of the referring tags:
<a id="#001">SECTION 1</a>
<a name="#001">SECTION 1</a>
<div name="#001">SECTION 1</div>
<div id="#001">SECTION 1</div>
Since I cannot predict the tag name or the name of the attribute that contains the href value, how can I make this search purely value-based? Is there a BeautifulSoup member function to do this? Can I avoid looping over all attributes?
CodePudding user response:
You can pass a lambda function to soup.find_all(), for example:
from bs4 import BeautifulSoup
html_doc = """\
<a id="#001">SECTION 1</a>
<a>something other</a>
<a name="#001">SECTION 1</a>
<div name="#001">SECTION 1</div>
<div name="#002">something other</div>
<div id="#001">SECTION 1</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.find_all(lambda tag: any(tag[a] == "#001" for a in tag.attrs)):
    print(tag)
Prints:
<a id="#001">SECTION 1</a>
<a name="#001">SECTION 1</a>
<div name="#001">SECTION 1</div>
<div id="#001">SECTION 1</div>
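If you start from the link itself rather than a hard-coded value, the same idea can be wrapped in a small helper. The following is only a sketch: find_tags_by_attr_value is a hypothetical name, not part of BeautifulSoup, and the sample HTML is made up for illustration. It reads the value from the link's href, skips the link itself in the results, and also checks multi-valued attributes such as class, which BeautifulSoup parses into lists (so a plain == comparison against a string would miss them).

```python
from bs4 import BeautifulSoup


def find_tags_by_attr_value(soup, value):
    """Return every tag that has at least one attribute equal to `value`.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """

    def has_value(tag):
        for attr_value in tag.attrs.values():
            # Multi-valued attributes (e.g. class) are stored as lists.
            if isinstance(attr_value, list):
                if value in attr_value:
                    return True
            elif attr_value == value:
                return True
        return False

    return soup.find_all(has_value)


# Made-up sample document: one index link and a few candidate targets.
html_doc = """\
<a href="#001">SECTION 1</a>
<a id="#001">SECTION 1</a>
<div name="#002">something other</div>
<p class="note #001">SECTION 1</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Starting point: the index link whose target we want to locate.
link = soup.find("a", href=True)
target = link["href"]

# The link itself also carries the value (in href), so filter it out.
matches = [t for t in find_tags_by_attr_value(soup, target) if t is not link]
for tag in matches:
    print(tag)
```

Note that this does not really avoid the loop: find_all() still calls the function once per tag, and the function still walks that tag's attributes. It only keeps the loop out of your own code.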