I've found the lowest class, <span >, on multiple elements of a website, but now I want to find the related/linked upper class, for example the highest <div xpath="1">. I've got the soup but can't figure out a way to get from the 'lowest' class to the 'highest' class. Any ideas?
<div xpath="1">
<div >
<header >
<div >
<footer >
<span >
End result would be:
INPUT - Search all elements of a page with class <span > (lowest)
OUTPUT - Get all related titles or headers of said class.
I've tried it with if-statements, but that doesn't work consistently. Something like "if class == (searchable class), then get (desired higher class)" should work.
I can add more details if needed, please let me know. Thanks in advance!
EDIT: Picture for clarification, where the title (highest class) = "Wooferland Festival 2022" and the number (lowest class) = 253
CodePudding user response:
As mentioned, the question needs some more information to give a concrete answer.
Assuming you'd like to scrape the information in the picture based on your example HTML, you can select your pill elements and use .find_previous() to locate the related elements:
for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)
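One caveat with .find_previous(): it walks backwards through the whole document, so if one block has no <header>, it will silently pick up the header of the block before it. A possible alternative is to climb from the span to its enclosing container with .find_parent() and search only inside that container. This is a minimal sketch; the class name "card", the second block and the number 17 are made up for illustration and are not from your page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two blocks, the second one has no <header>.
# The class name "card" is an assumption, not taken from the real page.
html = """
<div class="card">
  <header>some date information</header>
  <div>some title</div>
  <footer><span class="pill">253</span></footer>
</div>
<div class="card">
  <div>another title</div>
  <footer><span class="pill">17</span></footer>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for pill in soup.select("span.pill"):
    # Climb from the lowest element to its enclosing container.
    card = pill.find_parent("div", class_="card")
    # Search only inside that container, so a missing header stays missing.
    header = card.find("header")
    rows.append({
        "date": header.get_text(strip=True) if header else None,
        "title": card.find("div").get_text(strip=True),
        "number": pill.get_text(strip=True),
    })

for row in rows:
    print(row)
```

With .find_previous('header'), the second block would have reported the first block's date instead of None.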
Assuming there is a container tag in the HTML structure, like <a> or another element, you would select it based on the condition that it contains a <span> with class pill:
for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    print(e.header.next.text)
    print(e.footer.span.text)
Note: Instead of using CSS classes, which can be highly dynamic, try to use more static attributes or the HTML structure.
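To illustrate selecting by structure rather than by class, here is a small sketch. The class value "css-1x2y3z" is invented to stand in for a generated class name; the selectors only rely on the tag layout from your example.

```python
from bs4 import BeautifulSoup

# The class "css-1x2y3z" is a made-up stand-in for a generated class name.
html = """
<a>
  <div>
    <header>some date information</header>
    <div>some title</div>
    <footer><span class="css-1x2y3z">some number</span></footer>
  </div>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# A <span> that is a direct child of a <footer>, whatever its class is.
numbers = [s.get_text(strip=True) for s in soup.select("footer > span")]

# A <div> that directly follows a <header> as its next sibling element.
titles = [d.get_text(strip=True) for d in soup.select("header + div")]

print(numbers)
print(titles)
```

These selectors keep working if the site renames its classes, but they will break if the tag layout changes, so which one is more stable depends on the page.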
Example
See both options; for the first one, the <a> does not matter.
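from bs4 import BeautifulSoup

html = '''
<a>
  <div xpath="1">
    <div>
      <header>some date information</header>
      <div>some title</div>
      <footer>
        <span class="pill">some number</span>
      </footer>
    </div>
  </div>
</a>
'''

# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Option 1: start at each pill and look backwards in the document
for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)

print('---------')

# Option 2: select the container that holds a pill and walk down from it
for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    # .next is the header's own text node, so this repeats the header text
    print(e.header.next.text)
    print(e.footer.span.text)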
Output
some date information
some title
some number
---------
some date information
some date information
some number
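Note that the second option prints the date twice in the output above, because e.header.next is the text node inside the <header>, not the element after it. If the title is what you are after, one possible fix is .find_next_sibling(), sketched here under the assumption that the title <div> is a sibling following the <header>, as in the example HTML:

```python
from bs4 import BeautifulSoup

html = """
<a>
  <div>
    <header>some date information</header>
    <div>some title</div>
    <footer><span class="pill">some number</span></footer>
  </div>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for e in soup.select("a:has(span.pill)"):
    # Assumes the title <div> is the next <div> sibling of the <header>.
    title = e.header.find_next_sibling("div")
    results.append((
        e.header.get_text(strip=True),
        title.get_text(strip=True),
        e.footer.span.get_text(strip=True),
    ))

for date, title, number in results:
    print(date, title, number, sep="\n")
```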