How to get a div or span class from a related span class?

Time:08-27

I've found the lowest-level element, a <span> with a known class, for several items on a website. Now I want to get from it to the related enclosing element, for example the outermost <div xpath="1">. I have the soup, but I can't figure out how to go from the 'lowest' element to the 'highest' one. Any ideas?

<div xpath="1">
 <div>
  <header>
  <div>
  <footer>
   <span class="pill">

End result would be:

INPUT - Search all elements of a page for the <span> class (lowest)

OUTPUT - Get all related titles or headers of said class.

I've tried it with if-statements, but that doesn't work consistently. Something like "if class == (searchable class), then get (desired higher class)" should work.

I can add more details if needed, just let me know. Thanks in advance!

EDIT: Picture for clarification, where the title (highest class) = "Wooferland Festival 2022" and the number (lowest class) = 253. [Image: listing, title, number]

CodePudding user response:

As mentioned, the question needs more information before a concrete answer can be given.

Assuming you want to scrape the information shown in the picture, and based on your example HTML, you can select the pill and use .find_previous() to locate the related elements:

for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)
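
Because the question asks about moving from the innermost tag up to an enclosing one, walking up the tree with find_parent() may also be worth a look. This is only a minimal sketch: it assumes the span carries the class pill and that the outer div really has the xpath="1" attribute in the fetched HTML (it may have been added by a browser tool). The markup and text values are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question; the text values are invented.
html = '''
<div xpath="1">
 <div>
  <header>some date information</header>
  <div>some title</div>
  <footer>
   <span class="pill">some number</span>
  </footer>
 </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

results = []
for pill in soup.select('span.pill'):
    # Climb from the span to the closest enclosing <div xpath="1">.
    container = pill.find_parent('div', attrs={'xpath': '1'})
    if container is None:
        continue  # this span is not inside such a container
    # Search downwards again, but only within that container.
    header = container.find('header')
    results.append({
        'header': header.get_text(strip=True) if header else None,
        'number': pill.get_text(strip=True),
    })

print(results)
```

Unlike find_previous(), which walks backwards through the whole document, this keeps the lookup scoped to the ancestor of each span.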

Assuming there is a container tag in the HTML structure, such as an <a>, you can instead select it on the condition that it contains a <span> with the class pill:

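for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    # title: the <div> right after the header (.next would return the header's own text node)
    print(e.header.find_next_sibling('div').text)
    print(e.footer.span.text)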
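
To illustrate what the :has() condition does, here is a small sketch with two hypothetical listings, only one of which contains a pill. The hrefs and text values are invented; the assumption is that each listing is wrapped in its own <a>.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two listings; only the first contains a pill.
html = '''
<a href="#one">
  <header>date one</header>
  <div>title one</div>
  <footer><span class="pill">253</span></footer>
</a>
<a href="#two">
  <header>date two</header>
  <div>title two</div>
  <footer><span>no pill here</span></footer>
</a>
'''

soup = BeautifulSoup(html, 'html.parser')

titles = []
# :has() keeps only the <a> tags that contain a matching span.
for a in soup.select('a:has(span.pill)'):
    title = a.find('div')  # first <div> inside this container
    titles.append(title.get_text(strip=True))

print(titles)
```

The second listing is skipped because its span does not have the class pill.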

Note: CSS classes can be highly dynamic, so try to rely on more static attributes or on the HTML structure instead.
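
As a rough sketch of that idea, the selectors below avoid class names and rely on an attribute and on element order instead. It assumes the xpath="1" attribute from the question is actually present in the HTML and that the title is the <div> directly after the <header>; the markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question; the text values are invented.
html = '''
<div xpath="1">
 <div>
  <header>some date information</header>
  <div>some title</div>
  <footer>
   <span class="pill">some number</span>
  </footer>
 </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Select by attribute value rather than by class.
containers = soup.select('div[xpath="1"]')

# Select by structure: a <span> that is a direct child of a <footer>.
numbers = [s.get_text(strip=True) for s in soup.select('footer > span')]

# Select by structure: the <div> immediately following a <header>.
titles = [d.get_text(strip=True) for d in soup.select('header + div')]

print(len(containers), numbers, titles)
```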

Example

Both options are shown below; for the first one, the <a> does not matter.

from bs4 import BeautifulSoup
html = '''
<a>
<div xpath="1">
 <div>
  <header>some date information</header>
  <div>some title</div>
  <footer>
   <span class="pill">some number</span>
  </footer>
 </div>
</div>
</a>
'''

# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)

print('---------')
    
for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    # title: the <div> sibling right after the header
    print(e.header.find_next_sibling('div').text)
    print(e.footer.span.text)

Output

some date information
some title
some number
---------
some date information
some title
some number