bs4 `next_sibling` VS `find_next_sibling`

Time:03-26

I am struggling with the usage of next_sibling (and similarly next_element). Used as attributes, they don't give me anything back, but find_next_sibling (or find_next) works. From the docs:

  • find_next_sibling: "Iterate over the rest of an element’s siblings in the tree. [...] returns the first one (of the match)"
  • find_next: "These methods use .next_elements to iterate over [...] and returns the first one"

So find_next_sibling depends on next_siblings. What does next_sibling depend on, and why does it return nothing?

from bs4 import BeautifulSoup

html = """
<div class="one-ad-desc">
  <div class="one-ad-title">
   <a  href="www this is the URL!">
    <h5>
     Text needed
    </h5>
   </a>
  </div>
  <div >
    ...and some more needed text here!
  </div>
 </div>
</div>
"""

soup = BeautifulSoup(html, 'lxml')

for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())
    break

Output

->  

->  

->  <div >
    ...and some more needed text here!
  </div>

User response:

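The main point here, in my opinion, is that .find_next_sibling() only considers tags, while .next_sibling and .next_element return the very next node in the parse tree, whatever its type.

.next_sibling is the next node on the same level of the tree, and .next_element is the next node that was parsed, which for a tag with content is its first child. In your HTML both of them are whitespace-only text nodes (the newline and indentation between the tags), which is why they print as blank lines.

So take a look and print the name of each node: the first two are not tags (a text node has no name), while find_next_sibling() skipped the whitespace and returned a div: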

for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element.name)
    print('-> ', div.next_sibling.name)
    print('-> ', div.find_next_sibling().name)

#output
->  None
->  None
->  div
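
To make the whitespace nodes visible, it can help to print the type and repr of each result instead of the node itself. This is only a minimal sketch: the markup is a simplified stand-in for the question's HTML, the class name is assumed from the find_all call, and html.parser is used instead of lxml just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Simplified stand-in for the question's markup (class name assumed).
html = """
<div>
  <div class="one-ad-title">
    <a href="#"><h5>Text needed</h5></a>
  </div>
  <div>
    ...and some more needed text here!
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="one-ad-title")

next_element = div.next_element       # first node parsed after the opening tag
next_sibling = div.next_sibling       # next node on the same level, any type
next_tag = div.find_next_sibling()    # next node on the same level that is a Tag

# repr() shows the otherwise invisible whitespace inside the text nodes.
for label, node in [
    ("next_element", next_element),
    ("next_sibling", next_sibling),
    ("find_next_sibling()", next_tag),
]:
    print(f"{label:22} {type(node).__name__:16} {node!r}")
```

The first two lines of output should show NavigableString objects that contain only a newline and spaces, and the third a Tag.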

So if you put your input on one line, with no whitespace between the tags, you get the following result:

from bs4 import BeautifulSoup

html = """
<div class="one-ad-desc"><div class="one-ad-title"><a  href="www this is the URL!"><h5>Text needed</h5></a></div><div >...and some more needed text here!</div></div></div>"""

soup = BeautifulSoup(html, 'lxml')

for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())

Output:

->  <a  href="www this is the URL!"><h5>Text needed</h5></a>
->  <div >...and some more needed text here!</div>
->  <div >...and some more needed text here!</div>
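
If reformatting the input is not an option, the same effect can be had by walking .next_siblings and skipping everything that is not a Tag, which is roughly what a no-argument find_next_sibling() does for you. The helper name first_tag_sibling below is hypothetical, the markup is a simplified stand-in with an assumed class name, and html.parser is used only to keep the sketch dependency-free.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def first_tag_sibling(node):
    """Return the first following sibling that is a Tag, or None."""
    for sibling in node.next_siblings:
        # Whitespace between tags shows up here as NavigableString nodes.
        if isinstance(sibling, Tag):
            return sibling
    return None


# Simplified stand-in for the question's markup (class name assumed).
html = """
<div>
  <div class="one-ad-title">
    <a href="#"><h5>Text needed</h5></a>
  </div>
  <div>
    ...and some more needed text here!
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="one-ad-title")

found = first_tag_sibling(div)
print(found)
print(found is div.find_next_sibling())
```

In practice you would just call find_next_sibling(); the loop is only there to show why the whitespace does not get in its way.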

Note that "Text needed" is not in a sibling of your selected tag; it is inside one of its descendants. To select "Text needed" -> print('-> ', div.find_next().text)
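
As a sketch of how both pieces of text could be pulled out without the surrounding whitespace, get_text(strip=True) can be called on the selected div and on its next tag sibling. The markup is again a simplified stand-in with an assumed class name, the variable names title and description are only illustrative, and html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's markup (class name assumed).
html = """
<div>
  <div class="one-ad-title">
    <a href="#">
      <h5>
        Text needed
      </h5>
    </a>
  </div>
  <div>
    ...and some more needed text here!
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="one-ad-title")

# Text inside the selected div (it lives in a descendant, not a sibling).
title = div.get_text(strip=True)

# Text of the next sibling that is a tag; whitespace nodes are skipped.
description = div.find_next_sibling("div").get_text(strip=True)

print(title)        # Text needed
print(description)  # ...and some more needed text here!
```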
