I am struggling with the usage of next_sibling (and similarly with next_element). If I use them as attributes I don't get anything back, but if I use find_next_sibling (or find_next) then it works.
From the docs:
find_next_sibling: "Iterate over the rest of an element’s siblings in the tree. [...] returns the first one (of the match)"
find_next: "These methods use .next_elements to iterate over [...] and returns the first one"
So, find_next_sibling depends on next_siblings. What does next_sibling depend on, and why do the attributes return nothing?
from bs4 import BeautifulSoup

html = """
<div class="one-ad-desc">
<div class="one-ad-title">
<a href="www this is the URL!">
<h5>
Text needed
</h5>
</a>
</div>
<div >
...and some more needed text here!
</div>
</div>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())
    break  # only the first match is needed
Output
->
->
-> <div >
...and some more needed text here!
</div>
CodePudding user response:
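The main point here, in my opinion, is that .find_next_sibling() only ever returns tags: it skips over text nodes until it reaches the next sibling that is a tag.
.next_element and .next_sibling, on the other hand, return whatever node comes next in the parse tree, and that node can be a piece of text. In your indented HTML the next node is the newline between the tags, so you get a whitespace-only string back, which looks like nothing when printed.
Print the names of the nodes and you will see that the first two are not tags (a text node has no tag name, so .name is None):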
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element.name)
    print('-> ', div.next_sibling.name)
    print('-> ', div.find_next_sibling().name)
Output:
-> None
-> None
-> div
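To see what the two attributes actually hand back, it can help to print the type and repr() of each node instead of the node itself. This is only a minimal sketch: the markup is a cut-down version of the HTML from the question, and the built-in html.parser is swapped in for lxml so the snippet needs nothing beyond bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Assumed markup: a shortened version of the question's HTML,
# with a newline between the tags just like the original.
html = """<div class="one-ad-desc">
<div class="one-ad-title">
<a href="#"><h5>Text needed</h5></a>
</div>
<div>...and some more needed text here!</div>
</div>"""

# html.parser is used here instead of lxml to avoid an extra dependency.
soup = BeautifulSoup(html, "html.parser")
title = soup.find("div", class_="one-ad-title")

nxt_el = title.next_element          # next node in document order
nxt_sib = title.next_sibling         # next node on the same level
found = title.find_next_sibling()    # next *tag* on the same level

# repr() makes whitespace-only strings visible.
for label, node in [("next_element", nxt_el),
                    ("next_sibling", nxt_sib),
                    ("find_next_sibling()", found)]:
    print(label, type(node).__name__, repr(node))
```

The first two nodes come back as NavigableString objects holding only a newline, while the last one is a Tag.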
So if you put your input on one line, with no whitespace between the tags, you get the following result:
from bs4 import BeautifulSoup

html = """
<div class="one-ad-desc"><div class="one-ad-title"><a href="www this is the URL!"><h5>Text needed</h5></a></div><div >...and some more needed text here!</div></div></div>"""
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all('div', class_="one-ad-title"):
    print('-> ', div.next_element)
    print('-> ', div.next_sibling)
    print('-> ', div.find_next_sibling())
Output:
-> <a href="www this is the URL!"><h5>Text needed</h5></a>
-> <div >...and some more needed text here!</div>
-> <div >...and some more needed text here!</div>
Note that "Text needed" is not in a sibling of your selected tag; it is in one of its descendants. To select "Text needed": print('-> ', div.find_next().text)
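If you would rather keep the HTML formatted as it is, you can skip the whitespace nodes yourself. The sketch below is an illustration under assumptions, not bs4's actual implementation: first_tag_sibling is a hypothetical helper that walks .next_siblings and returns the first Tag, which is roughly what find_next_sibling() does when called without a filter. The markup is again a shortened version of the question's HTML, parsed with html.parser instead of lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed markup: a shortened version of the question's HTML.
html = """<div class="one-ad-desc">
<div class="one-ad-title">
<a href="#"><h5>
Text needed
</h5></a>
</div>
<div>
...and some more needed text here!
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("div", class_="one-ad-title")


def first_tag_sibling(node):
    """Hypothetical helper: return the first following sibling that is a Tag."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):   # skip whitespace and other text nodes
            return sibling
    return None


sibling = first_tag_sibling(title)

# The heading lives inside the selected div, so search its descendants.
heading_text = title.find("h5").get_text(strip=True)
# strip=True removes the surrounding newlines from the extracted text.
desc_text = sibling.get_text(strip=True)

print(heading_text)
print(desc_text)
```

In practice find_next_sibling() already does this for you; the helper is only there to show where .next_siblings fits in.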