This is what the HTML looks like:
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>
I need to extract detail2 & detail3.
But with this piece of code I only get detail1.
info = data.find("p", class_ = "details").span.text
How do I extract the needed items?
Thanks in advance!
CodePudding user response:
Select your elements more specifically. In your case, select all <span> siblings that follow the <span> with class number:
soup.select('span.number ~ span')
Example
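from bs4 import BeautifulSoup
html = '''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")
print([t.text for t in soup.select('span.number ~ span')])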
Output
['detail2', 'detail3']
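If you would rather not use CSS selectors, the same idea can be sketched with find_next_siblings(). This assumes the <span> with class number is always present and that everything you want comes after it inside the same <p>; the variable names number and details are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''

soup = BeautifulSoup(html, "html.parser")

# Locate the marker element first (assumed to exist in the markup)
number = soup.find("span", class_="number")

# Collect every <span> sibling that comes after the marker
details = [s.text for s in number.find_next_siblings("span")]
print(details)
```

If the marker <span> can be missing, find() returns None, so check for that before calling find_next_siblings().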
CodePudding user response:
You can find all the <span> elements and use normal indexing:
from bs4 import BeautifulSoup
html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>"""
soup = BeautifulSoup(html_doc, "html.parser")
spans = soup.find("p", class_="details").find_all("span")
# Keep only the last two <span> elements
for s in spans[-2:]:
    print(s.text)
Prints:
detail2
detail3
Or CSS selectors:
spans = soup.select(".details span:nth-last-of-type(-n+2)")
for s in spans:
    print(s.text)
Prints:
detail2
detail3
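Note that indexing is positional, so it depends on how many <span> elements the paragraph contains. The sketch below uses a hypothetical variant of the HTML with one extra trailing <span> (detail4 is not in the original question) to compare slicing against a sibling selector anchored on span.number:

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML with one extra <span>
html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
<span>detail4</span>
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Positional: always the last two <span> elements, whatever they are
by_index = [s.text for s in soup.find("p", class_="details").find_all("span")[-2:]]

# Relative: every <span> that follows the one with class "number"
by_sibling = [s.text for s in soup.select(".details span.number ~ span")]

print(by_index)    # ['detail3', 'detail4']
print(by_sibling)  # ['detail2', 'detail3', 'detail4']
```

Which one fits better depends on whether "the last two" or "everything after the number" describes the data you actually want.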