BeautifulSoup - extracting text from multiple span elements w/o classes

Time:11-15

This is what the HTML looks like:

<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>

I need to extract detail2 & detail3.

But with this piece of code I only get detail1.

info = data.find("p", class_ = "details").span.text

How do I extract the needed items?

Thanks in advance!

CodePudding user response:

Select your elements more specifically. In your case, that means every <span> that is a following sibling of the <span> with class number:

soup.select('span.number ~ span')

Example

from bs4 import BeautifulSoup
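html = '''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''
# Name the parser explicitly so bs4 does not warn or pick a different one per machine
soup = BeautifulSoup(html, "html.parser")

# "~" is the general sibling combinator: every <span> that follows span.number
print([t.text for t in soup.select('span.number ~ span')])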

Output

['detail2', 'detail3']
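
As a side note on why the original attempt returns only detail1: `.span` on a tag is shorthand for `.find("span")`, which stops at the first match. The sketch below shows that, plus a non-CSS alternative using `find_next_siblings`. It assumes the markup from the question, with `class="details"` on the `<p>` and `class="number"` on the second `<span>`; the variable names `marker` and `after_marker` are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>'''
soup = BeautifulSoup(html, "html.parser")

# .span is equivalent to .find("span"), so only the first <span> is returned
first = soup.find("p", class_="details").span.text

# Start at the span with class "number" and collect the <span> siblings after it
marker = soup.find("span", class_="number")
after_marker = [s.get_text(strip=True) for s in marker.find_next_siblings("span")]

print(first)
print(after_marker)
```

This gives the same elements as the `span.number ~ span` selector, and may be easier to read if you are not used to CSS combinators.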

CodePudding user response:

You can find all <span>s and do normal indexing:

from bs4 import BeautifulSoup

html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

spans = soup.find("p", class_="details").find_all("span")

for s in spans[-2:]:
    print(s.text)

Prints:

detail2
detail3

Or CSS selectors:

spans = soup.select(".details span:nth-last-of-type(-n+2)")

for s in spans:
    print(s.text)

Prints:

detail2
detail3
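
If the page might not always contain the paragraph, the indexing approach can be wrapped in a small helper that returns an empty list instead of raising `AttributeError`. This is only a sketch: `last_spans` and its `count` parameter are hypothetical names, and it assumes the same markup as in the question and a positive `count`.

```python
from bs4 import BeautifulSoup

html_doc = """\
<p class="details">
<span>detail1</span>
<span class="number">1</span>
<span>detail2</span>
<span>detail3</span>
</p>"""


def last_spans(soup, count=2):
    """Return the text of the last `count` <span>s inside p.details (hypothetical helper)."""
    paragraph = soup.find("p", class_="details")
    if paragraph is None:
        # find() returns None when nothing matches, so guard before chaining
        return []
    return [s.get_text(strip=True) for s in paragraph.find_all("span")[-count:]]


soup = BeautifulSoup(html_doc, "html.parser")
by_index = last_spans(soup)

# The CSS variant: spans that are among the last two <span> siblings inside .details
by_selector = [s.text for s in soup.select(".details span:nth-last-of-type(-n+2)")]

print(by_index)
print(by_selector)
```

Both approaches pick elements by position, so they break if the number of trailing spans changes. The sibling selector from the other answer depends on the `number` class instead.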