find_all div that includes a span with a specific id (BeautifulSoup)

Time:11-15

I want to scrape the text of a div that contains a span with a specific id or class.

For example:

<div class="class1" > 
  <span id="span1"></span>
  text to scarp
<div/>
<div class="class1" > 
  <span id="span2"></span>
  text to scarp
<div/>
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
<div/>
<div class="class1" > 
  <span id="span3"></span>
  text to scarp
<div/>

I want to get the text of the div (class1), but only from the ones which include the span (span1).

thanks

CodePudding user response:

This should do it:

soup.select_one('.class1:has(#span1)').text
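
One caveat: select_one returns None when nothing matches, so calling .text on the result directly would raise an AttributeError. Below is a minimal sketch of a guarded version. It assumes a shortened copy of the question's HTML with the closing tags written as </div>; the variable names and the #no-such-id selector are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML, with the closing tags fixed.
html = """
<div class="class1">
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1">
  <span id="span2"></span>
  text to scarp
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching element, or None if nothing matches.
div = soup.select_one(".class1:has(#span1)")
text = div.get_text(strip=True) if div is not None else None
print(text)

# A selector that matches nothing yields None instead of raising.
missing = soup.select_one(".class1:has(#no-such-id)")
print(missing)
```

Note that select_one only returns the first match; if several divs contain the span, select returns all of them.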

CodePudding user response:

As Beso mentioned, there is a typo in your HTML that should be fixed: the closing tags should be </div>, not <div/>.

How to select?

The simplest approach, in my opinion, is to use a CSS selector to select all <div> elements that have a direct child <span> with the id "span1":

soup.select('div:has(> #span1)')

To get the text of all of them, iterate over the result set:

[x.get_text(strip=True) for x in soup.select('div:has(> #span1)')]

That will give you a list of texts:

['text to scarp', 'text to scarp']
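
The two answers use slightly different selectors: ':has(#span1)' matches a span at any depth inside the div, while ':has(> #span1)' only matches a span that is a direct child. For the HTML in the question both give the same result. The sketch below uses hypothetical markup (not from the question) in which one span is wrapped in a <p>, just to show where they differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first span is nested inside a <p>,
# the second is a direct child of its div.
html = """
<div class="class1">
  <p><span id="span1"></span></p>
  nested span
</div>
<div class="class1">
  <span id="span1"></span>
  direct child span
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant form: the span may sit at any depth below the div.
any_depth = [d.get_text(strip=True) for d in soup.select("div:has(#span1)")]

# Child form: the span must be an immediate child of the div.
direct_only = [d.get_text(strip=True) for d in soup.select("div:has(> #span1)")]

print(any_depth)
print(direct_only)
```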

Example

from bs4 import BeautifulSoup

html='''
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span2"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span3"></span>
  text to scarp
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# Collect the stripped text of every <div> that has #span1 as a direct child
print([x.get_text(strip=True) for x in soup.select('div:has(> #span1)')])
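
Since the question title mentions find_all, here is a rough sketch of an alternative that avoids CSS selectors: find the spans first, then step up to their parent. This is a different technique from the answers above, and it assumes the span is a direct child of the div, as in the question's (corrected) HTML.

```python
from bs4 import BeautifulSoup

# The question's HTML, shortened, with the closing tags fixed.
html = """
<div class="class1">
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1">
  <span id="span2"></span>
  text to scarp
</div>
<div class="class1">
  <span id="span1"></span>
  text to scarp
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find every <span id="span1">, then keep the text of its parent
# only when that parent is a <div> carrying the class "class1".
texts = [
    span.parent.get_text(strip=True)
    for span in soup.find_all("span", id="span1")
    if span.parent.name == "div" and "class1" in span.parent.get("class", [])
]
print(texts)
```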