How to select a div that contains a span with a specific id via BeautifulSoup?

I want to scrape the text of a div that contains a span with a specific id or class.

For example:

<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span2"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span3"></span>
  text to scarp
</div>

I want to get the text inside the div (class1), but only from the divs that include the span (span1).

Thanks

Answer:

This should do it:

soup.select_one('.class1:has(#span1)').text
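A self-contained sketch of this one-liner, assuming BeautifulSoup 4.7.0 or later (the version that added soupsieve-based CSS selector support, including :has()). select_one() returns only the first matching element, or None when nothing matches, so the sketch checks for None before reading .text; the second div's text "other text" is made up for illustration:

```python
from bs4 import BeautifulSoup

html = '''
<div class="class1">
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1">
  <span id="span2"></span>
  other text
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the FIRST .class1 element that contains #span1 at any depth
div = soup.select_one('.class1:has(#span1)')

if div is not None:
    # .text keeps the surrounding newlines and indentation; strip() trims them
    print(div.text.strip())  # text to scarp
```

Calling .text directly on a None result raises AttributeError, which is why the guard is worth keeping when the span may be missing.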

Answer:

As Beso mentioned, there is a typo in your HTML that should be fixed.

How to select?

The simplest approach, in my opinion, is to use a CSS selector that selects every <div> with a direct child <span> whose id is "span1". Use select() rather than select_one(), because there is more than one such div in your example HTML:

soup.select('div:has(> #span1)')

Or, more specifically, as diggusbickus mentioned:

soup.select('.class1:has(> #span1)')
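The > inside :has() requires the <span id="span1"> to be a direct child of the div; without it, the span may be nested at any depth. A small sketch showing the difference, using a hypothetical <p> wrapper that is not part of the question's HTML:

```python
from bs4 import BeautifulSoup

html = '''
<div class="class1">
  <span id="span1"></span>
  direct child
</div>
<div class="class1">
  <p><span id="span1"></span></p>
  nested deeper
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# With '>': the span must be a direct child of the div
child_only = [d.get_text(strip=True) for d in soup.select('.class1:has(> #span1)')]

# Without a combinator: the span may be any descendant of the div
any_depth = [d.get_text(strip=True) for d in soup.select('.class1:has(#span1)')]

print(child_only)  # ['direct child']
print(any_depth)   # ['direct child', 'nested deeper']
```

Which form to use depends on how strictly the real page nests its spans; the > version is the safer choice when the span always sits directly inside the div.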

To get the text of all of them, iterate over the result set:

[x.get_text(strip=True) for x in soup.select('div:has(> #span1)')]

That will give you a list of texts:

['text to scarp', 'text to scarp']
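If your BeautifulSoup is older than 4.7.0 (no :has() support), or you prefer the tree-navigation API, one alternative is to find the spans first and step up to their parent. This is a sketch under the assumption that the wanted div is always the span's immediate parent; parent_texts is a hypothetical helper, and the sample texts and the <section> element are made up to show the filter:

```python
from bs4 import BeautifulSoup

html = '''
<div class="class1"><span id="span1"></span> first</div>
<div class="class1"><span id="span2"></span> second</div>
<div class="class1"><span id="span1"></span> third</div>
<section><span id="span1"></span> not a div</section>
'''
soup = BeautifulSoup(html, "html.parser")


def parent_texts(spans):
    """Hypothetical helper: text of each span's parent, if that parent is div.class1."""
    texts = []
    for span in spans:
        parent = span.parent
        # class is multi-valued in BeautifulSoup, so get("class") returns a list
        if parent.name == "div" and "class1" in parent.get("class", []):
            texts.append(parent.get_text(strip=True))
    return texts


# find_all() filters on attributes via keyword arguments; for a class use class_="..."
by_id = parent_texts(soup.find_all("span", id="span1"))
print(by_id)  # ['first', 'third']
```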

Example

from bs4 import BeautifulSoup

html='''
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span2"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span1"></span>
  text to scarp
</div>
<div class="class1" > 
  <span id="span3"></span>
  text to scarp
</div>
'''
soup = BeautifulSoup(html, "html.parser")

print([x.get_text(strip=True) for x in soup.select('div:has(> #span1)')])
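The question also mentions matching the span by class instead of id. In CSS, #span1 matches id="span1" and .span1 matches class="span1", and a comma-separated selector list matches divs that satisfy either. A sketch assuming a hypothetical span with class="span1" added to the sample HTML:

```python
from bs4 import BeautifulSoup

html = '''
<div class="class1"><span id="span1"></span> by id</div>
<div class="class1"><span class="span1"></span> by class</div>
<div class="class1"><span id="span3"></span> neither</div>
'''
soup = BeautifulSoup(html, "html.parser")

# Selector list: a .class1 element whose direct-child span has id="span1" OR class="span1".
# select() walks the tree once, so results come back in document order without duplicates.
selector = '.class1:has(> #span1), .class1:has(> .span1)'
texts = [d.get_text(strip=True) for d in soup.select(selector)]
print(texts)  # ['by id', 'by class']
```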