I want to scrape the text of a div that contains a span with a specific id or class.
For example:
<div class="class1" >
<span id="span1"></span>
text to scarp
</div>
<div class="class1" >
<span id="span2"></span>
text to scarp
</div>
<div class="class1" >
<span id="span1"></span>
text to scarp
</div>
<div class="class1" >
<span id="span3"></span>
text to scarp
</div>
I want to get the text of the div (class1), but only from the divs which include the span (span1).
thanks
CodePudding user response:
This should do it (note that select_one returns only the first matching div):
soup.select_one('.class1:has(#span1)').text
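A minimal sketch of how that one-liner behaves, assuming bs4 is installed with CSS `:has()` support (the shortened HTML and the `#missing` id are made up for illustration). `select_one` returns `None` when nothing matches, so calling `.text` directly would raise an `AttributeError` in that case:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the HTML from the question.
html = '''
<div class="class1"><span id="span1"></span>first text</div>
<div class="class1"><span id="span2"></span>second text</div>
<div class="class1"><span id="span1"></span>third text</div>
'''
soup = BeautifulSoup(html, "html.parser")

# select_one stops at the first div that contains #span1.
first_match = soup.select_one('.class1:has(#span1)')
first_text = first_match.get_text(strip=True) if first_match is not None else None

# When no div matches, select_one returns None, so guard before reading text.
no_match = soup.select_one('.class1:has(#missing)')
missing_text = no_match.get_text(strip=True) if no_match is not None else None

print(first_text)
print(missing_text)
```

If every matching div is needed rather than just the first, `select` is probably the better fit.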
CodePudding user response:
As Beso mentioned, there is a typo in your HTML that should be fixed.
How to select?
The simplest approach, in my opinion, is to use a CSS selector to select all <div> elements that have a <span> child with the id "span1" (there is more than one in your example HTML):
soup.select('div:has(> #span1)')
Or, to be even more specific, as mentioned by diggusbickus:
soup.select('.class1:has(> #span1)')
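The `>` inside `:has()` restricts the match to direct children. A small sketch of the difference, using a made-up HTML snippet where the second span is wrapped in a `<p>` (the duplicate id mirrors the question's HTML and is not valid in a real page):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second span is nested one level deeper.
html = '''
<div class="class1"><span id="span1"></span>direct child</div>
<div class="class1"><p><span id="span1"></span></p>nested deeper</div>
'''
soup = BeautifulSoup(html, "html.parser")

# ':has(> #span1)' only matches divs whose direct child is #span1.
child_only = [d.get_text(strip=True) for d in soup.select('.class1:has(> #span1)')]

# ':has(#span1)' matches divs with #span1 anywhere among their descendants.
any_depth = [d.get_text(strip=True) for d in soup.select('.class1:has(#span1)')]

print(child_only)  # ['direct child']
print(any_depth)   # ['direct child', 'nested deeper']
```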
To get the text of all of them you have to iterate the result set:
[x.get_text(strip=True) for x in soup.select('div:has(> #span1)')]
That will give you a list of texts:
['text to scarp', 'text to scarp']
Example
from bs4 import BeautifulSoup
html='''
<div class="class1" >
<span id="span1"></span>
text to scarp
</div>
<div class="class1" >
<span id="span2"></span>
text to scarp
</div>
<div class="class1" >
<span id="span1"></span>
text to scarp
</div>
<div class="class1" >
<span id="span3"></span>
text to scarp
</div>
'''
soup = BeautifulSoup(html, "html.parser")
# Print the text of every div that has #span1 as a direct child.
print([x.get_text(strip=True) for x in soup.select('div:has(> #span1)')])
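Since the question also mentions selecting by class, here is a sketch of the same idea with a class selector instead of an id. The class names `marker` and `other` are hypothetical and do not appear in the original HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML where the spans carry classes instead of ids.
html = '''
<div class="class1"><span class="marker"></span>wanted</div>
<div class="class1"><span class="other"></span>skipped</div>
'''
soup = BeautifulSoup(html, "html.parser")

# Same ':has()' pattern, with 'span.marker' in place of '#span1'.
texts = [d.get_text(strip=True) for d in soup.select('div.class1:has(> span.marker)')]
print(texts)  # ['wanted']
```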