How to select multiple children from HTML tag with Python/BeautifulSoup if exists?

Time:01-17

I'm currently scraping elements from a webpage. Let's say I'm iterating over an HTML response and part of that response looks like this:

<div>
<div>
<div>
<span class="material-part" title="SLT-4 2435">
<img src="/images/train-material/mat_slt4.png"/> </span>
<span class="material-part" title="SLT-6 2631">
<img src="/images/train-material/mat_slt6.png"/> </span>
</div>
</div>
</div>

I know I can access the title attribute of the first span like so:

row[-1].find('span')['title']
"SLT-4 2435"

But I would also like to get the title of the second span (if it exists), with both combined into a single string like so: "SLT-4 2435, SLT-6 2631"

Any ideas?

CodePudding user response:

You can use the find_all() method to collect every span element with the class material-part, then join their titles:

titles = []
for material_part in row[-1].find_all('span', class_='material-part'):
    titles.append(material_part['title'])
result = ', '.join(titles)
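
Because find_all() returns an empty list when nothing matches, the same approach also covers rows that contain only one span, or none at all. Here is a minimal, self-contained sketch of that, assuming the spans carry the class material-part. The helper name join_titles and the shortened sample markup are made up for illustration:

```python
from bs4 import BeautifulSoup

def join_titles(tag):
    """Return the titles of all material-part spans under tag, comma-separated."""
    spans = tag.find_all('span', class_='material-part')
    # .get() returns None instead of raising KeyError when a span has no title
    titles = [span.get('title') for span in spans]
    return ', '.join(t for t in titles if t)

# Hypothetical sample fragments: two spans, one span, no spans
two = BeautifulSoup(
    '<div><span class="material-part" title="SLT-4 2435"></span>'
    '<span class="material-part" title="SLT-6 2631"></span></div>',
    'html.parser',
)
one = BeautifulSoup(
    '<div><span class="material-part" title="SLT-4 2435"></span></div>',
    'html.parser',
)
empty = BeautifulSoup('<div></div>', 'html.parser')

print(join_titles(two))          # SLT-4 2435, SLT-6 2631
print(join_titles(one))          # SLT-4 2435
print(repr(join_titles(empty)))  # ''
```

No explicit "does the second span exist?" check is needed, since join() over a single title simply returns that title.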

CodePudding user response:

As an alternative to find() / find_all(), you could use CSS selectors:

soup.select('span.material-part[title]')

Then iterate over the ResultSet with a list comprehension and join() the titles into a single string:

','.join([t.get('title') for t in soup.select('span.material-part[title]')])

Example

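from bs4 import BeautifulSoup
html = '''<div>
<div>
<div>
<span class="material-part" title="SLT-4 2435">
<img src="/images/train-material/mat_slt4.png"/> </span>
<span class="material-part" title="SLT-6 2631">
<img src="/images/train-material/mat_slt6.png"/> </span>
</div>
</div>
</div>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Select only spans that have the class and a title attribute
print(','.join([t.get('title') for t in soup.select('span.material-part[title]')]))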

Output

SLT-4 2435,SLT-6 2631
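
Note that soup.select() searches the whole document, so on a page with several rows it would merge every row's titles into one string. Calling select() on each row element keeps them separate. This is a rough sketch, assuming each row is wrapped in a div with a hypothetical class row (the real class names are not shown in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three rows with two, one, and zero material-part spans
html = '''
<div class="row"><span class="material-part" title="SLT-4 2435"></span>
<span class="material-part" title="SLT-6 2631"></span></div>
<div class="row"><span class="material-part" title="SLT-4 2435"></span></div>
<div class="row"></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# The [title] part of the selector guarantees span['title'] exists
per_row = [
    ','.join(span['title'] for span in row.select('span.material-part[title]'))
    for row in soup.select('div.row')
]
print(per_row)  # ['SLT-4 2435,SLT-6 2631', 'SLT-4 2435', '']
```

A row without any matching span yields an empty string here; whether to keep or skip such rows depends on what the scraper needs.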