How to get the text from the last span with class 'star fill'?

Time:01-12

I am trying to scrape a website using BeautifulSoup, and I am having trouble getting the ratings from a review. They are stored in a table in which each rating is a row of span tags, and the rating is the text of the last span with class 'star fill'.

seat_comfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')

value_for_money = Ratings.select_one('tr:has(td:first-child:-soup-contains("Value for Money")) td.review-rating-stars.stars, span.star fill')

inflight_entertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Inflight Entertainment")) td.review-rating-stars.stars, span.star fill')

print(seat_comfort)

<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>

print(value_for_money)

<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>

print(inflight_entertainment)

<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>

I hope to get 1 for Seat Comfort, 2 for Value for Money, and 3 for Inflight Entertainment.

CodePudding user response:

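The question needs some improvement (formatting, the initial HTML, or a URL), so this answer can only point in the right direction.

The attribute class="star fill" holds two separate classes, so the CSS selector is span.star.fill, not span.star fill. Select the elements that carry both classes and take the len() of the resulting ResultSet: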

len(soup.select('.review-rating-stars span.star.fill'))

or extract the text of the last element:

soup.select('.review-rating-stars span.star.fill')[-1].text
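The two approaches above can be compared side by side in a minimal sketch. The single-row markup below is hypothetical: its class names are assumed from the selectors used in this thread, and the names row_html, row_soup, count_rating and last_text_rating are made up for illustration. Note that [-1] raises an IndexError when no star is filled, so the sketch guards against an empty result.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row markup; class names are assumed from the selectors above.
row_html = '''
<table class="review-ratings"><tr>
  <td>Seat Comfort</td>
  <td class="review-rating-stars stars">
    <span class="star fill">1</span><span class="star fill">2</span>
    <span class="star">3</span><span class="star">4</span><span class="star">5</span>
  </td>
</tr></table>
'''

row_soup = BeautifulSoup(row_html, 'html.parser')

# All spans that carry BOTH classes "star" and "fill".
filled = row_soup.select('.review-rating-stars span.star.fill')

# Approach 1: the rating is the number of filled stars.
count_rating = len(filled)

# Approach 2: the rating is the text of the last filled star (0 if none are filled).
last_text_rating = int(filled[-1].text) if filled else 0

print(count_rating, last_text_rating)
```

Both values agree as long as the filled stars are numbered consecutively from 1. Counting is the more forgiving choice when the span text might be missing or not numeric.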

To store structured data use a dict:

{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}

Example

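from bs4 import BeautifulSoup

# Class attributes match the selectors used above; three stars are filled in each row.
html = '''
<table class="review-ratings">
<tbody><tr>
    <td>Food &amp; Beverages</td>
    <td class="review-rating-stars stars">
        <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
</tr>
<tr>
    <td>Inflight Entertainment</td>
    <td class="review-rating-stars stars">
        <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
</tr>
<tr>
    <td>Seat Comfort</td>
    <td class="review-rating-stars stars">
        <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
</tr>
<tr>
    <td>Staff Service</td>
    <td class="review-rating-stars stars">
        <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
</tr>
<tr>
    <td>Value for Money</td>
    <td class="review-rating-stars stars">
        <span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td>
</tr></tbody></table>
'''
# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(html, 'html.parser')

# Map each row label (first td) to the number of filled stars in that row.
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}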

Output

{'Food & Beverages': 3,
 'Inflight Entertainment': 3,
 'Seat Comfort': 3,
 'Staff Service': 3,
 'Value for Money': 3}
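If only a single category is needed, as in the question, the original selector idea can be kept with one correction: chain the two classes as span.star.fill and drop the comma, which starts a second, unrelated selector. The following is a sketch under assumptions: rating_for, sample_html and sample_soup are hypothetical names, the markup is invented to mirror the class names used in this thread, and :-soup-contains needs a reasonably recent soupsieve (the selector engine behind select()).

```python
from bs4 import BeautifulSoup

def rating_for(soup, label):
    """Count the filled stars in the row whose first cell contains `label`.

    Hypothetical helper. `label` is pasted into the selector as-is,
    so it must not contain double quotes.
    """
    selector = f'tr:has(td:first-child:-soup-contains("{label}")) span.star.fill'
    return len(soup.select(selector))

# Invented sample markup with a different rating in each row.
sample_html = '''
<table class="review-ratings">
<tr><td>Seat Comfort</td>
    <td class="review-rating-stars stars"><span class="star fill">1</span><span class="star">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td>Value for Money</td>
    <td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td></tr>
<tr><td>Inflight Entertainment</td>
    <td class="review-rating-stars stars"><span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span></td></tr>
</table>
'''

sample_soup = BeautifulSoup(sample_html, 'html.parser')

for label in ('Seat Comfort', 'Value for Money', 'Inflight Entertainment'):
    print(label, rating_for(sample_soup, label))
```

The text match in :-soup-contains is case-sensitive, so the label has to be spelled exactly as it appears in the page ("Value for Money", not "Value For Money"). A label that matches no row simply yields 0.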