I am trying to scrape a website using BeautifulSoup, and I am having trouble getting the ratings from a review. They are stored in a table where each rating cell holds span tags, and the last filled star has the class 'star fill'.
seatcomfort = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
value_for_money = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
inflight_entertainment = Ratings.select_one('tr:has(td:first-child:-soup-contains("Seat Comfort")) td.review-rating-stars.stars, span.star fill')
print(seatcomfort)
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
print(value_for_money)
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
print(inflight_entertainment)
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
<td ><span >1</span><span >2</span><span >3</span><span >4</span><span >5</span></td>
I hope to get 1 for seat comfort, 2 for value for money, and 3 for inflight entertainment.
CodePudding user response:
The question needs some improvement (formatting, initial HTML or URL), so this can only point in the right direction.
Select the elements that have both the star and fill classes and take the len() of the ResultSet:
len(soup.select('.review-rating-stars span.star.fill'))
or extract the text of the last element:
soup.select('.review-rating-stars span.star.fill')[-1].text
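The two approaches above can be tried on a single row. This is only a minimal sketch: the markup is hypothetical, with class names assumed from the selectors used in this thread, so the real page may differ. It also shows why a selector such as 'span.star fill' finds nothing: the space is a descendant combinator, so it looks for a fill element inside span.star.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row markup; class names are assumed from the selectors above.
html = '''
<table class="review-ratings"><tr>
<td>Seat Comfort</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span><span class="star">4</span><span class="star">5</span></td>
</tr></table>
'''
soup = BeautifulSoup(html, 'html.parser')

# Spans that carry both classes, i.e. the filled stars.
filled = soup.select('.review-rating-stars span.star.fill')
count = len(filled)          # number of filled stars
last_text = filled[-1].text  # text of the last filled star

# 'span.star fill' asks for a <fill> element nested in span.star, which does not exist.
wrong = soup.select('span.star fill')

print(count, last_text, len(wrong))
```

Both count and last_text give the rating here, but last_text is a string and relies on the stars being numbered in order, so len() is probably the safer choice.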
To store structured data, use a dict:
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
Example
from bs4 import BeautifulSoup
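html = '''
<table class="review-ratings">
<tbody><tr>
<td >Food & Beverages</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td >Inflight Entertainment</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td >Seat Comfort</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td >Staff Service</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr>
<tr>
<td >Value for Money</td>
<td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span><span class="star">4</span><span class="star">5</span> </td>
</tr></tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')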
{e.td.text:len(e.select('.star.fill')) for e in soup.select('table.review-ratings tr')}
Output
{'Food & Beverages': 3,
'Inflight Entertainment': 3,
'Seat Comfort': 3,
'Staff Service': 3,
'Value for Money': 3}
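If only a few named categories are needed, as in the question, the row lookup and the star count can be combined. The sketch below is one possible way to do it, not a definitive solution: rating_for is a hypothetical helper, the markup and its ratings are made up for illustration, and :-soup-contains assumes a reasonably recent soupsieve (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class names and ratings are assumptions for illustration.
html = '''
<table class="review-ratings">
<tr><td>Seat Comfort</td><td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star">2</span><span class="star">3</span></td></tr>
<tr><td>Value for Money</td><td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star">3</span></td></tr>
<tr><td>Inflight Entertainment</td><td class="review-rating-stars stars">
<span class="star fill">1</span><span class="star fill">2</span><span class="star fill">3</span></td></tr>
</table>
'''

def rating_for(ratings, label):
    """Count filled stars in the row whose first cell contains label (None if no such row)."""
    row = ratings.select_one(f'tr:has(td:first-child:-soup-contains("{label}"))')
    if row is None:
        return None
    return len(row.select('span.star.fill'))

soup = BeautifulSoup(html, 'html.parser')
seat_comfort = rating_for(soup, 'Seat Comfort')
value_for_money = rating_for(soup, 'Value for Money')
inflight_entertainment = rating_for(soup, 'Inflight Entertainment')
print(seat_comfort, value_for_money, inflight_entertainment)
```

Note that :-soup-contains matches text case-sensitively, so the label has to be spelled as it appears in the page ('Value for Money', not 'Value For Money'), and each lookup needs its own label rather than "Seat Comfort" repeated three times.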