Home > Software design >  Take the star rating from html page using beautifulsoup
Take the star rating from html page using beautifulsoup

Time:04-16

I am trying to take the star rating from this page (Screenshot of the tags from webpage

Here is the code that I was using however when it comes to taking the said tags it doesn't work

data = []
ua = UserAgent()
header = {'User-Agent':str(ua.safari)}
url = 'https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/'
response = requests.get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'lxml')
content_list = html_soup.find_all('div', attrs={'class': 'review-item'})
for e in content_list:

  d = {'review_title': e.a.text,
                'review_content': e.select_one('p').text,
                'overall_rating': e.select_one('span.sr-only').text,
                'reviewer_name':e.div.text.split(',')[0].strip(),
                'review_date':e.div.text.split(',')[1].strip(),
                 
              }

  data.append(d)
df = pd.DataFrame(data)
df1 = df.drop_duplicates(subset=['reviewer_name', 'review_title'], keep='first')

Basically what I would like to achieve is to have columns for each of those star ratings like for example Safety: 5.0, Performance: 5.0, Comfort: 5.0 and so on.

I was trying to use this part of the code:

d.update(dict(s.stripped_strings for s in e.select('span.rating-stars span.sr-only')))
data.append(d)

However it doesn't work. Moreover the tag that contains overall star rating and the detailed star rating has the same class as its the difference is that those two tags are under different tags (I hope I didn't complicate it too much). Anyways I hope someone could help me with that.

EDIT I edited a code a bit because it seems like the one I pasted didn't work which is strange

CodePudding user response:

In general it would be quiet possible to use stripped_strings with correct selection of elements:

d.update(dict(s.stripped_strings for s in e.select('dl')))

Due to your expected output I would recommend to pick strings for key and value separatly:

...
d.update({s.dt.text:float(s.dd.text.split()[0]) for s in e.select('dl')})

data.append(d)
...

This would update your dict with:

{'Safety': 5.0, 'Technology': 5.0, 'Performance': 5.0, 'Interior': 5.0, 'Comfort': 5.0, 'Reliability': 5.0, 'Value': 5.0}

or in case that there is no ResultSet with an empty dict.

  • Related