I am trying to take the star rating from this page (
Here is the code that I was using however when it comes to taking the said tags it doesn't work
data = []
ua = UserAgent()
header = {'User-Agent':str(ua.safari)}
url = 'https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/'
response = requests.get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'lxml')
content_list = html_soup.find_all('div', attrs={'class': 'review-item'})
for e in content_list:
d = {'review_title': e.a.text,
'review_content': e.select_one('p').text,
'overall_rating': e.select_one('span.sr-only').text,
'reviewer_name':e.div.text.split(',')[0].strip(),
'review_date':e.div.text.split(',')[1].strip(),
}
data.append(d)
df = pd.DataFrame(data)
df1 = df.drop_duplicates(subset=['reviewer_name', 'review_title'], keep='first')
Basically what I would like to achieve is to have columns for each of those star ratings like for example Safety: 5.0, Performance: 5.0, Comfort: 5.0 and so on.
I was trying to use this part of the code:
d.update(dict(s.stripped_strings for s in e.select('span.rating-stars span.sr-only')))
data.append(d)
However it doesn't work. Moreover the tag that contains overall star rating and the detailed star rating has the same class as its the difference is that those two tags are under different tags (I hope I didn't complicate it too much). Anyways I hope someone could help me with that.
EDIT I edited a code a bit because it seems like the one I pasted didn't work which is strange
CodePudding user response:
In general it would be quiet possible to use stripped_strings
with correct selection of elements:
d.update(dict(s.stripped_strings for s in e.select('dl')))
Due to your expected output I would recommend to pick strings for key
and value
separatly:
...
d.update({s.dt.text:float(s.dd.text.split()[0]) for s in e.select('dl')})
data.append(d)
...
This would update your dict
with:
{'Safety': 5.0, 'Technology': 5.0, 'Performance': 5.0, 'Interior': 5.0, 'Comfort': 5.0, 'Reliability': 5.0, 'Value': 5.0}
or in case that there is no ResultSet
with an empty dict
.