I am trying to pull the text of an element (see image of element tree) on the Trustpilot website using the following script, but all it returns is a bunch of 'None' values.
import requests
from bs4 import BeautifulSoup

url = "https://uk.trustpilot.com/review/rockar.com"
try_url = requests.get(url)
soup = BeautifulSoup(try_url.content, 'html.parser')
print(try_url.content)
# Look for an <h2> inside each review container
for h in soup.find_all('div', {'class': 'styles_reviewContent__3TSDf'}):
    hdln = h.find("h2")
    print(hdln)
What is the way around this? Am I looking at the wrong selector?
CodePudding user response:
As @diggusbickus pointed out, you can get the reviews this way:
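import json

# Parse the JSON held in the page's <script type="application/json"> tag
# (soup is the BeautifulSoup object from the question)
data = json.loads(soup.find('script', type='application/json').string)
reviews = data["props"]["pageProps"]["reviews"]
sample_reply = reviews[0]["reply"]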
The sample_reply is:
{'message': "Thank you so much for your kind words, Fi! It's great to hear Shah was fantastic and offer a personal service to your car buying journey. Thank you for taking the time to leave us a great review! We hope you love your new vehicle! Thanks again for choosing Rockar :-)",
'publishedDate': '2021-11-04T12:35:25.401Z',
'updatedDate': '2021-11-04T12:35:34.948Z'}
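If you want the reply text for every review rather than just the first one, something like the sketch below might help. It only assumes the props -> pageProps -> reviews -> reply -> message path shown above. The SAMPLE_JSON string, the reply_messages helper, and the idea that a review can have a null reply are all made up for illustration, so check them against the real page data.

```python
import json

# Hypothetical stand-in for the JSON embedded in the page. Only the
# props -> pageProps -> reviews -> reply -> message path comes from the
# answer above; the values here are invented for the example.
SAMPLE_JSON = """
{"props": {"pageProps": {"reviews": [
  {"reply": {"message": "Thanks for the review!",
             "publishedDate": "2021-11-04T12:35:25.401Z"}},
  {"reply": null}
]}}}
"""


def reply_messages(data):
    """Return one entry per review: the reply message, or None if there is no reply."""
    reviews = data.get("props", {}).get("pageProps", {}).get("reviews", [])
    messages = []
    for review in reviews:
        # Assumption: "reply" may be missing or null when the company has not replied
        reply = review.get("reply") or {}
        messages.append(reply.get("message"))
    return messages


data = json.loads(SAMPLE_JSON)
messages = reply_messages(data)
print(messages)
```

With the real page you would pass in the data dict loaded from the script tag instead of SAMPLE_JSON. Using .get() instead of direct indexing means a missing key gives you an empty list or None rather than a KeyError.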