I am trying to scrape Trustpilot reviews using Python. Since I'm new to web scraping, I used the code snippet below. I ran it in Google Colab and it completed without any errors, but the output dataframe is empty! I tried different companies' URLs on Trustpilot, but it keeps returning an empty dataframe. Could anyone help me find out what is wrong with my code or my scraping methodology?
from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt
# Initialize lists
review_titles = []
review_dates_original = []
review_dates = []
review_ratings = []
review_texts = []
page_number = []
# Set Trustpilot page numbers to scrape here
from_page = 1
to_page = 50
for i in range(from_page, to_page + 1):
    response = requests.get(f"https://uk.trustpilot.com/review/bupa.co.uk?page={i}")
    web_page = response.text
    soup = BeautifulSoup(web_page, "html.parser")
    for review in soup.find_all(class_ = "paper_paper__1PY90 paper_square__lJX8a card_card__lQWDv card_noPadding__D8PcU styles_cardWrapper__LcCPA styles_show__HUXRb styles_reviewCard__9HxJJ"):
        # Review titles
        review_title = review.find(class_ = "typography_typography__QgicV typography_h4__E971J typography_color-black__5LYEn typography_weight-regular__TWEnf typography_fontstyle-normal__kHyN3 styles_reviewTitle__04VGJ")
        review_titles.append(review_title.getText())
        # Review dates
        review_date_original = review.select_one(selector="time")
        review_dates_original.append(review_date_original.getText())
        # Convert review date texts into Python datetime objects
        review_date = review.select_one(selector="time").getText().replace("Updated ", "")
        if "hours ago" in review_date.lower() or "hour ago" in review_date.lower():
            review_date = dt.datetime.now().date()
        elif "a day ago" in review_date.lower():
            review_date = dt.datetime.now().date() - dt.timedelta(days=1)
        elif "days ago" in review_date.lower():
            # Take the whole leading number so that multi-digit day counts are handled
            review_date = dt.datetime.now().date() - dt.timedelta(days=int(review_date.split()[0]))
        else:
            review_date = dt.datetime.strptime(review_date, "%b %d, %Y").date()
        review_dates.append(review_date)
        # Review ratings
        review_rating = review.find(class_ = "star-rating_starRating__4rrcf star-rating_medium__iN6Ty").findChild()
        review_ratings.append(review_rating["alt"])
        # When there is no review text, append "" instead of skipping so that data remains in sequence with other review data e.g. review_title
        review_text = review.find(class_ = "typography_typography__QgicV typography_body__9UBeQ typography_color-black__5LYEn typography_weight-regular__TWEnf typography_fontstyle-normal__kHyN3")
        if review_text is None:
            review_texts.append("")
        else:
            review_texts.append(review_text.getText())
        # Trustpilot page number
        page_number.append(i)
# Create final dataframe from lists
df_reviews = pd.DataFrame(list(zip(review_titles, review_dates_original, review_dates, review_ratings, review_texts, page_number)),
columns =['review_title', 'review_date_original', 'review_date', 'review_rating', 'review_text', 'page_number'])
CodePudding user response:
That information is fed into the page from an API via JavaScript XHR calls. Those calls are made by JavaScript after the page's initial HTML has loaded, so Requests cannot see the data, as it cannot execute JavaScript. You need to scrape that API endpoint instead (you can find it in the browser's Dev tools, Network tab). Here is one way to do it:
import requests
import pandas as pd
from tqdm import tqdm ## if using Jupyter notebook, import as: from tqdm.notebook import tqdm
s = requests.Session()
big_df = pd.DataFrame()
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
for x in tqdm(range(2, 5)):
    r = s.get(f'https://uk.trustpilot.com/_next/data/businessunitprofile-consumersite-5670/review/bupa.co.uk.json?page={x}&businessUnit=bupa.co.uk')
    df = pd.json_normalize(r.json()['pageProps']['reviews'])
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
print(big_df)
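If only the columns from the question are needed, the nested JSON can also be flattened by hand. The following is a minimal sketch and not part of the original answer: `extract_reviews` and the `sample` payload are hypothetical, and the field names (`title`, `rating`, `text`, `dates.publishedDate`) are assumed from the column headers the loop above prints, so the real payload may differ or change over time.

```python
import datetime as dt


def extract_reviews(payload, page):
    """Flatten one page of the JSON payload into rows shaped like the question's dataframe.

    Assumes the structure payload["pageProps"]["reviews"] used in the loop above.
    """
    rows = []
    for review in payload["pageProps"]["reviews"]:
        published = review["dates"]["publishedDate"]
        rows.append({
            "review_title": review["title"],
            "review_date_original": published,
            # Timestamps look like "2022-09-27T20:25:26.000Z"
            "review_date": dt.datetime.strptime(published, "%Y-%m-%dT%H:%M:%S.%fZ").date(),
            "review_rating": review["rating"],
            # Keep an empty string when a review has no text, so rows stay aligned
            "review_text": review.get("text") or "",
            "page_number": page,
        })
    return rows


# Hand-written, trimmed sample that mimics the assumed structure (not real API output)
sample = {
    "pageProps": {
        "reviews": [
            {
                "title": "Let down by NHS.",
                "rating": 5,
                "text": None,
                "dates": {"publishedDate": "2022-09-27T20:25:26.000Z"},
            }
        ]
    }
}

rows = extract_reviews(sample, page=2)
print(rows[0])
```

Calling `extract_reviews(r.json(), x)` inside the loop and passing the collected rows to `pd.DataFrame(...)` should give a dataframe with the question's column names. The output shown next is from the original loop above, before any such column selection.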
Result in terminal:
100%
1/1 [00:01<00:00, 1.21s/it]
id filtered pending text rating title likes report hasUnhandledReports consumersReviewCountOnSameDomain consumersReviewCountOnSameLocation productReviews language location labels.merged labels.verification.isVerified labels.verification.createdDateTime labels.verification.reviewSourceName labels.verification.verificationSource labels.verification.verificationLevel dates.experiencedDate dates.publishedDate dates.updatedDate consumer.id consumer.displayName consumer.imageUrl consumer.numberOfReviews consumer.countryCode consumer.hasImage consumer.isVerified reply.message reply.publishedDate reply.updatedDate
0 633340161133a76521b9a54a False False Having just received a breast cancer diagnosis from a routine mammogram. I was unfortunately let down by the NHS.dropped me out of the system.\nFortunately, I have had medical insurance with BUPA.for the past 48 yrs.So I contacted them & they were very instrumental in getting me the treatment I needed very quickly. 5 Let down by NHS. 1 None False 1 None [] en None None True 2022-09-27T20:25:26.000Z BusinessGeneratedLink invitation invited 2021-10-07T00:00:00.000Z 2022-09-27T20:25:26.000Z None 573c3c860000ff000a212594 Hilary Davies 7 GB False False Hi Hilary, thank you for sharing your experience. Wishing you all the very best. Amy 2022-09-28T08:50:31.797Z None
1 6333f9061c24523f3ca038d2 False False I used the dental care plan and was so pleased with how quick and easy it was to put a claim through. After sending my receipt which included breakdown of dental work I received I received the funds straight into my nominated bank account within 10 days. No fuss or multiple emails/calls required. Super service! 5 Dental Care plan- super easy claim process 0 None False 1 None [] en None None True 2022-09-28T09:34:30.000Z BusinessGeneratedLink invitation invited 2022-09-14T00:00:00.000Z 2022-09-28T09:34:30.000Z None 5e9b08555efa59374f58b372 Jessica 4 GB False False Great, thanks for sharing your experience Jessica. Amy 2022-09-28T09:42:50.036Z None
2 633322a61133a76521b983b9 False False Bupa left a text on my phone that they needed to speak to me about a medical issue. They had given me an authority number. \n\nI have tried six times to get through. No name was left. I couldn't find a human being to talk to. This is weeks ago. No one has tried again. The message was left on date below 1 Bupa deals with insurance and takes money every month. It shouldn't be so difficult dealing with them when it comes to paying for a medical issue. 0 None False 1 None [] en None None True 2022-09-27T18:19:50.000Z BusinessGeneratedLink invitation invited 2022-09-12T00:00:00.000Z 2022-09-27T18:19:50.000Z None 58e7abeb0000ff000a8ac3aa a levin 2 GB False False Hi there, I'm really sorry about this, I've asked the team to give you a call to discuss the reason for the Text, you'll hear from them shortly. Thanks Amy 2022-09-28T08:37:11.614Z None
3 633696ee3d107cfdfccef1fa False False BUPA processed my request quickly and allowed me the care I needed however they raised my fee from £135 to £206 a month while I was in the middle of treatment so was unable to look elsewhere. 3 BUPA processed my request quickly 1 None False 1 None [] en None None True 2022-09-30T09:12:46.000Z BusinessGeneratedLink invitation invited 2022-07-28T00:00:00.000Z 2022-09-30T09:12:46.000Z None 5f1b27d613b1ca629d80359d Judy March 9 GB False False Hi Judy. Thanks for your feedback. You may wish to give your Loyalty team a call on 0800 010 383 to discuss the cost of your policy in more detail, and to see if there any options open to you. Many thanks, Brian 2022-09-30T10:05:44.158Z None
4 6332c3f5484ccb2c525e811c False False All very straight forward when making a claim just make sure you talk to BUPA first to check your cover then get a private referral from a GP, go to the website and choose your consultant / Hospital and give them a call to arrange an appointment. I added my wife to my policy a couple of years ago and glad I did as she recently needed some quick diagnoses and treatment for heart condition. All sorted within 6 weeks of seeing the consultant. 5 Quick and simple from diagnoses to treatment. 0 None False 1 None [] en None None True 2022-09-27T11:35:49.000Z BusinessGeneratedLink invitation invited 2022-08-15T00:00:00.000Z 2022-09-27T11:35:49.000Z None 5ea9d7c9bbf1afc1c1f68ef6 Philip 12 GB False False Thanks for taking the time to review us Phillip, we really appreciate it. Amy 2022-09-27T11:50:14.607Z None
5 6332da211c24523f3c9f71e5 False False Cover was not enough to complete my treatment. I appreciate that I pay a certain amount of cover however, there was no way I could complete treatment and consultations with the benefit received. Surely enough cover should be provided to complete treatment. 3 Insufficient cover 1 None False 1 None [] en None None True 2022-09-27T13:10:25.000Z BusinessGeneratedLink invitation invited 2022-08-28T00:00:00.000Z 2022-09-27T13:10:25.000Z None 600ff7dd771237001951b6fd Diane 3 GB False False Hi Diane, thanks for your comments. Sorry you feel that your cover is insufficient. We would happily discuss your policy options with you at your renewal. Amy 2022-09-27T13:22:00.300Z None
6 6332cd761133a76521b92128 False False As over the years, during my last contact with Bupa staff member to get authorisation for consultation, I did not experience any issue what so ever. The staff member was very helpful, knowledgeable and understanding. Mine is a complex which has not been professionally diagnosed but only I know what my issue is. Throughout I have had immense moral support on top of medical support which has been very valuable. \nUnderstanding of most staff members has been above and beyond expectations. 5 Bupa has been my life saver. 0 None False 1 None [] en None None True 2022-09-27T12:16:22.000Z BusinessGeneratedLink invitation invited 2022-09-27T00:00:00.000Z 2022-09-27T12:16:22.000Z None 5992b48b0000ff000abbf4db Manju 9 GB False False Thanks for sharing your experience Manju, we really appreciate it. All the best, Amy. 2022-09-27T12:22:09.148Z None
7 6332d0f31133a76521b924dd False False Poor at best:\n\n1) Admin, confusion when mistakes are made by them, I have to sort their errors out.\n2) Changing terms and what I can claim for, meaning that I'm going through the NHS for me treatment.\n\nI can go on but pointless as you get my drift. 1 Poorest provider yet 0 None False 1 None [] en None None True 2022-09-27T12:31:15.000Z BusinessGeneratedLink invitation invited 2022-09-23T00:00:00.000Z 2022-09-27T12:31:15.000Z None 5cb772a47fe10841711b6bea Guido 4 GB False False Hi Guido, I'm really sorry that this is how you feel after your recent experience. It looks like we have discussed your concerns, but if you would like to discuss this any further, or have any questions we'd be happy to help, call our Customer Relations team on 0345 606 6739. Amy 2022-09-27T12:57:42.082Z None
8 63342e5c1133a76521ba2ec0 False False Am in Bupa with my employer.\nHaving various tests done and need \nBupa’s approval to go ahead.\nAt this stressful time for me they have been brilliant. Call handlers polite ever so friendly and extremely helpful. 5 Am in Bupa with my employer. 0 None False 1 None [] en None None True 2022-09-28T13:22:04.000Z BusinessGeneratedLink invitation invited 2022-09-28T00:00:00.000Z 2022-09-28T13:22:04.000Z None 5c61a0faf7edcbadbf39934d Peter Francis. Somerset 4 GB False False Thank you for taking the time to review us Peter. Wishing you all the best with any further treatment. Amy 2022-09-29T09:01:26.925Z None
9 6332ce3d1c24523f3c9f654f False False Not great to be honest, it isn’t this stressful on the NHS. Very disappointing, constantly having to call them, I was under the impression paying private would make life easier, I couldn’t have been more wrong. Dreadful service. Paid more out in excess than what the treatment actually cost? 1 Not great to be honest 0 None False 1 None [] en None None True 2022-09-27T12:19:41.000Z BusinessGeneratedLink invitation invited 2022-09-27T00:00:00.000Z 2022-09-27T12:19:41.000Z None 6332ce3c56accb0012f8db20 MJH 1 GB False False Hi there, thank you for taking the time to share your experience. I can see that we've looked into your concerns and responded to you. If you feel that you would like to discuss this further or if you're unhappy with our response, your next steps will be noted on our correspondence. Amy 2022-09-27T13:01:57.551Z None
10 6332ca3a1c24523f3c9f610e False False As I get older I get more and more grateful that I have BUPA to contact if I need serious medical attention not readily available on the NHS. Their systems for processing inquiries and authorisations is very efficient. And I am very impressed by the BUPA hospital near me, the Cromwell. 5 As I get older I get more and more… 0 None False 1 None [] en None None True 2022-09-27T12:02:34.000Z BusinessGeneratedLink invitation invited 2022-09-15T00:00:00.000Z 2022-09-27T12:02:34.000Z None 4fd6243600006400011afc2a Ian Dunlop 12 GB False False Hi Ian, thanks for taking the time to review us. We're really pleased this is how you feel after your recent experience. Thank you, Amy. 2022-09-27T12:19:55.899Z None
11 6332e3161133a76521b938a7 False False Had a tooth