Trustpilot web scraping, empty output

Time:10-03

I am trying to scrape Trustpilot reviews using Python. Since I'm new to web scraping, I used the code snippet below. I ran it in Google Colab and it completed without any error, but the output dataframe is empty! I tried different companies' URLs on Trustpilot, but it keeps returning an empty dataframe. Could anyone help me find out what is wrong with my code or my scraping approach?

from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime as dt

# Initialize lists
review_titles = []
review_dates_original = []
review_dates = []
review_ratings = []
review_texts = []
page_number = []

# Set Trustpilot page numbers to scrape here
from_page = 1
to_page = 50

for i in range(from_page, to_page + 1):
    response = requests.get(f"https://uk.trustpilot.com/review/bupa.co.uk?page={i}")
    web_page = response.text
    soup = BeautifulSoup(web_page, "html.parser")

    for review in soup.find_all(class_ = "paper_paper__1PY90 paper_square__lJX8a card_card__lQWDv card_noPadding__D8PcU styles_cardWrapper__LcCPA styles_show__HUXRb styles_reviewCard__9HxJJ"):
        # Review titles
        review_title = review.find(class_ = "typography_typography__QgicV typography_h4__E971J typography_color-black__5LYEn typography_weight-regular__TWEnf typography_fontstyle-normal__kHyN3 styles_reviewTitle__04VGJ")
        review_titles.append(review_title.getText())

        # Review dates
        review_date_original = review.select_one(selector="time")
        review_dates_original.append(review_date_original.getText())

        # Convert review date texts into Python datetime objects
        review_date = review.select_one(selector="time").getText().replace("Updated ", "")
        if "hours ago" in review_date.lower() or "hour ago" in review_date.lower():
            review_date = dt.datetime.now().date()
        elif "a day ago" in review_date.lower():
            review_date = dt.datetime.now().date() - dt.timedelta(days=1)
        elif "days ago" in review_date.lower():
            # Take the whole leading number ("12 days ago" -> 12), not just its first character
            review_date = dt.datetime.now().date() - dt.timedelta(days=int(review_date.split()[0]))
        else:
            review_date = dt.datetime.strptime(review_date, "%b %d, %Y").date()
        review_dates.append(review_date)

        # Review ratings
        review_rating = review.find(class_ = "star-rating_starRating__4rrcf star-rating_medium__iN6Ty").findChild()
        review_ratings.append(review_rating["alt"])
        
        # When there is no review text, append "" instead of skipping so that data remains in sequence with other review data e.g. review_title
        review_text = review.find(class_ = "typography_typography__QgicV typography_body__9UBeQ typography_color-black__5LYEn typography_weight-regular__TWEnf typography_fontstyle-normal__kHyN3")
        if review_text is None:
            review_texts.append("")
        else:
            review_texts.append(review_text.getText())
        
        # Trustpilot page number
        page_number.append(i)

# Create final dataframe from lists
df_reviews = pd.DataFrame(list(zip(review_titles, review_dates_original, review_dates, review_ratings, review_texts, page_number)),
                columns =['review_title', 'review_date_original', 'review_date', 'review_rating', 'review_text', 'page_number'])

Answer:

The reviews are fed into the page from an API via JavaScript XHR calls. Those calls are made by JavaScript after the initial page's HTML has loaded, and Requests cannot execute JavaScript, so it never sees the data. You need to scrape that API endpoint instead; you can find it in the browser's dev tools, under the Network tab. Here is one way to do it:

import requests
import pandas as pd
from tqdm import tqdm ## if using Jupyter notebook, import as: from tqdm.notebook import tqdm


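s = requests.Session()  # reuse one connection across the page requests
big_df = pd.DataFrame()

# Show every column and the full cell contents when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

for x in tqdm(range(2, 5)):
    # Request the JSON endpoint directly instead of the rendered HTML page
    r = s.get(f'https://uk.trustpilot.com/_next/data/businessunitprofile-consumersite-5670/review/bupa.co.uk.json?page={x}&businessUnit=bupa.co.uk')
    # Flatten the nested review records into one row per review
    df = pd.json_normalize(r.json()['pageProps']['reviews'])
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
print(big_df)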

Result in terminal:

100%
1/1 [00:01<00:00, 1.21s/it]
id  filtered    pending text    rating  title   likes   report  hasUnhandledReports consumersReviewCountOnSameDomain    consumersReviewCountOnSameLocation  productReviews  language    location    labels.merged   labels.verification.isVerified  labels.verification.createdDateTime labels.verification.reviewSourceName    labels.verification.verificationSource  labels.verification.verificationLevel   dates.experiencedDate   dates.publishedDate dates.updatedDate   consumer.id consumer.displayName    consumer.imageUrl   consumer.numberOfReviews    consumer.countryCode    consumer.hasImage   consumer.isVerified reply.message   reply.publishedDate reply.updatedDate
0   633340161133a76521b9a54a    False   False   Having just received a breast cancer diagnosis from a routine mammogram. I was unfortunately let down by the NHS.dropped me out of the system.\nFortunately, I have had medical insurance with BUPA.for the past 48 yrs.So I contacted them & they were very instrumental in getting me the treatment I needed very quickly.    5   Let down by NHS.    1   None    False   1   None    []  en  None    None    True    2022-09-27T20:25:26.000Z    BusinessGeneratedLink   invitation  invited 2021-10-07T00:00:00.000Z    2022-09-27T20:25:26.000Z    None    573c3c860000ff000a212594    Hilary Davies       7   GB  False   False   Hi Hilary, thank you for sharing your experience. Wishing you all the very best. Amy    2022-09-28T08:50:31.797Z    None
1   6333f9061c24523f3ca038d2    False   False   I used the dental care plan and was so pleased with how quick and easy it was to put a claim through. After sending my receipt which included breakdown of dental work I received I received the funds straight into my nominated bank account within 10 days. No fuss or multiple emails/calls required. Super service!    5   Dental Care plan- super easy claim process  0   None    False   1   None    []  en  None    None    True    2022-09-28T09:34:30.000Z    BusinessGeneratedLink   invitation  invited 2022-09-14T00:00:00.000Z    2022-09-28T09:34:30.000Z    None    5e9b08555efa59374f58b372    Jessica     4   GB  False   False   Great, thanks for sharing your experience Jessica. Amy  2022-09-28T09:42:50.036Z    None
2   633322a61133a76521b983b9    False   False   Bupa left a text on my phone that they needed to speak to me about a medical issue. They had given me an authority number. \n\nI have tried six times to get through. No name was left. I couldn't find a human being to talk to. This is weeks ago. No one has tried again. The message was left on date below 1   Bupa deals with insurance and takes money every month. It shouldn't be so difficult dealing with them when it comes to paying for a medical issue.  0   None    False   1   None    []  en  None    None    True    2022-09-27T18:19:50.000Z    BusinessGeneratedLink   invitation  invited 2022-09-12T00:00:00.000Z    2022-09-27T18:19:50.000Z    None    58e7abeb0000ff000a8ac3aa    a levin     2   GB  False   False   Hi there, I'm really sorry about this, I've asked the team to give you a call to discuss the reason for the Text, you'll hear from them shortly. Thanks Amy 2022-09-28T08:37:11.614Z    None
3   633696ee3d107cfdfccef1fa    False   False   BUPA processed my request quickly and allowed me the care I needed however they raised my fee from £135 to £206 a month while I was in the middle of treatment so was unable to look elsewhere. 3   BUPA processed my request quickly   1   None    False   1   None    []  en  None    None    True    2022-09-30T09:12:46.000Z    BusinessGeneratedLink   invitation  invited 2022-07-28T00:00:00.000Z    2022-09-30T09:12:46.000Z    None    5f1b27d613b1ca629d80359d    Judy March      9   GB  False   False   Hi Judy. Thanks for your feedback. You may wish to give your Loyalty team a call on 0800 010 383 to discuss the cost of your policy in more detail, and to see if there any options open to you. Many thanks, Brian 2022-09-30T10:05:44.158Z    None
4   6332c3f5484ccb2c525e811c    False   False   All very straight forward when making a claim just make sure you talk to BUPA first to check your cover then get a private referral from a GP, go to the website and choose your consultant / Hospital and give them a call to arrange an appointment. I added my wife to my policy a couple of years ago and glad I did as she recently needed some quick diagnoses and treatment for heart condition. All sorted within 6 weeks of seeing the consultant. 5   Quick and simple from diagnoses to treatment.   0   None    False   1   None    []  en  None    None    True    2022-09-27T11:35:49.000Z    BusinessGeneratedLink   invitation  invited 2022-08-15T00:00:00.000Z    2022-09-27T11:35:49.000Z    None    5ea9d7c9bbf1afc1c1f68ef6    Philip      12  GB  False   False   Thanks for taking the time to review us Phillip, we really appreciate it. Amy   2022-09-27T11:50:14.607Z    None
5   6332da211c24523f3c9f71e5    False   False   Cover was not enough to complete my treatment. I appreciate that I pay a certain amount of cover however, there was no way I could complete treatment and consultations with the benefit received. Surely enough cover should be provided to complete treatment.    3   Insufficient cover  1   None    False   1   None    []  en  None    None    True    2022-09-27T13:10:25.000Z    BusinessGeneratedLink   invitation  invited 2022-08-28T00:00:00.000Z    2022-09-27T13:10:25.000Z    None    600ff7dd771237001951b6fd    Diane       3   GB  False   False   Hi Diane, thanks for your comments. Sorry you feel that your cover is insufficient. We would happily discuss your policy options with you at your renewal. Amy  2022-09-27T13:22:00.300Z    None
6   6332cd761133a76521b92128    False   False   As over the years, during my last contact with Bupa staff member to get authorisation for consultation, I did not experience any issue what so ever. The staff member was very helpful, knowledgeable and understanding. Mine is a complex which has not been professionally diagnosed but only I know what my issue is. Throughout I have had immense moral support on top of medical support which has been very valuable. \nUnderstanding of most staff members has been above and beyond expectations.  5   Bupa has been my life saver.    0   None    False   1   None    []  en  None    None    True    2022-09-27T12:16:22.000Z    BusinessGeneratedLink   invitation  invited 2022-09-27T00:00:00.000Z    2022-09-27T12:16:22.000Z    None    5992b48b0000ff000abbf4db    Manju       9   GB  False   False   Thanks for sharing your experience Manju, we really appreciate it. All the best, Amy.   2022-09-27T12:22:09.148Z    None
7   6332d0f31133a76521b924dd    False   False   Poor at best:\n\n1) Admin, confusion when mistakes are made by them, I have to sort their errors out.\n2) Changing terms and what I can claim for, meaning that I'm going through the NHS for me treatment.\n\nI can go on but pointless as you get my drift.   1   Poorest provider yet    0   None    False   1   None    []  en  None    None    True    2022-09-27T12:31:15.000Z    BusinessGeneratedLink   invitation  invited 2022-09-23T00:00:00.000Z    2022-09-27T12:31:15.000Z    None    5cb772a47fe10841711b6bea    Guido       4   GB  False   False   Hi Guido, I'm really sorry that this is how you feel after your recent experience. It looks like we have discussed your concerns, but if you would like to discuss this any further, or have any questions we'd be happy to help, call our Customer Relations team on 0345 606 6739. Amy    2022-09-27T12:57:42.082Z    None
8   63342e5c1133a76521ba2ec0    False   False   Am in Bupa with my employer.\nHaving various tests done and need \nBupa’s approval to go ahead.\nAt this stressful time for me they have been brilliant. Call handlers polite ever so friendly and extremely helpful.   5   Am in Bupa with my employer.    0   None    False   1   None    []  en  None    None    True    2022-09-28T13:22:04.000Z    BusinessGeneratedLink   invitation  invited 2022-09-28T00:00:00.000Z    2022-09-28T13:22:04.000Z    None    5c61a0faf7edcbadbf39934d    Peter Francis. Somerset     4   GB  False   False   Thank you for taking the time to review us Peter. Wishing you all the best with any further treatment. Amy  2022-09-29T09:01:26.925Z    None
9   6332ce3d1c24523f3c9f654f    False   False   Not great to be honest, it isn’t this stressful on the NHS. Very disappointing, constantly having to call them, I was under the impression paying private would make life easier, I couldn’t have been more wrong. Dreadful service. Paid more out in excess than what the treatment actually cost? 1   Not great to be honest  0   None    False   1   None    []  en  None    None    True    2022-09-27T12:19:41.000Z    BusinessGeneratedLink   invitation  invited 2022-09-27T00:00:00.000Z    2022-09-27T12:19:41.000Z    None    6332ce3c56accb0012f8db20    MJH     1   GB  False   False   Hi there, thank you for taking the time to share your experience. I can see that we've looked into your concerns and responded to you. If you feel that you would like to discuss this further or if you're unhappy with our response, your next steps will be noted on our correspondence. Amy 2022-09-27T13:01:57.551Z    None
10  6332ca3a1c24523f3c9f610e    False   False   As I get older I get more and more grateful that I have BUPA to contact if I need serious medical attention not readily available on the NHS. Their systems for processing inquiries and authorisations is very efficient. And I am very impressed by the BUPA hospital near me, the Cromwell.  5   As I get older I get more and more… 0   None    False   1   None    []  en  None    None    True    2022-09-27T12:02:34.000Z    BusinessGeneratedLink   invitation  invited 2022-09-15T00:00:00.000Z    2022-09-27T12:02:34.000Z    None    4fd6243600006400011afc2a    Ian Dunlop      12  GB  False   False   Hi Ian, thanks for taking the time to review us. We're really pleased this is how you feel after your recent experience. Thank you, Amy.    2022-09-27T12:19:55.899Z    None
11  6332e3161133a76521b938a7    False   False   Had a tooth            
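
If only the columns from the question are needed, the payload can be reduced before building the dataframe. This is a minimal sketch, assuming each review record has the nested shape implied by the column names in the output above (`title`, `text`, `rating`, `dates.publishedDate`). The function name `review_rows` and the sample payload are hypothetical, made up for illustration:

```python
import datetime as dt


def review_rows(payload, page):
    """Reduce one page of the JSON payload to the columns the question wanted."""
    rows = []
    for review in payload["pageProps"]["reviews"]:
        # Timestamps in the output look like "2022-09-27T20:25:26.000Z"
        published = review["dates"]["publishedDate"]
        rows.append({
            "review_title": review["title"],
            "review_date": dt.datetime.strptime(
                published, "%Y-%m-%dT%H:%M:%S.%fZ"
            ).date(),
            "review_rating": review["rating"],
            # Keep an empty string when a review has no text, as in the question
            "review_text": review.get("text") or "",
            "page_number": page,
        })
    return rows


# Hypothetical miniature payload that mimics the assumed structure
sample_payload = {
    "pageProps": {
        "reviews": [
            {
                "title": "Example title",
                "text": None,
                "rating": 4,
                "dates": {"publishedDate": "2022-09-27T13:10:25.000Z"},
            }
        ]
    }
}

rows = review_rows(sample_payload, page=1)
print(rows)
```

With a real response, something like `pd.DataFrame(review_rows(r.json(), x))` should give one page's worth of rows. Because the API returns absolute timestamps, the relative-date handling ("2 days ago") from the question is not needed here.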
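
One caveat about the URL used in the answer: the `businessunitprofile-consumersite-5670` segment looks like a Next.js build ID, and if so it would change whenever the site is redeployed, so a hard-coded URL may stop working. Below is a hedged sketch of reading the current ID from the regular HTML page. It assumes that page embeds the usual Next.js `<script id="__NEXT_DATA__">` JSON blob with a `buildId` key, which has not been verified against Trustpilot here. `extract_build_id` and the sample HTML are hypothetical:

```python
import json
import re


def extract_build_id(html):
    """Return the buildId from a Next.js __NEXT_DATA__ script tag, or None."""
    match = re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        flags=re.DOTALL,
    )
    if match is None:
        return None
    return json.loads(match.group(1)).get("buildId")


# Hypothetical HTML standing in for the real page source
sample_html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"buildId": "example-build-123", "props": {"pageProps": {"reviews": []}}}'
    "</script></body></html>"
)

build_id = extract_build_id(sample_html)
print(build_id)
```

If the assumption holds, the returned value could be substituted into the `_next/data/<build id>/review/...` URL before looping over pages.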