Python web-scraper not working for TripAdvisor

Time:04-21

I am trying to write a simple Python scraper in order to save all the reviews of a specific place on TripAdvisor.

The specific link I am using as an example is the following:

https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html

Here is the code I am using, which is supposed to print the page's HTML:

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)

When I run this code in the console, it hangs on requests.get(url) for a long time without producing any output. With another URL (for example, url = "https://stackoverflow.com/"), the HTML is displayed immediately and correctly. Why is TripAdvisor not working, and how can I obtain its HTML?

Answer:

Adding a User-Agent header should solve your issue as a first step, because some sites serve different content depending on it or use it for bot/automation detection. Open DevTools in your browser and copy the User-Agent from one of your requests:

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url,headers=headers)
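
Because the original request hung without any output, it may also help to fail fast instead of waiting indefinitely. The following is only a minimal sketch under that assumption: the helper name fetch_html, the DEFAULT_HEADERS constant, and the 10-second timeout are illustrative choices, not part of the original answer. It uses the timeout argument of requests and raise_for_status() so that a blocked or failed request surfaces as an exception.

```python
import requests

# Hypothetical default; replace with the User-Agent copied from your browser.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, session=None, timeout=10):
    """Return the page HTML, raising instead of hanging on a stalled request."""
    if session is None:
        session = requests.Session()
    # timeout makes requests give up after `timeout` seconds without a response
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Turn HTTP error codes (for example 403) into an exception
    response.raise_for_status()
    return response.text
```

Calling fetch_html(url) would then either return the HTML or raise a requests exception (such as a timeout) that can be caught and logged.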

Example

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
# Send a browser-like User-Agent so the site responds to the request
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get(url, headers=headers)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')
data = []

# Each review is rendered as a "reviewCard" element inside the reviews tab
for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        'rating': e.select_one('svg[aria-label]')['aria-label'],
        'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
        'content': e.select_one('div:has(>a[tabindex="0"]) div div').text
    })

data

Output

[{'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out            
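
The selectors above depend on TripAdvisor's current markup, and select_one returns None when an element is missing, which would raise an error inside the loop. As a hedged sketch only, the parsing step could be wrapped in a function that tolerates missing elements. The name parse_reviews and the SAMPLE_HTML fragment are made up for illustration and only mimic the attributes used by the selectors; they are not real TripAdvisor markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment that only imitates the attributes the selectors use.
SAMPLE_HTML = """
<div id="tab-data-qa-reviews-0">
  <div data-automation="reviewCard">
    <svg aria-label="5.0 of 5 bubbles"></svg>
    <a tabindex="0" href="/ShowUserReviews-example.html">Title</a>
  </div>
  <div data-automation="reviewCard">
    <a tabindex="0" href="/ShowUserReviews-other.html">No rating</a>
  </div>
</div>
"""


def parse_reviews(html):
    """Extract rating and link per review card, using None for missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for card in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
        rating = card.select_one("svg[aria-label]")
        link = card.select_one('a[tabindex="0"]')
        reviews.append({
            # select_one returns None when nothing matches, so check first
            "rating": rating["aria-label"] if rating is not None else None,
            "profilUrl": link.get("href") if link is not None else None,
        })
    return reviews


print(parse_reviews(SAMPLE_HTML))
```

With this shape, a card that lacks a rating yields None for that field instead of stopping the whole scrape.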