I am trying to write a simple Python scraper in order to save all the reviews of a specific place on TripAdvisor.
The specific link I am using as an example is the url assigned in the code below.
Here is the code I am using, which is supposed to print the page's HTML:
from bs4 import BeautifulSoup
import requests
url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)
If I run this code in the console, it hangs on requests.get(url) for a long time without any output. Using another URL (for example url = "https://stackoverflow.com/"), I immediately get the HTML displayed correctly. Why is TripAdvisor not working? How can I obtain its HTML?
CodePudding user response:
Adding a User-Agent header should solve your issue as a first step, because some sites serve different content depending on it or use it for bot / automation detection. Open DevTools in your browser and copy the user-agent from one of your requests:
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
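Since the original call hung without any output, it may also help to pass a timeout so that a stalled request raises an error instead of waiting indefinitely. This is only a minimal sketch: the helper name fetch_html, the 10-second default, and the optional get parameter (for swapping in a stub) are illustrative assumptions, not part of the original code.

```python
def fetch_html(url, timeout=10, get=None):
    """Return the page HTML, failing fast instead of hanging (hypothetical helper)."""
    if get is None:
        # Imported here so the helper can also be exercised with a stub.
        import requests
        get = requests.get
    headers = {'User-Agent': 'Mozilla/5.0'}
    # timeout (in seconds) makes requests raise if the server stops responding
    r = get(url, headers=headers, timeout=timeout)
    # raise on 4xx/5xx so an error response (e.g. 403) is not silently parsed
    r.raise_for_status()
    return r.text
```

Calling fetch_html(url) then either returns the HTML or raises a requests exception that shows what went wrong.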
Example
from bs4 import BeautifulSoup
import requests
url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
# name a parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')
data = []
# each review card sits inside the reviews tab container
for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        'rating': e.select_one('svg[aria-label]')['aria-label'],
        'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
        'content': e.select_one('div:has(>a[tabindex="0"]) div div').text
    })
data
Output
[{'rating': '5.0 of 5 bubbles',
'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out