I want to scrape some hotel data from Tripadvisor, but I can't get the data from "Hotel Class". How can I do that?
Code:
import requests
from bs4 import BeautifulSoup
import re
import time
import datetime
url = 'https://www.tripadvisor.com/Hotel_Review-g60763-d93543-Reviews-The_Shelburne_Sonesta_New_York-New_York_City_New_York.html'
headers = {'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}
site = requests.get(url, headers=headers)
soup = BeautifulSoup(site.content, 'html.parser')
name = soup.find('h1',{'id':'HEADING'}).text
address = soup.find('div',{'class':'gZwVG S4 H3 f u ERCyA'}).text
hotel_class = soup.find('div',{'class':'euDRl _R MC S4 _a H'}).text
no_reviews = soup.find('span',{'class':'biGQs _P pZUbB biKBZ KxBGd'}).text if soup.find('span',{'class':'biGQs _P pZUbB biKBZ KxBGd'}) else ""
ct = datetime.datetime.now()
dt_string = ct.strftime("%d/%m/%Y %H:%M:%S")
print(name, address, hotel_class, no_reviews, dt_string)
Answer:
I'm not sure whether this is the expected output, but the div itself contains no text, so you could read the value of the aria-label attribute from the <svg> element inside it:
hotel_class = soup.find('div',{'class':'euDRl _R MC S4 _a H'}).svg.get('aria-label')
Output:
4.0 of 5 bubbles
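If you need the value as a number rather than as label text, here is a minimal sketch. It assumes the label always has the form "<number> of 5 bubbles", as in the output above. The helper name parse_hotel_class is hypothetical and not part of BeautifulSoup or Tripadvisor.

```python
import re


def parse_hotel_class(label):
    """Return the leading number of a label such as '4.0 of 5 bubbles'.

    Returns None when the label is missing or does not match the
    assumed '<number> of <number> ...' shape.
    """
    if not label:
        return None
    # Capture an integer or decimal at the start, followed by "of <digits>".
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+of\s+\d+", label)
    return float(match.group(1)) if match else None


print(parse_hotel_class("4.0 of 5 bubbles"))
```

You would pass it the string returned by .svg.get('aria-label'). Note that soup.find(...) returns None when no element matches, so it is worth checking the result before accessing .svg on it.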