I'm trying to grab all the reviews from Yelp for this hotel: https://www.yelp.com/biz/capri-laguna-laguna-beach
I have my code below, but I'm only able to pull one review instead of all of them. Can someone please assist?
Ideally, I would love to pull all the Yelp reviews for this establishment.
import time
import random
from bs4 import BeautifulSoup as bs
import urllib.request

html = urllib.request.urlopen('https://www.yelp.com/biz/capri-laguna-laguna-beach').read().decode('utf-8')
soup = bs(html, 'html.parser')
relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')
for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        text = html_class.find('span')
        review = html_class.getText()
print(review)
CodePudding user response:
You are only getting one review printed because you wrote the print statement outside the loop.
relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')
for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        review = html_class.getText()
        print(review)  # inside the inner loop, so it runs once per review
If you implement the code above, you'll get all 10 reviews printed.
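The difference is easy to reproduce without Yelp. Here is a minimal sketch that uses a made-up list in place of the scraped spans. The names `sample_reviews`, `last_only` and `every_item` are illustrative and not part of the original code.

```python
# Stand-in data; the real script gets these strings from BeautifulSoup.
sample_reviews = ["Great stay", "Too noisy", "Lovely view"]

def last_only(items):
    """Mimic the original bug: the output step sits after the loop."""
    seen = []
    review = None
    for item in items:
        review = item          # overwritten on every pass
    seen.append(review)        # runs once, after the loop has finished
    return seen

def every_item(items):
    """Mimic the fix: the output step sits inside the loop body."""
    seen = []
    for item in items:
        review = item
        seen.append(review)    # runs once per item
    return seen

print(last_only(sample_reviews))   # ['Lovely view']
print(every_item(sample_reviews))  # ['Great stay', 'Too noisy', 'Lovely view']
```

With the statement after the loop, only the value left over from the final pass is visible, which is why a single review appeared.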
To store all 10 reviews in a list, do this:
reviews = []
for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        review = html_class.getText()
        reviews.append(review)  # collect each review instead of printing it
CodePudding user response:
The data is generated dynamically from an API call that returns a JSON response to a GET request, so you can extract it using only the requests module. You can follow the requests example below. There are 506 reviews in total at 10 per page, which is 50.6 pages (51 when rounded up); these numbers change slightly as the live data is updated.
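The page count quoted above can be checked with a couple of lines. This is only a sketch of the arithmetic: `page_offsets` is a hypothetical helper, and whether Yelp accepts such offsets as a request parameter is not shown in this thread.

```python
import math

def page_offsets(total_reviews, per_page):
    """Return the starting index of each page of results."""
    return list(range(0, total_reviews, per_page))

total_reviews = 506   # figure quoted in the answer; changes as reviews are added
per_page = 10

pages = math.ceil(total_reviews / per_page)
offsets = page_offsets(total_reviews, per_page)

print(pages)          # 51
print(offsets[:3])    # [0, 10, 20]
print(offsets[-1])    # 500 (the last page holds the remaining 6 reviews)
```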
import requests
import pandas as pd

api_url = "https://www.yelp.com/biz/Yz7qwi0GipbeLBFAjSr_PQ/props"
data = []
jsonData = requests.get(api_url).json()  # single GET request, made before the loop
for page in range(1, 507, 10):
    # Note: this only changes the local dict; it does not send a new request
    jsonData['totalResults'] = page
    for item in jsonData['bizDetailsPageProps']['reviewFeedQueryProps']['reviews']:
        # Strip the HTML line breaks embedded in the review text
        review = item['comment']['text'].replace('<br><br>', '').replace('<br>', '')
        data.append(review)

df = pd.DataFrame(data, columns=['Review'])
print(df)
Output:
Review
0 Wonderful trip to Laguna Beach, so grateful we...
1 We truly loved coming here for a weekend getaw...
2 Amazing quaint stay in the heart of laguna. Al...
3 Let me start with the positive by saying that ...
4 We picked this hotel based on the amazing phot...
.. ...
505 My husband and I absolutely LOVE coming to the...
506 Terrible customer service. I made a call to ge...
507 I don't know where to begin...I knew the l...
508 Absolutely give this hotel 10/10! Everything f...
509 My girlfriend and I just completed our stay he...
[510 rows x 1 columns]
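Note that 510 rows is 51 loop passes times 10 reviews, which is more than the 506 reviews quoted. Since `requests.get` is called once, before the loop, it is worth checking how many of the collected rows are actually distinct. Here is a minimal sketch with stand-in data. `unique_in_order` is a hypothetical helper, and `data` holds placeholder strings that only mimic the shape of the list built above.

```python
def unique_in_order(items):
    """Drop repeated items while keeping first-seen order."""
    return list(dict.fromkeys(items))

# Placeholder strings standing in for one page of 10 reviews.
one_page = [f"review {i}" for i in range(10)]

# Re-reading the same page on 51 passes gives 510 rows ...
data = one_page * 51
print(len(data))                    # 510

# ... but only 10 of them are distinct.
print(len(unique_in_order(data)))   # 10
```

If the distinct count on the real data also comes out at 10, the loop is re-reading the first page, and a separate request per page would be needed to reach the remaining reviews.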