Pulling reviews from yelp - beautifulsoup

Time:04-15

I'm trying to grab all the reviews from Yelp for this hotel: https://www.yelp.com/biz/capri-laguna-laguna-beach

My code is below, but it only pulls one review instead of all of them. Can someone please help?

Ideally I'd like to pull every Yelp review for this establishment.

import time
import random
import urllib.request

from bs4 import BeautifulSoup as bs

# Download the business page and decode it to text
html = urllib.request.urlopen('https://www.yelp.com/biz/capri-laguna-laguna-beach').read().decode('utf-8')

soup = bs(html, 'html.parser')

# Each review body sits in a <p> tag with this class
relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        text = html_class.find('span')
        review = html_class.getText()

print(review)

CodePudding user response:

You are only getting one review printed because you wrote the print statement outside the loop.

relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        text = html_class.find('span')
        review = html_class.getText()
        print(review)

If you implement the code above, you'll get all 10 reviews printed.
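
The difference is easier to see with the scraping stripped away. This is only a sketch: the list below is made-up stand-in data (not real Yelp reviews), and the names fake_reviews, seen_after_loop and seen_inside_loop are just for illustration.

```python
# Stand-in data for illustration only; these are not real reviews.
fake_reviews = ["first review", "second review", "third review"]

# Reading the variable after the loop: it only holds the value
# assigned on the final pass, so earlier values are lost.
for review in fake_reviews:
    pass
seen_after_loop = [review]

# Reading the variable inside the loop: every value is seen once.
seen_inside_loop = []
for review in fake_reviews:
    seen_inside_loop.append(review)

print(seen_after_loop)   # ['third review']
print(seen_inside_loop)  # ['first review', 'second review', 'third review']
```

The same thing happens in the question: review is overwritten on each pass, and the single print at the end shows whatever was assigned last.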


To store all 10 reviews in a list, do this:

reviews = []

for div in relevant:
  for html_class in div.find_all('span',class_="raw__09f24__T4Ezm"):
    text = html_class.find('span')
    review = html_class.getText()
    reviews.append(review)
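
One caveat: class names like comment__09f24__gu0rG look auto-generated, so an exact match may stop returning anything if they change. A possible workaround, sketched below, is to match on the prefix with a regular expression, which find_all accepts for class_. This assumes the comment__ and raw__ prefixes stay stable (that is a guess, not something Yelp documents), and the HTML is a hand-written stand-in rather than a real Yelp page.

```python
import re
from bs4 import BeautifulSoup

# Hand-written sample markup; the second set of class suffixes is invented
# to show that only the prefix matters.
sample_html = """
<p class="comment__09f24__gu0rG css-qgunke"><span class="raw__09f24__T4Ezm">Great stay.</span></p>
<p class="comment__abc12__XyZ css-other"><span class="raw__abc12__QrS">Too noisy.</span></p>
<p class="unrelated">Not a review.</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A compiled regex passed as class_ is tested against each CSS class on the tag.
comment_class = re.compile(r"^comment__")
raw_class = re.compile(r"^raw__")

reviews = [
    span.get_text(strip=True)
    for p in soup.find_all("p", class_=comment_class)
    for span in p.find_all("span", class_=raw_class)
]
print(reviews)  # ['Great stay.', 'Too noisy.']
```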

CodePudding user response:

The reviews are loaded dynamically: the page gets them as JSON from an API call made with GET, so you can extract the data using only the requests module. Here is an example. There are 506 reviews in total at 10 per page, so about 51 pages; the count shifts slightly as the live data updates.

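import requests
import pandas as pd

api_url = "https://www.yelp.com/biz/Yz7qwi0GipbeLBFAjSr_PQ/props"
data = []

# One GET request; the JSON response is parsed into a dict
jsonData = requests.get(api_url).json()

# Step through offsets 1, 11, ..., 501 (51 passes)
for page in range(1, 507, 10):
    # Note: this only changes the local dict. It does not send a new request,
    # so every pass reads the same 'reviews' list fetched above.
    jsonData['totalResults'] = page

    for item in jsonData['bizDetailsPageProps']['reviewFeedQueryProps']['reviews']:
        # Strip the <br> tags embedded in the review text
        review = item['comment']['text'].replace('<br><br>', '').replace('<br>', '')
        data.append(review)

df = pd.DataFrame(data, columns=['Review'])
print(df)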

Output:

                            Review
0    Wonderful trip to Laguna Beach, so grateful we...
1    We truly loved coming here for a weekend getaw...
2    Amazing quaint stay in the heart of laguna. Al...
3    Let me start with the positive by saying that ...
4    We picked this hotel based on the amazing phot...
..                                                 ...
505  My husband and I absolutely LOVE coming to the...
506  Terrible customer service. I made a call to ge...
507  I don&#39;t know where to begin...I knew the l...
508  Absolutely give this hotel 10/10! Everything f...
509  My girlfriend and I just completed our stay he...

[510 rows x 1 columns]
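
Row 507 above still shows &#39; because the comment text comes back HTML-escaped, and the replace calls delete the <br> tags without leaving a space between sentences. A small sketch of one way to tidy that up with the standard library's html.unescape; clean_review is a hypothetical helper name and the sample string is made up for the demo.

```python
import html
import re


def clean_review(raw: str) -> str:
    """Turn <br> tags into spaces, decode HTML entities, collapse whitespace."""
    # Replace <br>, <br/> and <br /> with a space so sentences do not run together.
    text = re.sub(r"<br\s*/?>", " ", raw, flags=re.IGNORECASE)
    # Decode entities such as &#39; and &amp; into plain characters.
    text = html.unescape(text)
    # Collapse runs of whitespace left behind by consecutive <br> tags.
    return " ".join(text.split())


# Made-up sample text in the same shape as the escaped output above.
sample = "I don&#39;t know where to begin...<br><br>The room was fine."
print(clean_review(sample))  # I don't know where to begin... The room was fine.
```

In the answer's loop this would go where the two replace calls are, e.g. data.append(clean_review(item['comment']['text'])).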