I've tried so many different answers but nothing is working.
I am trying to scrape all reviews from the Play Store website and found that `class_ = "d15Mdf bAhLNe"` is the container I want, but I get an empty list.
The same happens when I try the `soup.find_all({"class": "d15Mdf bAhLNe"})` combination.
The thing is that when I print `soup` I do get the HTML. What am I missing?
```python
from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://play.google.com/store/apps/details?id=com.google.android.googlequicksearchbox&hl=en').text
soup = BeautifulSoup(html_text, 'lxml')
reviews = soup.find_all('div', class_="d15Mdf bAhLNe")
print(reviews)
```
CodePudding user response:
If you print out `soup` instead of `reviews`, you will see that the HTML you received is different from the HTML on the live website. Because your script is not a browser, the JavaScript that creates the content dynamically never runs, so the review containers are never added to the page you downloaded. See a more detailed answer here:
I suggest you take a look at this answer here.
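To confirm that this is what is happening, you can count how many matching containers exist in the HTML you actually downloaded. The following is a minimal sketch using only the standard library parser. `ClassCounter`, `count_containers`, and the two sample strings are hypothetical names and stand-ins made up for illustration, not real Play Store markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count <div> tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute is a space-separated list; order does not matter here.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_containers(html, wanted=("d15Mdf", "bAhLNe")):
    """Return how many <div> elements in html carry every class in wanted."""
    parser = ClassCounter(wanted)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins: what a server might send vs. what a browser
# shows after its scripts have run.
served_html = '<html><body><div id="app"></div><script>/* builds reviews */</script></body></html>'
rendered_html = '<html><body><div id="app"><div class="d15Mdf bAhLNe">Great app</div></div></body></html>'

print(count_containers(served_html))    # 0
print(count_containers(rendered_html))  # 1
```

Calling `count_containers(html_text)` on the text returned by `requests.get(...)` and getting 0 would suggest that the containers are injected by scripts and that a real browser is needed.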
Quick example using Selenium
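```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

# Config: change these placeholder paths to match your setup
options = Options()
options.binary_location = r"binary_path"
# Selenium 4 takes the driver path through a Service object
service = Service(executable_path="driver_path")
browser = webdriver.Firefox(options=options, service=service)

# Get the data
url = 'https://play.google.com/store/apps/details?id=com.google.android.googlequicksearchbox&hl=en'
browser.get(url)
# Class names taken from the question; they may differ on the live page
res = browser.find_elements(By.XPATH, '//div[@class="d15Mdf bAhLNe"]')
print([element.text for element in res])
browser.quit()
```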
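Since the reviews are added by scripts, they may not be present yet at the moment `browser.get(url)` returns. One possible refinement is an explicit wait. This is only a sketch: `xpath_for_classes` and `wait_for_reviews` are hypothetical helper names, the 10-second timeout is an arbitrary assumption, and the class names are the ones from the question and may have changed on the live site.

```python
def xpath_for_classes(tag, class_names):
    """Build an XPath matching <tag> elements that carry every given class,
    regardless of the order of the classes in the attribute."""
    conditions = [
        "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')"
        for name in class_names
    ]
    predicate = " and ".join(conditions)
    return "//" + tag + "[" + predicate + "]"


def wait_for_reviews(browser, timeout=10):
    """Wait until at least one review container exists, then return the texts.

    Raises selenium's TimeoutException if nothing appears within `timeout` seconds.
    """
    # Imported lazily so xpath_for_classes can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    xpath = xpath_for_classes("div", ["d15Mdf", "bAhLNe"])
    elements = WebDriverWait(browser, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, xpath))
    )
    return [element.text for element in elements]
```

With a `browser` created as in the example above, `wait_for_reviews(browser)` would be called right after `browser.get(url)`.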