I am trying to extract the number of players data from this site - https://boardgamegeek.com/boardgame/174430/gloomhaven/stats.
from bs4 import BeautifulSoup as bs
import requests
url2 = "https://boardgamegeek.com/boardgame/174430/gloomhaven"
page3 = requests.get(url2)
s2 = bs(page3.content,"html.parser")
var2 = s2.find_all('span',{'class':'ng-scope ng-isolate-scope'})
When I try to use this code, it always returns an empty list at var2. I even tried to access the 'div' class that the 'span' is a part of, but I still get an empty list. Why is this?
Thanks in advance.
CodePudding user response:
The page content is loaded dynamically by JavaScript. If you disable JavaScript in your browser, you will see that the content disappears. That is why you get an empty list in var2: requests only downloads the initial HTML, and BeautifulSoup cannot see data that JavaScript adds afterwards. You need a browser automation tool such as Selenium. Here I use Selenium together with BeautifulSoup.
You only need the first element matching 'class':'ng-scope ng-isolate-scope', so call the find method instead of find_all.
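To illustrate the difference between the downloaded HTML and the rendered page, here is a minimal sketch that needs no network access. Both HTML strings are made-up stand-ins (assumptions, not the real BoardGameGeek markup): one for what the server might send before JavaScript runs, one for what the DOM might look like afterwards.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: roughly what requests might receive before JavaScript runs
STATIC_HTML = "<html><body><div ng-view></div></body></html>"

# Hypothetical markup: roughly what the DOM might contain after rendering
RENDERED_HTML = (
    "<html><body><div ng-view>"
    '<span class="ng-scope ng-isolate-scope">1-4</span>'
    "</div></body></html>"
)

CLASSES = {"class": "ng-scope ng-isolate-scope"}

static_soup = BeautifulSoup(STATIC_HTML, "html.parser")
rendered_soup = BeautifulSoup(RENDERED_HTML, "html.parser")

# The span does not exist in the static HTML, so nothing matches
static_matches = static_soup.find_all("span", CLASSES)  # empty list
static_first = static_soup.find("span", CLASSES)        # None

# Once the span is in the markup, find returns the first matching Tag
rendered_first = rendered_soup.find("span", CLASSES)

print(static_matches)
print(static_first)
print(rendered_first.text)
```

Note that find returns None when nothing matches, so calling .text on the result directly raises an AttributeError if the page has not rendered yet.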
Script
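from bs4 import BeautifulSoup
import time
from selenium import webdriver
# Selenium 4.6+ can locate a matching chromedriver by itself;
# on older versions chromedriver must be on your PATH.
driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://boardgamegeek.com/boardgame/174430/gloomhaven/stats'
driver.get(url)
# Give the JavaScript a few seconds to render the page
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# find returns the first matching element
var2 = soup.find('span',{'class':'ng-scope ng-isolate-scope'}).text
print(var2)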
Output
1–4