Extracting content from webpage using BeautifulSoup

I am working on scraping numbers from the Powerball website with the code below. However, numbers keeps coming back as an empty list. Why is this?

import requests
from bs4 import BeautifulSoup

url = 'https://www.powerball.com/games/home'
page = requests.get(url).text

# Name the parser explicitly so bs4 does not warn and guess one
bsPage = BeautifulSoup(page, 'html.parser')
numbers = bsPage.find_all("div", class_="field_numbers")
numbers

CodePudding user response：

Can confirm @Teprr is absolutely correct. For this to work you'll need to install Chrome and add chromedriver.exe to your system path, but the following code gets what you are looking for. You can use other browsers too; you just need their respective drivers.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'https://www.powerball.com/games/home'

# Run Chrome without opening a visible window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
browser = webdriver.Chrome(options=options)
try:
    browser.get(url)
    time.sleep(3)  # crude wait: give the page's JavaScript three seconds to run
    html = browser.page_source
finally:
    browser.quit()  # always close the browser, even if loading fails

soup = BeautifulSoup(html, 'html.parser')
draws = soup.find_all("div", class_="number-card")
print(draws)  # raw markup of each draw, useful for checking the selectors
for d in draws:
    info = d.find("div", class_="field_draw_date").get_text()
    balls = d.find("div", class_="field_numbers").find_all("div", class_="numbers-ball")
    numbers = [ball.get_text() for ball in balls]
    print(info)
    print(numbers)

CodePudding user response：

If you download that page and inspect it locally, you can see that it contains no <div> with that class. The element is therefore most likely generated dynamically by JavaScript in your browser, so you would need something like Selenium to get the full, generated HTML content.

Anyway, in this specific case, this piece of HTML seems to be the container for the data you are looking for:

<div data-url="/api/v1/numbers/powerball/recent?_format=json" 
           data-numbers-powerball="Power Play" data-numbers="All Star Bonus">

Now, if you check that custom data-url, you can find the information you want in JSON format.
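
A rough sketch of that approach, in case it helps. It collects every data-url attribute from the static HTML, resolves it against the page URL, and downloads the result as JSON. The names DataUrlCollector, find_data_urls and fetch_json are made up for this example, the User-Agent header is only a precaution I am assuming may be needed, and I have not assumed anything about the layout of the returned JSON. It uses only the standard library; requests and BeautifulSoup would work just as well.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

PAGE_URL = "https://www.powerball.com/games/home"


class DataUrlCollector(HTMLParser):
    """Collect the data-url attribute of every tag that has one (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.data_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        for name, value in attrs:
            if name == "data-url" and value:
                self.data_urls.append(value)


def find_data_urls(html, base_url=PAGE_URL):
    """Return absolute URLs for all data-url attributes found in the HTML."""
    collector = DataUrlCollector()
    collector.feed(html)
    # data-url holds a site-relative path, so resolve it against the page URL
    return [urljoin(base_url, path) for path in collector.data_urls]


def fetch_json(url):
    """Download url and decode the body as JSON; the structure is not assumed."""
    # Assumption: a browser-like User-Agent avoids being rejected as a bot
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With page = requests.get(url).text from the question, find_data_urls(page) should list the API address, and fetch_json on that address returns the parsed data. Print it once to see which keys hold the draw date and the numbers before writing any further parsing.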