None with Beautiful Soup

Time:09-22

As NFTs become more popular, tracking their rarity is becoming part of the picture. I am trying to retrieve the trait count from this web page. When I inspect the page, the "Trait Count" shows 5 and I see the following element:

<div class="flex-grow overflow-hidden">5</div>

So I use the following code:

import requests #fetches html page content
from bs4 import BeautifulSoup #parses html page content

#Ask the server for English (US) content
headers = {"Accept-Language": "en-US, en;q=0.5"}

#Get the contents of the page we're looking at by requesting the URL, timeout set to 720 seconds (12 mins)
results = requests.get('https://rarity.tools/boredapeyachtclub/view/1', headers=headers, timeout=720)

#parse html content
soup = BeautifulSoup(results.text, "html.parser")


#Grab the container that holds the trait count
apes_div = soup.find('div', class_='flex-grow overflow-hidden')
print(apes_div)

And I get None instead of 5...

Edit

I have tried selenium with the following code and seem to get the same result:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox(executable_path="/usr/local/bin/geckodriver")

driver.get('https://rarity.tools/boredapeyachtclub/view/1')

trait = driver.find_elements_by_xpath('//td[@class="flex-grow overflow-hidden"]')

print(trait)

CodePudding user response:

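If you put a breakpoint on the driver.find_elements_by_xpath line, you'll notice that it executes well before the page has finished loading. The content appears to be filled in by JavaScript after the initial response, which would also explain why requests with Beautiful Soup returns None: the div is not in the HTML that requests receives.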

This does what you were trying to do:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path="bin/geckodriver.exe")

try:
    driver.get('https://rarity.tools/boredapeyachtclub/view/1')
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "p-0"))
    )

    # find_elements(By.XPATH, ...) also works on Selenium 4, where find_elements_by_xpath was removed
    traits = driver.find_elements(By.XPATH, '//div[@class="flex-grow overflow-hidden"]')

    print(traits[2].text)
finally:
    driver.quit()

I picked the class p-0 because I noticed an element with that class being loaded as part of the needed payload; you may be able to find a cleaner marker. WebDriverWait makes the script wait until a condition is met (here, an element with the given class being present), for up to 30 seconds.
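To make the waiting idea concrete, here is a rough, Selenium-free sketch of what a polling wait does: call a condition repeatedly until it returns something truthy or a timeout expires. The names wait_until and element_has_loaded are made up for illustration, and this is not Selenium's actual implementation (WebDriverWait raises its own TimeoutException and can ignore selected exceptions while polling).

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose element only "appears" on the third check.
attempts = {"count": 0}


def element_has_loaded():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


found = wait_until(element_has_loaded, timeout=5.0, poll_interval=0.01)
print(found, attempts["count"])
```

The point is that the lookup only goes ahead once the condition holds, instead of running as soon as driver.get returns.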

Also, your XPath looks for a td, but the data you expect is actually in a div. If you search for a div with that class instead, you find 8 of them, and the third one (index 2) happens to be the one you're after.
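The td versus div difference can be reproduced offline. This sketch uses the standard library's xml.etree.ElementTree, which supports a small XPath subset, rather than Selenium. The markup is invented for illustration, with only three matching divs instead of the 8 on the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup for illustration; the real page is far larger.
SAMPLE = """
<table>
  <tr>
    <td>
      <div class="flex-grow overflow-hidden">A</div>
      <div class="flex-grow overflow-hidden">B</div>
      <div class="flex-grow overflow-hidden">5</div>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)
CLASS_TEST = "[@class='flex-grow overflow-hidden']"

# Same class test, different tag name.
td_matches = root.findall(".//td" + CLASS_TEST)
div_matches = root.findall(".//div" + CLASS_TEST)

print(len(td_matches))      # the td has no class attribute, so nothing matches
print(len(div_matches))     # every div with exactly that class string
print(div_matches[2].text)  # index 2 is the third match
```

An empty list from a find_elements call is therefore not necessarily a timing problem; the tag name in the XPath has to match as well.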

The try .. finally is just there to clean up the browser once you're done.

(Note: I tested this on Windows, so be sure to change executable_path back to point at your own geckodriver.)
