I'm trying to parse data from this page: https://rules.art/card/jmks-season-1-common
Below is the basic code I'm using to try to retrieve some data (e.g. the card's name, "JMK$"):
import requests
from bs4 import BeautifulSoup
url = "https://rules.art/card/jmks-season-1-common"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
b = soup.body
c = b.div.findChildren(recursive=True)
print(c)
When executing the above code, I get an empty list: []. It seems I cannot go down the nested div tree. Why?
I tried a bunch of things with BeautifulSoup but couldn't get anything better.
CodePudding user response:
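The HTML of this site is generated in the browser by JavaScript, so the response that requests downloads contains only an empty page skeleton. bs4 parses that skeleton correctly; there is simply nothing inside it yet, so it can't really help you here.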
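To see why the original snippet printed an empty list, it may help to parse the kind of HTML shell a client-rendered page returns before its JavaScript has run. The sketch below uses only the standard library. Both HTML strings are made-up stand-ins, not the actual response from rules.art, and `FirstDivChildCounter` and `count_descendants` are hypothetical helpers that roughly mimic `soup.body.div.findChildren(recursive=True)`.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what requests.get() might return for a
# client-rendered page: a root <div> that JavaScript fills in later.
SHELL_HTML = """
<html>
  <body>
    <div id="__next"></div>
    <script src="/app.js"></script>
  </body>
</html>
"""

# Hypothetical stand-in for the same page after the browser has run the JavaScript.
RENDERED_HTML = '<body><div id="__next"><main><div>JMK$</div></main></div></body>'


class FirstDivChildCounter(HTMLParser):
    """Count the tags nested inside the first <div> of a document."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # how many <div> tags are currently open
        self.done = False    # True once the first <div> has been closed
        self.children = 0    # tags seen inside the first <div>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.div_depth > 0:
            # Any tag opened while the first <div> is open is a descendant.
            self.children += 1
        if tag == "div":
            self.div_depth += 1

    def handle_endtag(self, tag):
        if self.done or tag != "div" or self.div_depth == 0:
            return
        self.div_depth -= 1
        if self.div_depth == 0:
            self.done = True


def count_descendants(html):
    """Return the number of tags inside the first <div> of html."""
    parser = FirstDivChildCounter()
    parser.feed(html)
    return parser.children


print(count_descendants(SHELL_HTML))     # 0
print(count_descendants(RENDERED_HTML))  # 2
```

With the shell, the first div has no descendants, which matches the empty list in the question. The content only appears once something executes the page's JavaScript.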
I recommend using Selenium for this, since it drives a real browser that runs the page's JavaScript. Below is an example of how to get the card name:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from time import sleep
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://rules.art/card/jmks-season-1-common")
sleep(3)
card_name = driver.find_element(By.XPATH, "//*[@id='__next']/main/div[2]/div[2]/div[1]/div[1]").text
print(card_name)
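A fixed `sleep(3)` can be too short on a slow connection and wasteful on a fast one. As a possible alternative, the sketch below waits explicitly until the element is visible, using Selenium's `WebDriverWait` and `expected_conditions`. The function name `fetch_card_name` and the 10-second default timeout are assumptions made for illustration. The XPath is the one from the answer above and may stop matching if the site's layout changes.

```python
# XPath taken from the answer above; it depends on the page's current layout.
CARD_NAME_XPATH = "//*[@id='__next']/main/div[2]/div[2]/div[1]/div[1]"


def fetch_card_name(url, timeout=10):
    """Open url in Chrome and return the card name once it is visible.

    Raises selenium.common.exceptions.TimeoutException if the element
    does not become visible within `timeout` seconds.
    """
    # Imported inside the function so that merely defining it does not
    # require Selenium or webdriver_manager to be installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        # Poll the page until the element is present and visible.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, CARD_NAME_XPATH))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


if __name__ == "__main__":
    print(fetch_card_name("https://rules.art/card/jmks-season-1-common"))
```

The `try`/`finally` also makes sure the browser is closed when the lookup fails, which the shorter version does not do.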
CodePudding user response:
You can use their GraphQL API to load the data directly:
import requests
api_url = "https://api.rules.art/graphql"
payload = {
    "extensions": {
        "persistedQuery": {
            "sha256Hash": "25a67acdd1bc76aa6d497a8d08579e7b88b1f3aac3479d1e1622437f5510315b",
            "version": 1,
        }
    },
    "variables": {"slug": "jmks-season-1-common"},
}
# Repeat the request until the JSON response contains a "data" key.
while True:
    data = requests.post(api_url, json=payload).json()
    if "data" in data:
        break
print(data)
Prints:
{
    "data": {
        "cardModel": {
            "id": "62bd73ca2ecd6ab6cf1e655c",
            "pictureUrl": "https://assets.rules.art/eyJidWNrZXQiOiJydWxlc2xhYnMtaW1hZ2VzIiwia2V5IjoiY2FyZC1tb2RlbHMvam1rcy1zZWFzb24tMS1jb21tb24uanBnIiwiZWRpdHMiOnsicmVzaXplIjp7IndpZHRoIjoxMDI0LCJmaXQiOiJjb250YWluIn19fQ==",
            "videoUrl": "https://videos.rules.art/mp4/jmks-season-1-common.mp4",
            "lowestAsk": "0x0000000003f18a03b36000",
            "averageSale": "4169443589928020",
            "youtubePreviewId": "9vx3Fj0Sqms",
            "season": 1,
            "scarcity": {"name": "Common", "maxSupply": 3490, "__typename": "Scarcity"},
            "cardsOnSaleCount": 54,
            "artist": {"displayName": "JMK$", "user": None, "__typename": "Artist"},
            "__typename": "CardModel",
        }
    }
}
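The `while True` loop in the request code keeps posting until a response contains a "data" key, so it never stops if the API keeps answering without one. A bounded variant might look like the sketch below. The helper name `post_until_data`, the limit of 5 attempts, and the 1-second pause are all assumptions made for illustration. The request itself is passed in as a callable, so the retry logic can be tried out without network access.

```python
import time


def post_until_data(post, max_attempts=5, delay=1.0):
    """Call post() until its result contains a "data" key.

    post is any zero-argument callable that returns the decoded JSON
    response as a dict. Raises RuntimeError after max_attempts failures.
    """
    last = None
    for attempt in range(1, max_attempts + 1):
        last = post()
        if "data" in last:
            return last
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts:
            time.sleep(delay)
    raise RuntimeError(f"no 'data' key after {max_attempts} attempts: {last!r}")
```

With the variables from the answer, it could be called as `data = post_until_data(lambda: requests.post(api_url, json=payload).json())`.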
To print the card name:
print(data["data"]["cardModel"]["artist"]["displayName"])
Prints:
JMK$
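Chained indexing such as `data["data"]["cardModel"]["artist"]["displayName"]` raises a `KeyError` or `TypeError` as soon as one level is missing or is `None` (as "user" is in the output above). If that matters, a small lookup helper can return a default instead. The helper `dig` below is a hypothetical convenience, not part of requests or of the API, and `sample` is a trimmed copy of the response shown above.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts; return default if any step fails."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the response printed above.
sample = {
    "data": {
        "cardModel": {
            "season": 1,
            "artist": {"displayName": "JMK$", "user": None},
        }
    }
}

print(dig(sample, "data", "cardModel", "artist", "displayName"))  # JMK$
# "user" is None, so looking inside it falls back to the default.
print(dig(sample, "data", "cardModel", "artist", "user", "slug", default="n/a"))  # n/a
```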