How to parse data from a specific website

Time:12-31

I'm trying to parse data from this page: https://rules.art/card/jmks-season-1-common

Below is the basic code I'm using to try to retrieve some data (e.g. the card's name, "JMK$"):

import requests
from bs4 import BeautifulSoup

url = "https://rules.art/card/jmks-season-1-common"
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
b = soup.body
c = b.div.findChildren(recursive=True)
print(c)

When I run the code above, I get an empty list: [].
It seems I cannot walk down the nested div tree. Why?

I tried several other approaches with BeautifulSoup but couldn't get anything better.

CodePudding user response:

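The content of this page is generated by JavaScript in the browser, so the HTML that requests downloads does not contain the rendered card data, and bs4 has nothing useful to parse.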
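One way to confirm this is to check whether the text you are looking for appears in the visible part of the downloaded HTML at all. The sketch below uses only the standard library; the `static_html` string and its `window.__DATA__` variable are hypothetical stand-ins for what a JavaScript-rendered page may return, not the real response from rules.art.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside <script> or <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would see without running any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical example of a page whose content is filled in by JavaScript
static_html = """
<html><body>
  <div id="__next"></div>
  <script>window.__DATA__ = {"name": "JMK$"};</script>
</body></html>
"""

print("JMK$" in visible_text(static_html))
```

With the real page, you would pass `response.text` from the question's code to `visible_text`. If the card name is missing there, no HTML parser will find it.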

I recommend using Selenium for this. Below is an example of how to get the card name:

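from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

try:
    driver.get("https://rules.art/card/jmks-season-1-common")

    # Wait up to 10 seconds for the JavaScript-rendered element to become
    # visible, instead of sleeping for a fixed time.
    card_name = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located(
            (By.XPATH, "//*[@id='__next']/main/div[2]/div[2]/div[1]/div[1]")
        )
    ).text

    print(card_name)
finally:
    # Always close the browser, even if the element is never found
    driver.quit()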

CodePudding user response:

You can use their GraphQL API to load the data directly, without rendering the page:

import requests

api_url = "https://api.rules.art/graphql"

payload = {
    "extensions": {
        "persistedQuery": {
            "sha256Hash": "25a67acdd1bc76aa6d497a8d08579e7b88b1f3aac3479d1e1622437f5510315b",
            "version": 1,
        }
    },
    "variables": {"slug": "jmks-season-1-common"},
}

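# Retry until the response contains a "data" key, but give up after a
# bounded number of attempts instead of looping forever.
for attempt in range(10):
    data = requests.post(api_url, json=payload, timeout=10).json()
    if "data" in data:
        break
else:
    raise RuntimeError(f"No 'data' key after 10 attempts: {data}")

print(data)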

Prints:

{
    "data": {
        "cardModel": {
            "id": "62bd73ca2ecd6ab6cf1e655c",
            "pictureUrl": "https://assets.rules.art/eyJidWNrZXQiOiJydWxlc2xhYnMtaW1hZ2VzIiwia2V5IjoiY2FyZC1tb2RlbHMvam1rcy1zZWFzb24tMS1jb21tb24uanBnIiwiZWRpdHMiOnsicmVzaXplIjp7IndpZHRoIjoxMDI0LCJmaXQiOiJjb250YWluIn19fQ==",
            "videoUrl": "https://videos.rules.art/mp4/jmks-season-1-common.mp4",
            "lowestAsk": "0x0000000003f18a03b36000",
            "averageSale": "4169443589928020",
            "youtubePreviewId": "9vx3Fj0Sqms",
            "season": 1,
            "scarcity": {"name": "Common", "maxSupply": 3490, "__typename": "Scarcity"},
            "cardsOnSaleCount": 54,
            "artist": {"displayName": "JMK$", "user": None, "__typename": "Artist"},
            "__typename": "CardModel",
        }
    }
}
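Note that the numeric fields come back as strings in two different notations: "lowestAsk" is hexadecimal with a 0x prefix, while "averageSale" is decimal. A minimal sketch for turning either into an integer follows; `to_int` is a hypothetical helper name, and the unit of these amounts is not stated in the response, so no currency conversion is attempted.

```python
def to_int(value):
    """Parse a decimal or 0x-prefixed hexadecimal string into an int."""
    text = value.strip().lower()
    if text.startswith("0x"):
        return int(text, 16)
    return int(text, 10)


# Values copied from the response above
lowest_ask = to_int("0x0000000003f18a03b36000")
average_sale = to_int("4169443589928020")

print(lowest_ask)
print(average_sale)
```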

To print the card name:

print(data["data"]["cardModel"]["artist"]["displayName"])

Prints:

JMK$
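
Chained indexing like this raises an error if a key is missing or an intermediate value is None (as "user" is in the response above). A small sketch of a more defensive lookup is shown below; `get_path` is a hypothetical helper, not part of requests or of the site's API, and the `data` dict is a trimmed copy of the response shown earlier.

```python
def get_path(obj, *keys, default=None):
    """Follow a sequence of dict keys, returning `default` if any is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the response shown earlier
data = {
    "data": {
        "cardModel": {
            "season": 1,
            "scarcity": {"name": "Common", "maxSupply": 3490},
            "artist": {"displayName": "JMK$", "user": None},
        }
    }
}

print(get_path(data, "data", "cardModel", "artist", "displayName"))  # JMK$
print(get_path(data, "data", "cardModel", "artist", "user", "slug"))  # None
```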