Home > Enterprise >  Web Scraping Table from 'Dune.com' with Python3 and bs4
Web Scraping Table from 'Dune.com' with Python3 and bs4

Time:12-27

I am trying to web scrape table data from Dune.com (https://dune.com/queries/1144723). When I 'inspect' the web page, I am able to clearly see the <table></table> element, but when I run the following code I am returned None results.

import bs4
import requests

data = []

r=requests.get('https://dune.com/queries/1144723/1954237')
soup=bs4.BeautifulSoup(r.text, "html5lib")

table = soup.find('table')

How can I successfully find this table data?

CodePudding user response:

The page uses Javascript to load the data. This example will use their API endpoint to load the data to a dataframe:

import requests
import pandas as pd
from bs4 import BeautifulSoup


api_url = "https://app-api.dune.com/v1/graphql"

payload = {
    "operationName": "GetExecution",
    "query": "query GetExecution($execution_id: String!, $query_id: Int!, $parameters: [Parameter!]!) {\n  get_execution(\n    execution_id: $execution_id\n    query_id: $query_id\n    parameters: $parameters\n  ) {\n    execution_queued {\n      execution_id\n      execution_user_id\n      position\n      execution_type\n      created_at\n      __typename\n    }\n    execution_running {\n      execution_id\n      execution_user_id\n      execution_type\n      started_at\n      created_at\n      __typename\n    }\n    execution_succeeded {\n      execution_id\n      runtime_seconds\n      generated_at\n      columns\n      data\n      __typename\n    }\n    execution_failed {\n      execution_id\n      type\n      message\n      metadata {\n        line\n        column\n        hint\n        __typename\n      }\n      runtime_seconds\n      generated_at\n      __typename\n    }\n    __typename\n  }\n}\n",
    "variables": {
        "execution_id": "01GN7GTHF62FY5DYYSQ5MSEG2H",
        "parameters": [],
        "query_id": 1144723,
    },
}


data = requests.post(api_url, json=payload).json()

df = pd.DataFrame(data["data"]["get_execution"]["execution_succeeded"]["data"])
df["total_pnl"] = df["total_pnl"].astype(str)
df[["account", "link"]] = df.apply(
    func=lambda x: (
        (s := BeautifulSoup(x["account"], "html.parser")).text,
        s.a["href"],
    ),
    result_type="expand",
    axis=1,
)
print(df.head(10))  # <-- print sample data

Prints:

                                      account           last_traded rankings           total_pnl          traded_since                                                                               link
0  0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46  2022-02-01T13:57:01Z                
  • Related