I am trying to web scrape table data from Dune.com (https://dune.com/queries/1144723). When I 'inspect' the web page, I am able to clearly see the <table></table>
element, but when I run the following code I am returned None results.
import bs4
import requests
data = []
r=requests.get('https://dune.com/queries/1144723/1954237')
soup=bs4.BeautifulSoup(r.text, "html5lib")
table = soup.find('table')
How can I successfully find this table data?
CodePudding user response:
The page uses Javascript to load the data. This example will use their API endpoint to load the data to a dataframe:
import requests
import pandas as pd
from bs4 import BeautifulSoup
api_url = "https://app-api.dune.com/v1/graphql"
payload = {
"operationName": "GetExecution",
"query": "query GetExecution($execution_id: String!, $query_id: Int!, $parameters: [Parameter!]!) {\n get_execution(\n execution_id: $execution_id\n query_id: $query_id\n parameters: $parameters\n ) {\n execution_queued {\n execution_id\n execution_user_id\n position\n execution_type\n created_at\n __typename\n }\n execution_running {\n execution_id\n execution_user_id\n execution_type\n started_at\n created_at\n __typename\n }\n execution_succeeded {\n execution_id\n runtime_seconds\n generated_at\n columns\n data\n __typename\n }\n execution_failed {\n execution_id\n type\n message\n metadata {\n line\n column\n hint\n __typename\n }\n runtime_seconds\n generated_at\n __typename\n }\n __typename\n }\n}\n",
"variables": {
"execution_id": "01GN7GTHF62FY5DYYSQ5MSEG2H",
"parameters": [],
"query_id": 1144723,
},
}
data = requests.post(api_url, json=payload).json()
df = pd.DataFrame(data["data"]["get_execution"]["execution_succeeded"]["data"])
df["total_pnl"] = df["total_pnl"].astype(str)
df[["account", "link"]] = df.apply(
func=lambda x: (
(s := BeautifulSoup(x["account"], "html.parser")).text,
s.a["href"],
),
result_type="expand",
axis=1,
)
print(df.head(10)) # <-- print sample data
Prints:
account last_traded rankings total_pnl traded_since link
0 0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46 2022-02-01T13:57:01Z