How to submit a query to extract a table from a .aspx page with Python (2022)

Time: 07-07

I want to scrape data from https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts. I tried different approaches, like this, this, and this.

I can scrape static pages, but I still don't understand the .aspx format very well. Here is what I copied from the first reference link:

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.indiapost.gov.in',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.nitt.edu/prm/nitreg/ShowRes.aspx',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}

class MyOpener(urllib.request.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'https://www.nasdaqtrader.com/Trader.aspx?id=TradeHalts'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f, "html.parser")
# parse and retrieve two vital hidden form values
viewstate = soup.find("input", {"type": "hidden", "name": "__VIEWSTATE"})["value"]
eventvalidation = soup.find("input", {"type": "hidden", "name": "__EVENTVALIDATION"})["value"]

formData = (
     ('__EVENTVALIDATION', eventvalidation),
     ('__VIEWSTATE', viewstate),
     ('__VIEWSTATEENCRYPTED', ''),
)

encodedFields = urllib.parse.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)

# parse the response to the POST request with BeautifulSoup
soup = BeautifulSoup(f, "html.parser")

print(soup.prettify())

I cannot find the table information in the content. What am I missing?

Thanks for your help

CodePudding user response:

The table is not in the HTML of the .aspx page itself; it is loaded by JavaScript from the RPC endpoint https://www.nasdaqtrader.com/RPCHandler.axd, which returns the table as an HTML fragment inside a JSON response. That is why posting the __VIEWSTATE/__EVENTVALIDATION form fields back to the page does not return it. To get the data as a pandas DataFrame, you can call that endpoint directly:

import requests
import pandas as pd
from io import StringIO


url = "https://www.nasdaqtrader.com/RPCHandler.axd"

headers = {
    "Referer": "https://www.nasdaqtrader.com/trader.aspx?id=TradeHalts",
}

payload = {
    "id": 2,
    "method": "BL_TradeHalt.GetTradeHalts",
    "params": "[]",
    "version": "1.1",
}

# the endpoint answers with JSON; the "result" field holds the table as an HTML fragment
data = requests.post(url, json=payload, headers=headers).json()
data = StringIO(data["result"])

# parse the HTML fragment into a DataFrame
df = pd.read_html(data)[0]
print(df.head(10).to_markdown(index=False))

Prints:

| Halt Date  | Halt Time | Issue Symbol | Issue Name                                                 | Market    | Reason Codes | Pause Threshold Price | Resumption Date | Resumption Quote Time | Resumption Trade Time |
|:-----------|:----------|:-------------|:-----------------------------------------------------------|:----------|:-------------|:----------------------|:----------------|:----------------------|:----------------------|
| 07/06/2022 | 15:57:38  | COMSP        | 9.25% Srs A Cmltv Redm Prf Stk                             | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 15:57:38              | nan                   |
| 07/06/2022 | 12:51:35  | BRPMU        | B. Riley Principal 150 Merg Ut                             | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 12:51:35              | 12:56:35              |
| 07/06/2022 | 12:06:06  | VACC         | Vaccitech plc ADS                                          | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 12:06:06              | 12:16:06              |
| 07/06/2022 | 11:15:10  | USEA         | United Maritime Corp Cm St                                 | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 11:15:10              | 11:29:25              |
| 07/06/2022 | 10:28:53  | USEA         | United Maritime Corp Cm St                                 | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 10:28:53              | 10:43:30              |
| 07/06/2022 | 10:18:19  | USEA         | United Maritime Corp Cm St                                 | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 10:18:19              | 10:28:19              |
| 07/06/2022 | 09:41:43  | GAMB         | Gambling.com Group Os                                      | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:41:43              | 09:46:43              |
| 07/06/2022 | 09:37:16  | USEA         | United Maritime Corp Cm St                                 | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:37:16              | 10:17:41              |
| 07/06/2022 | 09:31:15  | JJN          | iPathA Series B Bloomberg Nickel Subindex Total Return ETN | NYSE Arca | M            | nan                   | 07/06/2022      | 09:36:15              | 09:36:15              |
| 07/06/2022 | 09:31:17  | AMTI         | Applied Molecular Transport Cm                             | NASDAQ    | LUDP         | nan                   | 07/06/2022      | 09:31:17              | 09:36:17              |
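
If you want to filter or sort the halts by time, you can combine the date and time columns into real timestamps after loading. A minimal sketch, assuming df is the DataFrame from the example above, that pd.read_html left the date/time columns as plain strings, and using "Halt Datetime" as an illustrative column name:

import pandas as pd

# combine "Halt Date" and "Halt Time" into a single datetime column
df["Halt Datetime"] = pd.to_datetime(
    df["Halt Date"] + " " + df["Halt Time"], format="%m/%d/%Y %H:%M:%S"
)

# example: keep only NASDAQ halts, newest first
nasdaq = df[df["Market"] == "NASDAQ"].sort_values("Halt Datetime", ascending=False)
print(nasdaq[["Halt Datetime", "Issue Symbol", "Reason Codes"]].head())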