Python requests get stuck when trying to get web content

Time:08-20

I want to get the prices from this instrument on this webpage: http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500

Normally requests.get does the trick, but for this page the script hangs. I've tried setting a User-Agent as suggested in the answer to "How to use Python requests to fake a browser visit a.k.a and generate User Agent?", but with no luck.

My code:

import requests

url = "http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

response = requests.get(url, headers=headers)

Answer:

It looks like the data on that site (its charts) is loaded dynamically with JavaScript, so requests alone won't return a usable result. You can use Selenium to drive an actual browser instance, which runs the JavaScript the page needs before you grab data from it.

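You'll need the selenium package (pip install selenium) and Firefox. Recent Selenium releases download the matching geckodriver automatically; with older releases, install geckodriver yourself.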

Usage example:

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.FirefoxOptions()
# options.add_argument("-headless")  # Uncomment to run Firefox without a visible window
driver = webdriver.Firefox(options=options)

# Grabbing a URL using the browser instance.
driver.get("URL")

# Finding an element by ID
example_element = driver.find_element(By.ID, "Element ID")
print(example_element.text)

# Closing the browser instance
driver.quit()

It takes some experimenting to learn how to use Selenium's capabilities in your code, but there is plenty of documentation (https://selenium-python.readthedocs.io) to help you figure it out.
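Before switching to Selenium, you can confirm the diagnosis by checking whether the element you want appears in the raw HTML that requests receives. The sketch below uses only the standard-library html.parser; has_element_id and IdFinder are hypothetical helper names, and the id avistaTable is borrowed from another answer in this thread, so substitute whatever id your browser's inspector shows. If it reports False for markup you can see in the browser, that content is most likely inserted by JavaScript.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries id="<element_id>"."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if dict(attrs).get("id") == self.element_id:
            self.found = True


def has_element_id(html, element_id):
    """Return True if the static markup contains an element with this id."""
    finder = IdFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found
```

For example, has_element_id(requests.get(url, headers=headers, timeout=15).text, "avistaTable") tells you whether the table is present before any JavaScript runs.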

Answer:

The User-Agent you're using is very old (Chrome 39, at least 8 years old) and may be blocked by even basic bot protection.

If you switch to a more recent, common User-Agent such as 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', the request works fine:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}

response = requests.get(
    'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500', 
    headers=headers
)
print(response.status_code)  # 200
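Whatever User-Agent you send, it is worth passing a timeout: by default requests waits indefinitely for a server that accepts the connection but never answers, which is exactly what "the script gets stuck" looks like. A minimal sketch, where fetch_page is a hypothetical helper and the (5, 15)-second limits are illustrative choices rather than anything this site requires:

```python
import requests


def fetch_page(url, headers, timeout=(5, 15)):
    """GET url and return the Response, or None if the server is too slow.

    timeout is (connect seconds, read seconds); without it, requests can
    block forever on a server that stalls instead of refusing the request.
    """
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout
        return None
    response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    return response
```

Called with the URL and headers above, fetch_page returns the response, or None instead of hanging if the server stops responding.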

To get the actual price data, fetch it from the URL that the page's JavaScript calls, which you can find in the Network tab of your browser's developer tools:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}

response = requests.get(
    'http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx?SubSystem=History&Action=GetChartData&inst.an=id,nm,fnm,isin,tp,chp,ycp&FromDate=2022-05-19&ToDate=2022-08-19&json=true&timezone=CET&showAdjusted=false&app=/etp/etf/etfhistorical-HistoryChart&Instrument=SSE500', 
    headers=headers
)
response.json()
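Rather than editing the long query string by hand, you can let requests build it from a dict, which makes the date range easy to change. A sketch, assuming the endpoint keeps accepting the parameters seen in the URL above; chart_params and fetch_chart_data are hypothetical helper names, and the default window of about three months simply mirrors that URL. requests percent-encodes the commas and slashes in the values, which servers normally decode transparently.

```python
from datetime import date, timedelta

import requests

# Endpoint and parameter names copied from the DataFeedProxy URL above
FEED_URL = "http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}


def chart_params(instrument, from_date, to_date):
    """Build the query parameters for one instrument and date range."""
    return {
        "SubSystem": "History",
        "Action": "GetChartData",
        "inst.an": "id,nm,fnm,isin,tp,chp,ycp",
        "FromDate": from_date.isoformat(),  # YYYY-MM-DD
        "ToDate": to_date.isoformat(),
        "json": "true",
        "timezone": "CET",
        "showAdjusted": "false",
        "app": "/etp/etf/etfhistorical-HistoryChart",
        "Instrument": instrument,
    }


def fetch_chart_data(instrument, days=92, timeout=15):
    """Fetch roughly the last `days` days of chart data as parsed JSON."""
    to_date = date.today()
    from_date = to_date - timedelta(days=days)
    response = requests.get(
        FEED_URL,
        params=chart_params(instrument, from_date, to_date),
        headers=HEADERS,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()
```

For example, fetch_chart_data("SSE500") requests the same kind of data as the hard-coded URL, ending today instead of on 2022-08-19.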

Answer:

The webpage is dynamic, so to get the desired data you can use a browser automation tool such as Selenium. Here I use Selenium with BeautifulSoup (bs4).

Example:

from io import StringIO
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the page's JavaScript time to render the table

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # the HTML is captured, so the browser can be closed
table = soup.select_one('table#avistaTable')

# Wrap the HTML in StringIO; newer pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(table)))[0]
print(df)

Output:

                          Namn  Senast          Oms   %  Högst  Lägst Uppdaterad  (CET)
0  XACT OMXS30 ESG (UCITS ETF)   27295  216 388 857 -40  27415  27275          17:09:35