I want to get the prices from this instrument on this webpage: http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500
Normally requests.get does the trick, but for this one the script gets stuck. I've tried setting a User-Agent, following the answer to "How to use Python requests to fake a browser visit a.k.a and generate User Agent?", but no luck. My code:
import requests
url = "http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500"
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}
response = requests.get(url, headers=headers)
CodePudding user response:
It looks like the data on that site's charts is loaded dynamically using JavaScript, so requests won't return a usable result. You can use Selenium to drive an actual browser instance, which will run the JavaScript needed to load the data onto the page.
You'll need:
- Selenium, installed using pip install selenium
- A browser driver binary in PATH or in the directory of your Python script. I suggest Mozilla's Geckodriver found here: https://github.com/mozilla/geckodriver/releases
Usage example:
from selenium import webdriver
from selenium.webdriver.common.by import By
options = webdriver.FirefoxOptions()
# options.add_argument("-headless")  # Uncomment to run Firefox without a visible window (options.headless is deprecated in recent Selenium releases).
driver = webdriver.Firefox(options=options)
# Grabbing a URL using the browser instance.
driver.get("URL")
# Finding an element by ID
example_element = driver.find_element(By.ID, "Element ID")
print(example_element.text)
# Closing the browser instance
driver.quit()
It'll take some messing around to figure out how to utilize all of Selenium's capabilities in your code, but there's a lot of documentation (https://selenium-python.readthedocs.io) out there for figuring it all out.
CodePudding user response:
The User-Agent you're using is very old (at least 8 years old), and may be blocked by very basic protections.
If you switch to a more recent, common User-Agent like 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36', it works fine.
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
response = requests.get(
'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500',
headers=headers
)
response.status_code
# 200
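Since the original script hung rather than failing, it may also help to pass a timeout, because requests waits indefinitely by default. A minimal sketch: the helper name fetch_page, the timeout values and the optional session argument are my own choices for illustration, not something the site requires.

```python
import requests

# Same User-Agent as in the snippet above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
)


def fetch_page(url, timeout=(5, 15), session=None):
    """Fetch url with a browser-like User-Agent, raising instead of hanging.

    timeout is (connect seconds, read seconds); both values are arbitrary.
    session is a hypothetical hook so a shared requests.Session can be reused.
    """
    http = requests.Session() if session is None else session
    response = http.get(url, headers={"User-Agent": BROWSER_UA}, timeout=timeout)
    # Turn 4xx/5xx answers into exceptions rather than returning them silently.
    response.raise_for_status()
    return response
```

Calling fetch_page with the etfhistorical URL should then either return the response or raise requests.exceptions.Timeout once the limits are exceeded, instead of blocking forever.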
And if you need to get the real data, you'll need to fetch it from another URL (you can find it with your browser inspector):
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}
response = requests.get(
'http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx?SubSystem=History&Action=GetChartData&inst.an=id,nm,fnm,isin,tp,chp,ycp&FromDate=2022-05-19&ToDate=2022-08-19&json=true&timezone=CET&showAdjusted=false&app=/etp/etf/etfhistorical-HistoryChart&Instrument=SSE500',
headers=headers
)
response.json()
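If you want a different date range or instrument, the long query string can be assembled from its parts. A minimal sketch using only the standard library: the endpoint and parameter names are copied from the URL above, the function name build_history_url is hypothetical, and I'm assuming the server accepts other dates and instruments in the same format.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint copied from the data URL in the answer above.
BASE_URL = "http://www.nasdaqomxnordic.com/webproxy/DataFeedProxy.aspx"


def build_history_url(instrument: str, from_date: date, to_date: date) -> str:
    """Build the chart-data URL for one instrument and date range."""
    params = {
        "SubSystem": "History",
        "Action": "GetChartData",
        "inst.an": "id,nm,fnm,isin,tp,chp,ycp",
        "FromDate": from_date.isoformat(),  # formatted as YYYY-MM-DD
        "ToDate": to_date.isoformat(),
        "json": "true",
        "timezone": "CET",
        "showAdjusted": "false",
        "app": "/etp/etf/etfhistorical-HistoryChart",
        "Instrument": instrument,
    }
    # Keep "," and "/" unescaped so the result matches the original URL.
    return f"{BASE_URL}?{urlencode(params, safe=',/')}"


url = build_history_url("SSE500", date(2022, 5, 19), date(2022, 8, 19))
print(url)
```

The resulting string can be passed to requests.get together with the same headers dictionary.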
CodePudding user response:
The webpage is dynamic, so to get the desired data you can use a browser automation tool such as Selenium. Here I use Selenium with bs4.
Example:
import time
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'http://www.nasdaqomxnordic.com/etp/etf/etfhistorical?languageId=3&Instrument=SSE500'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # Give the page's JavaScript time to render the table
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # Close the browser once the rendered HTML has been captured
table = soup.select_one('table#avistaTable')
# Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(str(table)))[0]
print(df)
Output:
                          Namn  Senast          Oms    %  Högst  Lägst Uppdaterad (CET)
0  XACT OMXS30 ESG (UCITS ETF)   27295  216 388 857  -40  27415  27275         17:09:35