Extracting a CSV download link from a webpage using Python

Time:12-09

I want to extract the CSV download URL from the website https://www.nseindia.com/option-chain


Code I have used so far:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.get("https://www.nseindia.com/option-chain")
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "equity_underlyingVal")))
nifty = driver.find_element(By.XPATH, '//*[@id="equity_underlyingVal"]').text.replace('NIFTY ', '').replace(',', '')
time_stamp = driver.find_element(By.XPATH, '//*[@id="equity_timeStamp"]').text

I need the CSV link so that I can load the data into a pandas DataFrame. I don't want to use Selenium or, if Selenium is required, I need it to run headless. Let me know if anyone has a better idea for extracting the data directly into a pandas DataFrame.

CodePudding user response:

You can read the download link from that element's href attribute with Selenium as follows:

link = driver.find_element(By.CSS_SELECTOR, '#downloadOCTable').get_attribute("href")

CodePudding user response:

Because the download link is not present in the href attribute, the best approach is to download the CSV file itself.

Interacting with the page in headless mode can cause problems if the window-size argument is not specified. A workaround for downloading files in headless mode is to set the download path by sending the Page.setDownloadBehavior command through driver.command_executor.
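As a hedged sketch of that workaround: Selenium's Chrome driver also exposes execute_cdp_cmd, which can send the same Page.setDownloadBehavior command without registering a custom send_command endpoint. The helper names below (download_behavior_params, allow_headless_downloads) are made up for illustration, and driver is assumed to be an already created headless Chrome driver.

```python
import os


def download_behavior_params(download_dir):
    """Build the parameters for the DevTools Page.setDownloadBehavior command."""
    return {
        "behavior": "allow",
        # An absolute path is the safe choice for the download location
        "downloadPath": os.path.abspath(download_dir),
    }


def allow_headless_downloads(driver, download_dir):
    """Ask a Chrome driver to save downloads into download_dir.

    Assumes `driver` is a selenium.webdriver.Chrome instance, which
    provides execute_cdp_cmd(cmd, cmd_args).
    """
    return driver.execute_cdp_cmd(
        "Page.setDownloadBehavior", download_behavior_params(download_dir)
    )
```

It would be called once, right after the driver is created and before the page is loaded, for example allow_headless_downloads(driver, os.getcwd()).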

Code snippet to download the CSV in headless mode:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os

options = Options()

#add necessary arguments
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36")
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")

#Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')

#set download path (set to current working directory in this example)
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow','downloadPath':os.getcwd()}}
command_result = driver.execute("send_command", params)

driver.get("https://www.nseindia.com/option-chain")

#wait for table details to appear
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="equity_optionChainTable"]')))

#find and click on download csv button
download_button = driver.find_element(By.XPATH, '//*[@id="downloadOCTable"]')
download_button.click()
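
Since the goal in the question is a pandas DataFrame, the click still has to be followed by waiting for the file and reading it. Below is a minimal sketch under a few assumptions: the file lands in the download directory set above, Chrome marks unfinished downloads with a .crdownload suffix, and the helper names wait_for_csv and load_option_chain are hypothetical. The layout of the downloaded file is not shown in this thread, so read_csv arguments such as skiprows may need adjusting.

```python
import glob
import os
import time


def wait_for_csv(download_dir, timeout=30.0, poll=0.5):
    """Return the path of the newest finished .csv file in download_dir.

    Raises TimeoutError if no finished CSV appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        csv_files = glob.glob(os.path.join(download_dir, "*.csv"))
        # Chrome writes to "<name>.crdownload" while a download is in progress
        partial = glob.glob(os.path.join(download_dir, "*.crdownload"))
        if csv_files and not partial:
            return max(csv_files, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished CSV in {download_dir!r}")
        time.sleep(poll)


def load_option_chain(download_dir, **read_csv_kwargs):
    """Read the newest downloaded CSV into a pandas DataFrame."""
    import pandas as pd  # imported here so wait_for_csv needs only the stdlib

    return pd.read_csv(wait_for_csv(download_dir), **read_csv_kwargs)
```

After download_button.click(), a call such as df = load_option_chain(os.getcwd()) would return the data. Pointing the download path at a dedicated, empty directory avoids picking up an older CSV that happens to be in the working directory.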