extracting csv download link from an webpage using python-CodePudding

I want to extract the CSV download URL from website - https://www.nseindia.com/option-chain

Code I used till now

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.get("https://www.nseindia.com/option-chain")
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, 
"equity_underlyingVal")))
nifty = (driver.find_element(By.XPATH, '//* 
[@id="equity_underlyingVal"]').text).replace('NIFTY ', 
'').replace(',','')
time_stamp = driver.find_element(By.XPATH, '//* 
[@id="equity_timeStamp"]').text

I need the csv link to be load in pandas df. I dont want to use selenium or if using selenium, I need it as headless. Let me know if anyone has a better idea about extracting data directly into pandas datafream..

CodePudding user response：

You can extract the downloading link contained in that element with Selenium as following:

link = driver.find_element(By.CSS_SELECTOR, '#downloadOCTable').get_attribute("href")

CodePudding user response：

As the download link is not present in the href attribute, the best approach is to download the csv file.

Interacting in headless mode can cause problems if the window-size argument is not specified, and a workaround to download files in headless mode is to specify the download path using the driver.command_executor method.

Code snippet to download csv in headless mode-

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os

options = Options()

#add necessary arguments
options.add_argument("user-agent= Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36")
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")

driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')

#set download path (set to current working directory in this example)
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow','downloadPath':os.getcwd()}}
command_result = driver.execute("send_command", params)

driver.get("https://www.nseindia.com/option-chain")

#wait for table details to appear
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="equity_optionChainTable"]')))

#find and click on download csv button
download_button=driver.find_element_by_xpath('//*[@id="downloadOCTable"]')
download_button.click()