The website is "https://www.nseindia.com/companies-listing/corporate-filings-announcements". A friend sent me the underlying link that downloads the data between two dates as a CSV file: "https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27". This link works fine in a web browser.

First, can someone explain how he got this link, or rather how I can find it myself? Second, I am unable to read the CSV file into a data frame from this link in Python. Maybe there is some issue with the ' character or something else. The code is:

csv_url='https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=15-01-2022&csv=true''
df = pd.read_csv(csv_url)
print(df.head())

CodePudding user response：

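Use wget.py (the wget package for Python) to download the file first:

import wget

# Example URL; substitute the link you want to download
DATA_URL = 'http://www.robots.ox.ac.uk/~ankush/data.tar.gz'

# Local file name to save the download under
out_fname = 'abc.tar.gz'

wget.download(DATA_URL, out=out_fname)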

CodePudding user response：

For this issue, you first need to request the NSE home page with headers, as mentioned in this post. Hitting the main site stores some cookies in your session, and with those you can then request the desired URL. To convert the response data into a string that pandas can read, I followed this answer.

Make sure to include a custom user agent in the headers, otherwise the request will fail.

import pandas as pd
import io
import requests

base_url = 'https://www.nseindia.com'
session = requests.Session()
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                         'like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
    'accept-language': 'en,gu;q=0.9,hi;q=0.8',
    'accept-encoding': 'gzip, deflate, br'}

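# Visit the home page first so the session picks up the cookies NSE sets
r = session.get(base_url, headers=headers, timeout=5)
cookies = dict(r.cookies)  # the session already stores these; kept only for inspection
# The same session sends those cookies along with the API request
response = session.get('https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true', timeout=5, headers=headers)

# Decode the CSV bytes and let pandas parse them from an in-memory buffer
content = response.content
df = pd.read_csv(io.StringIO(content.decode('utf-8')))
print(df.head())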
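
If you need other date ranges, the same steps can be wrapped in a small helper. This is only a sketch, assuming the endpoint keeps accepting the query parameters seen in the link above (index, from_date, to_date, csv) with dates in DD-MM-YYYY form. The names build_params, csv_bytes_to_frame and fetch_announcements are made up for illustration and are not part of any library.

```python
import io
from datetime import date

import pandas as pd
import requests

BASE_URL = "https://www.nseindia.com"
API_URL = BASE_URL + "/api/corporate-announcements"
HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36",
    "accept-language": "en,gu;q=0.9,hi;q=0.8",
    # 'br' is left out here: decoding it needs an extra Brotli package
    "accept-encoding": "gzip, deflate",
}


def build_params(start: date, end: date) -> dict:
    """Query parameters copied from the link in the question (assumed stable)."""
    return {
        "index": "equities",
        "from_date": start.strftime("%d-%m-%Y"),
        "to_date": end.strftime("%d-%m-%Y"),
        "csv": "true",
    }


def csv_bytes_to_frame(raw: bytes) -> pd.DataFrame:
    """Decode CSV bytes and parse them from an in-memory text buffer."""
    return pd.read_csv(io.StringIO(raw.decode("utf-8")))


def fetch_announcements(start: date, end: date, timeout: float = 5) -> pd.DataFrame:
    """Hit the home page for cookies, then request the CSV in the same session."""
    with requests.Session() as session:
        session.get(BASE_URL, headers=HEADERS, timeout=timeout)
        response = session.get(
            API_URL, params=build_params(start, end), headers=HEADERS, timeout=timeout
        )
        response.raise_for_status()  # fail loudly instead of parsing an error page
        return csv_bytes_to_frame(response.content)
```

Usage would look like df = fetch_announcements(date(2022, 1, 14), date(2022, 1, 20)). Passing params lets requests build and encode the query string, so there are no stray quote characters to worry about at the end of the URL.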