I want to extract the "Bonds Traded on the Exchange" and "OTC trades" tables and save them to an Excel sheet. I am trying to scrape the data with Python (BeautifulSoup and requests), but I can't get it to work, and I don't want to use Selenium. Can anyone guide me? I don't get any error message: the script just never finishes, so I think the terminal hangs.
import requests
import pandas as pd
import os
from bs4 import BeautifulSoup as bs
url = "https://www1.nseindia.com/products/content/debt/corp_bonds/cbm_reporting_homepage.htm"
# Fetch the page once and reuse the response
page = requests.get(url)
soup = bs(page.text, 'lxml')
df_list = pd.read_html(page.text)
df = df_list[0]  # change 0 to select a different table
print(df)
CodePudding user response:
If you look at the Network tab in your browser's developer tools, you will see a request for cbm_reporting_cbricsL.htm, which is the page you actually need to scrape.
You should also send a User-Agent header with the request for it to work properly. See the detailed explanation in this thread:
import requests
import pandas as pd
from bs4 import BeautifulSoup
res = requests.get(
    'https://www1.nseindia.com/products/dynaContent/debt/corp_bonds/htms/cbm_reporting_cbricsL.htm',
    headers={"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
)
soup = BeautifulSoup(res.text, 'lxml')
# collect the <td> cells of every table row
raw_columns = [row.find_all('td') for row in soup.find_all('tr')]
# the first 3 rows contain no data, so skip them
df = pd.DataFrame.from_records(raw_columns[3:])
The result looks like this (the values appear in brackets because each cell is still a td tag rather than plain text):
0 [INE001A07TA7] [HOUSING DEVELOPMENT FINANCE CORPORATION LTD S... [ 100.0030] [ 4.7082] [ 16] [[ 168000.00]] [ 100.0000] [ 4.7091]
1 [INE134E07AP6] [POWER FINANCE CORPORATION LTD. TRI SRV CATIII... [ 100.8500] [ 6.6934] [ 1] [ 1000.00 ] [ 100.8500] [ 6.6934]
2 [INE020B08963] [RURAL ELECTRIFICATION CORPORATION LIMITED SR-... [ 107.6835] [ 5.9200] [ 1] [ 1500.00 ] [ 107.6835] [ 5.9200]
3 [INE163N08131] [-] [ 104.2195] [ 6.6200] [ 1] [ 780.00 ] [ 104.2195] [ 6.6200]
4 [INE540P07343] [-] [ 104.3408] [ 9.3603] [ 6] [[ 1110.00]] [ 104.2640] [ 9.3800]
... ... ... ... ... ... ... ... ...
93 [INE377Y07250] [BAJAJ HOUSING FINANCE LIMITED SR 27 5.69 NCD ... [ 100.0300] [ 5.6845] [ 1] [ 9000.00 ] [ 100.0300] [ 5.6845]
94 [INE115A07ML7] [LIC HOUSING FINANCE LIMITED SRTR349OP-1 7.4NC... [ 105.0991] [ 5.5000] [ 1] [ 1000.00 ] [ 105.0991] [ 5.5000]
95 [INE020B07HN3] [RURAL ELECTRIFICATION CORPORATION LIMITED SR-... [ 123.6000] [ 4.4400] [ 1] [ 10.00 ] [ 123.6000] [ 4.4400]
96 [INE101A08070] [MAHINDRA AND MAHINDRA LIMITED 9.55 NCD 04JL63... [ 125.5000] [ 7.5218] [ 1] [ 820.00 ] [ 125.5000] [ 7.5218]
97 [INE062A08215] [STATE BANK OF INDIA SERIES I 8.75 BD PERPETUA... [ 104.5304] [ 7.0000] [ 1] [ 10.00 ] [ 104.5304] [ 7.0000]
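The bracketed values in the output above come from storing whole `td` tags in the DataFrame. Here is a minimal sketch of one way to keep only the text of each cell. The sample HTML, the `rows_as_text` helper and the column names `isin` and `price` are made up for illustration, so the real page's markup and columns may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Tiny stand-in for the real page; this markup is an assumption for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>ISIN</th><th>Price</th></tr>
  <tr><td> INE001A07TA7 </td><td><b> 100.0030</b></td></tr>
  <tr><td>INE134E07AP6</td><td> 100.8500 </td></tr>
</table>
"""

def rows_as_text(html):
    """Return one list of stripped cell strings per table row that has <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text(strip=True) drops nested tags and surrounding whitespace
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows made only of <th> cells produce an empty list
            rows.append(cells)
    return rows

# Hypothetical column names; pick whatever matches the real table.
df = pd.DataFrame(rows_as_text(SAMPLE_HTML), columns=["isin", "price"])
df["price"] = pd.to_numeric(df["price"])  # turn the price strings into floats
print(df)
```

With the real page you would pass `res.text` to `rows_as_text` and, as in the answer above, skip any leading rows that hold no data.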
CodePudding user response:
This is my final answer:
import requests
import pandas as pd
headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
html = requests.get(
    'https://www1.nseindia.com/products/dynaContent/debt/corp_bonds/htms/cbm_reporting_cbricsL.htm',
    headers=headers).content
# read_html returns one DataFrame per table found in the page
df_list = pd.read_html(html)
df = df_list[0]
print(df)
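The question also asked about saving the tables to an Excel sheet, which none of the code above does. Here is a rough sketch of one way to do it with `pandas.ExcelWriter`, writing each table to its own sheet. The helpers `safe_sheet_name` and `save_tables` and the placeholder data are hypothetical, and writing `.xlsx` files assumes an Excel engine such as openpyxl is installed.

```python
import re
import pandas as pd

def safe_sheet_name(title):
    # Excel sheet names are limited to 31 characters and may not
    # contain any of the characters [ ] : * ? / \
    cleaned = re.sub(r"[\[\]:*?/\\]", " ", title).strip()
    return cleaned[:31]

def save_tables(tables, path):
    """Write each DataFrame in `tables` (a dict of title -> DataFrame) to its own sheet."""
    # pandas needs an Excel engine (for example openpyxl) for .xlsx output
    with pd.ExcelWriter(path) as writer:
        for title, frame in tables.items():
            frame.to_excel(writer, sheet_name=safe_sheet_name(title), index=False)

# Placeholder data standing in for the two scraped tables.
tables = {
    "Bonds Traded on the Exchange": pd.DataFrame({"isin": ["INE001A07TA7"], "price": [100.003]}),
    "OTC trades": pd.DataFrame({"isin": ["INE134E07AP6"], "price": [100.85]}),
}
sheet_names = [safe_sheet_name(title) for title in tables]
print(sheet_names)
```

Calling `save_tables(tables, "nse_bonds.xlsx")` (the file name is only an example) would then create one workbook with a sheet per table. For a single table, `df.to_excel("nse_bonds.xlsx", index=False)` is enough.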