I am trying to scrape the table data below from a website using BeautifulSoup4 and Python. Screenshot of the table: https://i.stack.imgur.com/PfPOQ.png
So far my code is:
import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    print(cols)
With this code, I get the following result (screenshot: https://i.stack.imgur.com/C190u.png):
[Issuer, ]
[Industry, ]
I can see the Issuer and Industry labels, but their values do not show up in my result. Any help would be appreciated. TIA
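One way to check whether the values are missing from the HTML the server sends (rather than being lost by the parsing code) is to print each cell's text instead of the raw tags. The sketch below is self-contained: the HTML string is a hypothetical stand-in that mimics the empty value cells behind the `[Issuer, ]` output, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking what requests receives: the label cells are
# filled in, but the value cells are empty until JavaScript runs in a browser.
html = """
<table class="table widget-table">
  <tbody>
    <tr><td>Issuer</td><td></td></tr>
    <tr><td>Industry</td><td></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# class_ with a space-separated string matches the exact class attribute value
table = soup.find("table", class_="table widget-table")

rows = []
for tr in table.find("tbody").find_all("tr"):
    # get_text(strip=True) returns "" for a cell that has no text content
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(cells)
    print(cells)
# prints:
# ['Issuer', '']
# ['Industry', '']
```

If the same loop run on `content.text` also gives an empty second element for every row, the values are not in the static HTML at all, so they have to come from a real browser or from the request the page makes to fetch them.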
Answer:
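You are not getting the entire output because the data in the second td of the sixth table on the page is loaded dynamically via JavaScript, so it is not part of the HTML that requests downloads. You can let a real browser render the page with Selenium and then parse the table with pandas: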
import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the page's JavaScript time to fill in the values

table = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# The master-data table is the sixth table on the page (index 5)
df = pd.read_html(StringIO(str(table)))[5]
print(df)
Output:
0 Issuer Fürstenberg Capital Erste GmbH
1 Industry Industrial and bank bonds
2 Market Open Market
3 Subsegment NaN
4 Minimum investment amount 1000
5 Listing unit Percent
6 Issue date 04/04/2005
7 Issue volume 61203000
8 Circulating volume 61203000
9 Issue currency EUR
10 Portfolio currency EUR
11 First trading day 27/06/2012
12 Maturity NaN
13 Extraordinary cancellation type Call option
14 Extraordinary cancellation date NaN
15 Subordinated Yes
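Because the table has one label column and one value column, it can be convenient to turn `df` into a plain dictionary. The sketch below assumes the DataFrame from `pd.read_html` has integer column labels `0` (label) and `1` (value), as is usual when the table has no header row; it rebuilds a few of the rows above by hand so it runs without a browser.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the table printed above.
# Assumption: the real DataFrame uses column 0 for labels and column 1 for values.
df = pd.DataFrame(
    {
        0: ["Issuer", "Industry", "Subsegment", "Issue currency"],
        1: ["Fürstenberg Capital Erste GmbH", "Industrial and bank bonds", None, "EUR"],
    }
)

# Map each label to its value; empty cells (NaN or None) become None.
master_data = {
    label: (None if pd.isna(value) else value)
    for label, value in zip(df[0], df[1])
}

print(master_data["Issuer"])      # Fürstenberg Capital Erste GmbH
print(master_data["Subsegment"])  # None
```

Converting NaN to None keeps the result JSON-serializable, which is handy if the scraped data is stored or compared with the API response.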
Answer:
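Another solution, using just requests. Note that to get a result from the server you have to send the required headers (you can see them in the browser's Developer tools -> Network tab). The values below were copied from one browser session (note the Client-Date timestamp), so if the server rejects them, copy fresh ones from the Network tab.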
import requests

url = (
    "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)
# Header values copied from the browser's Network tab
headers = {
    "X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
    "X-Security": "d361b3c92e9c50a248e85a12849f8eee",
    "Client-Date": "2022-08-25T09:07:36.196Z",
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly if the server rejects the request
data = response.json()
print(data)
Prints:
{
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "cupon": 2.522,
    "interestPaymentPeriod": None,
    "firstAnnualPayDate": "2006-06-30",
    "minimumInvestmentAmount": 1000.0,
    "issuer": "Fürstenberg Capital Erste GmbH",
    "issueDate": "2005-04-04",
    "issueVolume": 61203000.0,
    "circulatingVolume": 61203000.0,
    "issueCurrency": "EUR",
    "firstTradingDay": "2012-06-27",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "extraordinaryCancellation": None,
    "portfolioCurrency": "EUR",
    "subordinated": True,
    "flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
    "quotationType": {
        "originalValue": "2",
        "translations": {"de": "Prozentnotiert", "en": "Percent"},
    },
}
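Several fields in this response (`type`, `market`, `noticeType`, `flatNotation`, `quotationType`) wrap their display text in a `translations` object keyed by language, and some use an `"others"` key instead of `"en"`. The helper below is a hypothetical sketch based only on the response shown above (`readable` is an illustrative name, not part of any API); it reduces each field to its readable value.

```python
from typing import Any


def readable(value: Any, lang: str = "en") -> Any:
    """Return the display text for one field of the master-data response.

    Fields shaped like {"originalValue": ..., "translations": {...}} are reduced
    to the translation for `lang`, falling back to "others" and then to the raw
    originalValue. Any other value (str, float, bool, None) is returned unchanged.
    """
    if isinstance(value, dict) and "translations" in value:
        translations = value["translations"] or {}
        return (
            translations.get(lang)
            or translations.get("others")
            or value.get("originalValue")
        )
    return value


# A few fields copied from the response printed above
data = {
    "issuer": "Fürstenberg Capital Erste GmbH",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "noticeType": {"originalValue": "CALL_OPTION", "translations": {"others": "Call option"}},
    "subSegment": None,
    "subordinated": True,
}

summary = {key: readable(value) for key, value in data.items()}
print(summary["type"])        # Industrial and bank bonds
print(summary["noticeType"])  # Call option
```

Applied to the full live response, the same dictionary comprehension gives a flat mapping comparable to the table scraped with Selenium, without needing a browser.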