Scrape table data from website using Python

Time:08-26

I am trying to scrape the table data shown below from a website using BeautifulSoup4 and Python. Screenshot of the table: https://i.stack.imgur.com/PfPOQ.png

So far my code is

import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
# Print the cells of every row in the selected table
for row in rows:
    cols = row.find_all('td')
    print(cols)

With this code, I get the following result (screenshot: https://i.stack.imgur.com/C190u.png): [Issuer, ] [Industry, ]

I see the labels Issuer and Industry, but their values do not show up in my result. Any help would be appreciated. TIA

CodePudding user response:

You are not getting the entire output because the data in the second td cells of table number 6 on the page is loaded dynamically via JavaScript. You can get the rendered page by driving a real browser with selenium and then parse the table with pandas.

from io import StringIO
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the JavaScript time to fill in the table
table = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# Recent pandas versions expect a file-like object rather than a literal HTML string
df = pd.read_html(StringIO(str(table)))[5]  # index 5 is table number 6
print(df)

Output:

0                            Issuer  Fürstenberg Capital Erste GmbH
1                          Industry       Industrial and bank bonds
2                            Market                     Open Market
3                        Subsegment                             NaN
4         Minimum investment amount                            1000
5                      Listing unit                         Percent
6                        Issue date                      04/04/2005
7                      Issue volume                        61203000
8                Circulating volume                        61203000
9                    Issue currency                             EUR
10               Portfolio currency                             EUR
11                First trading day                      27/06/2012
12                         Maturity                             NaN
13  Extraordinary cancellation type                     Call option
14  Extraordinary cancellation date                             NaN
15                     Subordinated                             Yes
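
A quick way to confirm that the values are filled in by JavaScript is to look at the raw HTML that requests receives: the label cell has text, the value cell is empty. The sketch below illustrates that check using only the standard library. The `static_html` snippet is a hypothetical stand-in modeled on the `[Issuer, ] [Industry, ]` result from the question, not the real page source, and `CellCollector` is a helper name made up for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical stand-in for the static HTML returned by requests.get(url).text
static_html = """
<table class="table widget-table"><tbody>
<tr><td>Issuer</td><td></td></tr>
<tr><td>Industry</td><td></td></tr>
</tbody></table>
"""

collector = CellCollector()
collector.feed(static_html)

# Labels whose value cell is empty in the static HTML
empty_values = [row[0] for row in collector.rows if len(row) == 2 and not row[1]]
print(empty_values)
```

If every label ends up in `empty_values`, the values are most likely injected after the page loads, which is when a browser-driven approach (or calling the underlying API directly) becomes necessary.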

CodePudding user response:

Another solution, using just requests. Note that to obtain the result from the server, one has to set the required headers (they can be seen in the browser's Developer tools -> Network tab).

import requests

url = (
    "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)

headers = {
    "X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
    "X-Security": "d361b3c92e9c50a248e85a12849f8eee",
    "Client-Date": "2022-08-25T09:07:36.196Z",
}

data = requests.get(url, headers=headers).json()
print(data)

Prints:

{
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "cupon": 2.522,
    "interestPaymentPeriod": None,
    "firstAnnualPayDate": "2006-06-30",
    "minimumInvestmentAmount": 1000.0,
    "issuer": "Fürstenberg Capital Erste GmbH",
    "issueDate": "2005-04-04",
    "issueVolume": 61203000.0,
    "circulatingVolume": 61203000.0,
    "issueCurrency": "EUR",
    "firstTradingDay": "2012-06-27",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "extraordinaryCancellation": None,
    "portfolioCurrency": "EUR",
    "subordinated": True,
    "flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
    "quotationType": {
        "originalValue": "2",
        "translations": {"de": "Prozentnotiert", "en": "Percent"},
    },
}
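
Several fields in this response are nested objects with an `originalValue` and a `translations` dictionary, so getting label/value rows like the HTML table takes one more step. The following is a minimal sketch of that flattening, run on an excerpt of the response printed above. The helper name `display_value` and the fallback order (requested language, then `"others"`, then `originalValue`) are assumptions made for this example, not something the API documents.

```python
def display_value(value, lang="en"):
    """Return a printable value, unwrapping translation objects if present."""
    if isinstance(value, dict) and "translations" in value:
        translations = value["translations"]
        # Assumed fallback order: requested language, "others", raw value
        return (
            translations.get(lang)
            or translations.get("others")
            or value.get("originalValue")
        )
    return value


# Excerpt of the response shown above
data = {
    "issuer": "Fürstenberg Capital Erste GmbH",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "subordinated": True,
}

rows = [(key, display_value(value)) for key, value in data.items()]
for key, value in rows:
    print(f"{key:<12} {value}")
```

Passing `lang="de"` to `display_value` would pick the German labels where they exist. The resulting `rows` list can also be handed to `pandas.DataFrame(rows, columns=["field", "value"])` if a table is preferred.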