I am currently looking at the following Link:
There is a table that displays all Positions of the ETF. My goal is to extract the table and save it to a xlsx file. I wrote a code:
import requests
import pandas as pd
url = 'https://www.ishares.com/de/privatanleger/de/produkte/251931/ishares-stoxx-europe-600-ucits-etf-de-fund'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_excel('my data.xlsx')
However, when I try pd.read_html(html) it show me always that no tables could have been found on the website. Does somebody know how to identify and pull the desired table via Python?
CodePudding user response:
The problem is that the website uses cookies, so, when you use the default link, it redirects you to a first page that needs you to click in a button to accept cookies. And then you go to the right page. I encountered the right URL, that goes straight to the main page that you are looking for, try this:
url = 'https://www.ishares.com/de/privatanleger/de/produkte/251931/ishares-stoxx-europe-600-ucits-etf-de-fund?switchLocale=y&siteEntryPassthrough=true'
html = requests.get(url).content
df_list = pd.read_html(html)
print(df_list)
Here is my Output:
[Empty DataFrame
Columns: [Ex-Tag, Fälligkeitsdatum, Gesamtausschüttung]
Index: [], Empty DataFrame
Columns: [Ex-Tag, Fälligkeitsdatum, Gesamtausschüttung]
Index: [], Unnamed: 0 2012 2013 2014 2015 ... 2017 2018 2019 2020 2021
0 Gesamtrendite (%) 177 210 74 108 ... 108 -110 276 -19 251
1 Vergleichsindex (%) 178 212 72 96 ... 106 -108 268 -20 249
[2 rows x 11 columns], Unnamed: 0 ... Von 31.Mär.2021Bis 31.Mär.2022
0 Gesamtrendite (%) Per 31.Mär.2022 ... 863
1 Vergleichsindex (%) Per 31.Mär.2022 ... 849
[2 rows x 6 columns], Unnamed: 0 1J 3J 5J 10J Seit Auflage
0 Gesamtrendite (%) 197 894 538 940 651
1 Vergleichsindex (%) 178 879 522 920 636, Unnamed: 0 Seit 1.1. 1M 3M ... 3J 5J 10J Seit Auflage
0 Gesamtrendite (%) -736 -80 -43 ... 2929 2994 14548 21694
1 Vergleichsindex (%) -755 -92 -65 ... 2874 2896 14119 20904
[2 rows x 10 columns], Empty DataFrame
Columns: [Emittententicker, Name, Sektor, Anlageklasse, Marktwert, Gewichtung (%), Nominalwert, Nominale, ISIN, Kurs, Standort, Börse, Marktwährung]
Index: [], Empty DataFrame
Columns: [Kategorie, Fonds]
Index: [], Empty DataFrame
Columns: [Kategorie, Fonds]
Index: [], Börse Ticker ... Common Code (EOC) iNAV ISIN
0 Xetra EXSA ... 186 794 77 -
1 Bolsa Mexicana De Valores EXSA ... - -
2 Borsa Italiana EXSA ... - -
3 SIX Swiss Exchange EXSA ... - -
[4 rows x 14 columns]]
Process finished with exit code 0
CodePudding user response:
This is how you would use Selenium:
from Selenium import webdriver
import time
from Selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
#maximize the window size
driver.maximize_window()
#navigate to the url
driver.get("https://www.google.com/")