I am trying to get the table from http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap.
However, when I run either of the following code snippets, the code never finishes.
# Using requests
import requests
url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'
requests.get(url)
# Using pandas
import pandas as pd
url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'
pd.read_html(url)
Replacing the URL with another one works just fine, e.g. https://en.wikipedia.org/wiki/List_of_S&P_500_companies takes around a second.
CodePudding user response:
The server apparently does not answer requests that lack browser-like headers, so sending such headers is one way to get that table:
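from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

# Send browser-like headers with the request.
headers = {
    'accept-language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'
r = requests.get(url, headers=headers)
# Select only the listed-companies table, then let pandas parse it.
table = bs(r.text, 'html.parser').select_one('table#listedCompanies')
# Wrap the markup in StringIO: recent pandas versions deprecate passing literal HTML strings.
df = pd.read_html(StringIO(str(table)))[0]
print(df)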
Result in terminal:
                      Name  Symbol  Currency          ISIN          Sector  ICB Code  Fact Sheet
  0                 TRATON    8TRA       SEK  DE000TRAT0N7     Industrials      5020         NaN
  1                    AAK     AAK       SEK  SE0011337708  Consumer Goods      4510         NaN
  2                ABB Ltd     ABB       SEK  CH0012221716     Industrials      5020         NaN
  3              Addtech B  ADDT B       SEK  SE0014781795     Industrials      5020         NaN
  4                   AFRY    AFRY       SEK  SE0005999836     Industrials      5010         NaN
...                    ...     ...       ...           ...             ...       ...         ...
249           Wallenstam B  WALL B       SEK  SE0017780133     Real Estate      3510         NaN
250  Wihlborgs Fastigheter    WIHL       SEK  SE0018012635     Real Estate      3510         NaN
251       Wärtsilä Oyj Abp   WRT1V       EUR  FI0009003727     Industrials      5020         NaN
252                YIT Oyj     YIT       EUR  FI0009800643     Industrials      5010         NaN
253         Zealand Pharma    ZEAL       DKK  DK0060257814     Health Care      2010         NaN
254 rows × 7 columns
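Since the original calls never returned, it may also be worth passing a timeout, because requests does not time out on its own unless one is set. Below is a minimal sketch of that idea, reusing the headers from the answer. The helper name `fetch_html` and the 10-second default are illustrative assumptions, not part of the original answer.

```python
import requests

# Browser-like headers, copied from the answer above.
HEADERS = {
    'accept-language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
}


def fetch_html(url, timeout=10.0):
    """Return the page body, or raise instead of waiting indefinitely.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    # With a timeout set, a silent server raises requests.exceptions.Timeout
    # rather than blocking the script forever.
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise on 4xx/5xx responses instead of handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` in place of `requests.get(url, headers=headers).text` should leave the rest of the answer unchanged. If the server still does not respond, the script fails with a clear exception after roughly the chosen timeout.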