Python scrape nasdaq omx nordic

I am trying to get the table from http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap.

However, when I run either of the following code snippets, the code never finishes.

# Using requests
import requests
url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'
requests.get(url)


# Using pandas
import pandas as pd
url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'
pd.read_html(url)

Replacing the URL with another one works just fine, e.g. https://en.wikipedia.org/wiki/List_of_S&P_500_companies takes around a second.

CodePudding user response:

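The site does not seem to respond to requests that lack browser-like headers, which would explain why the plain calls never finish. Sending such headers is one way to get that table: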

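import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup as bs

# Browser-like headers, sent instead of the default python-requests User-Agent
headers = {
    'accept-language': 'en-US,en;q=0.9',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}

url = 'http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap'

# A timeout makes a silent server raise an exception instead of hanging
r = requests.get(url, headers=headers, timeout=10)
# Select the table by its id so pandas only parses that fragment
table = bs(r.text, 'html.parser').select_one('table#listedCompanies')
# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(table)))[0]
print(df)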

Result in terminal:

     Name                   Symbol  Currency  ISIN          Sector          ICB Code  Fact Sheet
0    TRATON                 8TRA    SEK       DE000TRAT0N7  Industrials     5020      NaN
1    AAK                    AAK     SEK       SE0011337708  Consumer Goods  4510      NaN
2    ABB Ltd                ABB     SEK       CH0012221716  Industrials     5020      NaN
3    Addtech B              ADDT B  SEK       SE0014781795  Industrials     5020      NaN
4    AFRY                   AFRY    SEK       SE0005999836  Industrials     5010      NaN
...  ...                    ...     ...       ...           ...             ...       ...
249  Wallenstam B           WALL B  SEK       SE0017780133  Real Estate     3510      NaN
250  Wihlborgs Fastigheter  WIHL    SEK       SE0018012635  Real Estate     3510      NaN
251  Wärtsilä Oyj Abp       WRT1V   EUR       FI0009003727  Industrials     5020      NaN
252  YIT Oyj                YIT     EUR       FI0009800643  Industrials     5010      NaN
253  Zealand Pharma         ZEAL    DKK       DK0060257814  Health Care     2010      NaN

254 rows × 7 columns
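
Part of why the question's code "never finishes" rather than failing is that requests.get is called there without a timeout, and requests does not apply one by default. As a dependency-free illustration, the sketch below swaps in the standard library's urllib for requests (this is not the approach used in the answer) and sends the same headers with an explicit timeout. The names build_request and fetch_html and the 10-second value are assumptions made for this example, and whether the site answers this client has not been verified here.

```python
from urllib.request import Request, urlopen

URL = "http://www.nasdaqomxnordic.com/shares/listed-companies/nordic-large-cap"

# Same header values as in the answer above.
HEADERS = {
    "Accept-Language": "en-US,en;q=0.9",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
    ),
}


def build_request(url: str = URL) -> Request:
    """Attach the browser-like headers to a GET request (hypothetical helper)."""
    return Request(url, headers=HEADERS)


def fetch_html(url: str = URL, timeout: float = 10.0) -> str:
    """Download the page as text.

    If the server stays silent for `timeout` seconds, an OSError subclass
    (a timeout or URLError) is raised instead of the call blocking forever.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (performs a real network call, so it is left commented out):
# try:
#     html = fetch_html()
# except OSError as exc:
#     print(f"Request failed or timed out: {exc}")
```

The returned string can then go through the same parsing step as in the answer: select table#listedCompanies and pass it to pd.read_html.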