Can't access table and table elements using bs4

So I am trying to scrape the following webpage: https://www.omscentral.com/

The main table there is my item of interest. I want to scrape the table and all of its content. When I inspect the page in the browser, the table is inside a <table> tag, so I figured it would be easy to access with the code below.

import requests
from bs4 import BeautifulSoup

url = 'https://www.omscentral.com/'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('table')

However, that code only returns the table header. I saw a similar question here, where the suggested fix was to switch parsers, but that did not work for me.

When I look at the soup object itself, it seems that the response from requests does not contain the table rows, only the header. I'm not sure what to do here; any advice would be much appreciated!

Answer:

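The table rows are not in the HTML that requests downloads. The content is stored as JSON in a <script> tag with the id __NEXT_DATA__ (the convention Next.js uses for page data) and is rendered into the table by JavaScript in the browser. That is also why switching parsers makes no difference: the rows were never in the markup. You have to extract the data from that script tag instead.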
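To see the idea without a network request, here is a minimal offline sketch. It uses the standard library's html.parser instead of BeautifulSoup so it has no dependencies. The HTML snippet, the course names and the URLs in it are made up; they only mimic the shape of such a page, where the <table> ships with just its header row and the row data sits in the JSON script block. NextDataParser is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the <table> only has a header row, and the row data
# lives in a JSON <script id="__NEXT_DATA__"> block (values are made up).
SAMPLE_HTML = """
<html><body>
<table><thead><tr><th>Name</th><th>Link</th></tr></thead><tbody></tbody></table>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"courses": [
  {"name": "Course A", "officialURL": "https://example.com/a"},
  {"name": "Course B"}
]}}}
</script>
</body></html>
"""


class NextDataParser(HTMLParser):
    """Counts <tr> tags and collects the text of <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self.row_count = 0
        self._in_next_data = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_count += 1
        elif tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._in_next_data = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_next_data = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so collect them all
        if self._in_next_data:
            self._chunks.append(data)

    def next_data(self):
        return json.loads("".join(self._chunks))


parser = NextDataParser()
parser.feed(SAMPLE_HTML)
parser.close()

courses = parser.next_data()["props"]["pageProps"]["courses"]
print(parser.row_count)  # only the header row exists in the markup
for item in courses:
    print(item["name"], item.get("officialURL"))
```

The row count of 1 corresponds to the "only the header" symptom from the question, while the JSON block still holds every course.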

data = json.loads(soup.select_one('#__NEXT_DATA__').text)['props']['pageProps']['courses']
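The key path ['props']['pageProps']['courses'] reflects how this page structured its JSON when this answer was written. If the site changes, the chained lookup fails with a bare KeyError. A small helper can report which key went missing. This is a sketch only: dig is a hypothetical name, not a library function, and the JSON string below is made up.

```python
import json


def dig(obj, *keys):
    """Follow a chain of dict keys; raise a KeyError naming the failing path."""
    path = []
    for key in keys:
        path.append(key)
        if not isinstance(obj, dict) or key not in obj:
            raise KeyError(f"missing key at {'/'.join(path)}")
        obj = obj[key]
    return obj


# Made-up JSON standing in for the text of the __NEXT_DATA__ script tag
raw = '{"props": {"pageProps": {"courses": [{"name": "Course A"}]}}}'
data = dig(json.loads(raw), "props", "pageProps", "courses")
print(data)
```

With the real page, you would pass json.loads(soup.select_one('#__NEXT_DATA__').text) as the first argument.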

To display the data as a pandas DataFrame, use:

import pandas as pd
pd.DataFrame(data)

Example

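import json

import requests
from bs4 import BeautifulSoup

# Some sites reject the default python-requests User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.omscentral.com/'

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# The rows are not in the <table> markup; they are embedded as JSON in this script tag
data = json.loads(soup.select_one('#__NEXT_DATA__').text)['props']['pageProps']['courses']

for item in data:
    print(item['name'], item.get('officialURL'))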

Output

Introduction to Information Security https://omscs.gatech.edu/cs-6035-introduction-to-information-security
Computing for Good https://omscs.gatech.edu/cs-6150-computing-good
Introduction to Operating Systems https://omscs.gatech.edu/cs-6200-introduction-operating-systems
Advanced Operating Systems https://omscs.gatech.edu/cs-6210-advanced-operating-systems
Secure Computer Systems https://omscs.gatech.edu/cs-6238-secure-computer-systems
Computer Networks https://omscs.gatech.edu/cs-6250-computer-networks
...