Webscrape a table with BeautifulSoup

Time:03-26

I'm trying to get the tables (and then the tr and td contents) with requests and BeautifulSoup from https://www.basketball-reference.com/teams/PHI/2022/lineups/ but I get no results.

I tried with:

import requests
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

tables = soup.find_all('table') 

However, tables comes back as an empty list ([]).

Answer:

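It looks like the tables are placed inside HTML comments, so the parser returns them as comment strings rather than table elements. You have to strip the comment markers from the response text before parsing: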

page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser') 

Example

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser') 

tables = soup.find_all('table') 
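
To get from the tables to the tr and td contents the question asks about, something like the following should work. It is only an offline sketch: SAMPLE_HTML, the table id and the helper name extract_rows are made up for illustration and are not the real page markup, so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source (not the real markup):
# the table is wrapped in an HTML comment, as on the page in question.
SAMPLE_HTML = """
<div id="all_lineups">
<!--
<table id="lineups_5-man">
  <tr><th>Lineup</th><th>MP</th></tr>
  <tr><td>A | B | C | D | E</td><td>100:00</td></tr>
  <tr><td>A | B | C | D | F</td><td>80:30</td></tr>
</table>
-->
</div>
"""


def extract_rows(html):
    """Strip comment markers, then return each table's rows as lists of cell text."""
    uncommented = html.replace("<!--", "").replace("-->", "")
    soup = BeautifulSoup(uncommented, "html.parser")
    result = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # Collect header and data cells in document order.
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            rows.append(cells)
        result.append(rows)
    return result


# Without stripping, the commented-out table is not found at all.
tables_before = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("table")
rows_after = extract_rows(SAMPLE_HTML)
```

With the real page you would pass page.text to the helper instead of SAMPLE_HTML. Note that the replace approach uncomments everything, including markup the site may have commented out on purpose.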

In addition, as @chitown88 also mentioned, you can use BeautifulSoup's Comment class to find all comments in the HTML that contain a table. Be aware that the matches are plain strings, so you have to parse them with BeautifulSoup again:

soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)

Example

import requests
from bs4 import BeautifulSoup
from bs4 import Comment

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

soupTables = BeautifulSoup(''.join(soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)), 'html.parser')
soupTables.find_all('table')
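
A possible variation on the Comment approach is to parse each matching comment on its own instead of joining them first, which leaves comments without tables untouched. This is a sketch under assumptions: SAMPLE_HTML, the ids first and second, and the helper name tables_from_comments are invented for illustration.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one ordinary comment and two comments that hide tables.
SAMPLE_HTML = """
<p>Visible text</p>
<!-- just a note, no table here -->
<div><!--
<table id="first"><tr><td>1</td><td>2</td></tr></table>
--></div>
<div><!--
<table id="second"><tr><td>3</td></tr></table>
--></div>
"""


def tables_from_comments(html):
    """Return the table tags found inside HTML comments, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    comments = soup.find_all(
        string=lambda text: isinstance(text, Comment) and "<table" in text
    )
    found = []
    for comment in comments:
        # A Comment is just a string, so it has to be parsed again.
        inner = BeautifulSoup(comment, "html.parser")
        found.extend(inner.find_all("table"))
    return found


tables = tables_from_comments(SAMPLE_HTML)
table_ids = [t.get("id") for t in tables]
first_cells = [td.get_text(strip=True) for td in tables[0].find_all("td")]
```

For the real page, pass page.text to the helper in place of SAMPLE_HTML.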