Webscrape a table with BeautifulSoup

I'm trying to get the tables (and then the tr and td contents) with requests and BeautifulSoup from this link: https://www.basketball-reference.com/teams/PHI/2022/lineups/ , but I get no results.

I tried:

import requests
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

tables = soup.find_all('table')

However, tables is an empty list ([]).

Answer:

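The tables are wrapped in HTML comments (<!-- ... -->) in the page source, so the parser treats them as comment text rather than as <table> elements. Remove the comment markers from the response text before parsing: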

page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
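To see why this works without hitting the network, here is a minimal sketch on a small hypothetical HTML string (not the real page) in which the table sits inside a comment, mirroring what the answer describes:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the table is wrapped in an HTML comment.
html = """
<div>
<!--
<table id="sample"><tr><td>A</td><td>B</td></tr></table>
-->
</div>
"""

# Parsed as-is, the table is only comment text, so no <table> tag exists.
before = BeautifulSoup(html, "html.parser").find_all("table")

# Removing the comment markers turns the commented markup into real tags.
uncommented = html.replace("<!--", "").replace("-->", "")
after = BeautifulSoup(uncommented, "html.parser").find_all("table")

print(len(before), len(after))  # prints: 0 1
```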

Example

import requests
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
# Strip the comment markers so the commented-out tables become real markup
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')

tables = soup.find_all('table')
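The question also asks for the tr and td contents. A minimal sketch, assuming simple tables (no rowspan/colspan handling); the helper name table_rows is hypothetical, and the sample table is made-up markup standing in for one element of tables:

```python
from bs4 import BeautifulSoup


def table_rows(table):
    """Return each <tr> of a bs4 table as a list of cell strings.

    Hypothetical helper: collects <th> and <td> cells in document order
    and ignores rowspan/colspan.
    """
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows without any cells
            rows.append(cells)
    return rows


# Hypothetical sample table standing in for one element of `tables`.
sample = BeautifulSoup(
    "<table><tr><th>Lineup</th><th>MP</th></tr>"
    "<tr><td>Lineup 1</td><td>96.0</td></tr></table>",
    "html.parser",
).table

print(table_rows(sample))
# prints: [['Lineup', 'MP'], ['Lineup 1', '96.0']]
```

With the real page you would call table_rows(t) for each t in tables.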

Alternatively, as @chitown88 also mentioned, you can use BeautifulSoup's Comment class to find all comments in the HTML. Be aware that the matched comments are plain strings, so you have to parse them with BeautifulSoup again:

soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)
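A small self-contained sketch of that filter on hypothetical markup (not the real page) shows that it keeps only the comments containing a table and that the matches must be parsed again to get real tags:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one comment hides a table, another is an ordinary note.
html = """
<!-- just a note -->
<div><!-- <table><tr><td>1</td></tr></table> --></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only Comment nodes whose text contains a table start tag.
table_comments = soup.find_all(
    string=lambda text: isinstance(text, Comment) and "<table" in text
)
print(len(table_comments))  # prints: 1

# The matches are strings, so parse them again to get real tags.
inner = BeautifulSoup("".join(table_comments), "html.parser")
print(len(inner.find_all("table")))  # prints: 1
```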

Example

import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# Collect only the comments that contain a table, then parse them again
tableComments = soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)
soupTables = BeautifulSoup(''.join(tableComments), 'html.parser')
tables = soupTables.find_all('table')
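Unlike the replace() approach, the Comment approach leaves unrelated comments untouched, but it only returns tables that were hidden. If a page mixes visible tables with commented-out ones, a hedged sketch of a helper that collects both follows; find_all_tables is a hypothetical name, and it assumes each hidden table lives entirely inside a single comment:

```python
from bs4 import BeautifulSoup, Comment


def find_all_tables(html):
    """Return <table> tags from the live DOM plus those hidden in comments.

    Hypothetical helper; assumes each commented-out table is contained
    within a single HTML comment.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = list(soup.find_all("table"))  # tables that are already real tags
    for comment in soup.find_all(string=lambda t: isinstance(t, Comment)):
        if "<table" in comment:
            # Each hidden table is parsed into its own small soup.
            hidden = BeautifulSoup(str(comment), "html.parser")
            tables.extend(hidden.find_all("table"))
    return tables


# Hypothetical sample: one visible table and one hidden in a comment.
html = '<table id="a"></table><!-- <table id="b"></table> -->'
print([t["id"] for t in find_all_tables(html)])  # prints: ['a', 'b']
```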