I'm trying to get the tables (and then the tr and td contents) with requests and BeautifulSoup from this link: https://www.basketball-reference.com/teams/PHI/2022/lineups/ , but I get no results.
I tried with:
import requests
from bs4 import BeautifulSoup
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
tables = soup.find_all('table')
However, the result of tables is [].
CodePudding user response:
It looks like the tables are wrapped in HTML comments, so BeautifulSoup sees them as comment text rather than as tags. Strip the comment markers from the response text before parsing it:
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
tables = soup.find_all('table')
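To see why the replacement works without requesting the page, here is a minimal offline sketch. The snippet in `sample_html` is made up for illustration; it only assumes that the real page wraps its tables in comments the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source: the table sits inside an HTML comment.
sample_html = """
<div id="all_lineups">
<!--
<table id="lineups">
  <tr><th>Lineup</th><th>MP</th></tr>
  <tr><td>A | B | C</td><td>100</td></tr>
</table>
-->
</div>
"""

# Parsed as-is, the table is only text inside a Comment node, so nothing is found.
tables_before = BeautifulSoup(sample_html, "html.parser").find_all("table")

# After stripping the comment markers, the same markup is parsed as real tags.
uncommented = sample_html.replace("<!--", "").replace("-->", "")
tables_after = BeautifulSoup(uncommented, "html.parser").find_all("table")

print(len(tables_before), len(tables_after))
```

Note that this approach removes every comment marker on the page, so any other commented-out markup becomes part of the parsed document as well.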
In addition, as @chitown88 also mentioned, you can use BeautifulSoup's Comment class to find all comments in the HTML. Be aware that you have to parse the comment strings with BeautifulSoup again:
soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)
Example
import requests
from bs4 import BeautifulSoup
from bs4 import Comment
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
soupTables = BeautifulSoup(''.join(soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)), 'html.parser')
soupTables.find_all('table')
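Since the question also asks for the tr and td contents, here is a rough sketch of how the rows could be pulled out once the commented tables are re-parsed. It runs on a made-up `sample_html` string rather than the live page, and the table layout in it is an assumption, not the site's actual markup.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page fragment: one plain comment and one comment holding a table.
sample_html = """
<div>
<!-- just a note, no table here -->
<!--
<table id="lineups">
  <tr><th>Lineup</th><th>MP</th></tr>
  <tr><td>A | B | C</td><td>100</td></tr>
  <tr><td>A | B | D</td><td>80</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Keep only the comments that contain table markup.
table_comments = soup.find_all(
    string=lambda text: isinstance(text, Comment) and "<table" in text
)

# Re-parse the comment text so the tables become real tags.
soup_tables = BeautifulSoup("".join(table_comments), "html.parser")

# Collect the text of every header and data cell, row by row.
rows = []
for table in soup_tables.find_all("table"):
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)

print(rows)
```

Unlike the replace approach, this leaves comments that do not contain a table untouched.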