I'm trying to scrape data from the Premier League website: https://www.premierleague.com/clubs/4/club/stats?se=15
My problem is that when I request the page above, I get the data from this page instead: https://www.premierleague.com/clubs/4/club/stats
On the website, both the data and the URL change after filtering to a different season, but the data I get back from my request does not change.
My code:
from bs4 import BeautifulSoup
import requests
import numpy as np
ChelseaReq = requests.get("https://www.premierleague.com/clubs/4/club/stats?se=15")
ChelseaData = ChelseaReq.text
soup = BeautifulSoup(ChelseaData, "html.parser")
dataSet = np.array([])
dataSet1 = np.array([])
chelsea_db = {}
for stattext in soup.find_all("div", class_="normalStat"):
    chelsea_stat_numbers = stattext.span.text.split()[-1]
    chelsea_stat_numbers = chelsea_stat_numbers.replace(',', '')
    chelsea_stat_numbers = chelsea_stat_numbers.replace('%', '')
    dataSet = np.append(dataSet, float(chelsea_stat_numbers))
    chelsea_stat_attributes = ','.join(stattext.span.text.split()[0:-1])
    chelsea_stat_attributes = chelsea_stat_attributes.replace(',', ' ')
    dataSet1 = np.append(dataSet1, chelsea_stat_attributes)
for A, B in zip(dataSet1, dataSet):
    chelsea_db[A] = B
chelsea_db
This prints the all-time totals instead of the filtered data. How would I change it to return the filtered data instead?
e.g.:
current output =
'Goals': 1936.0,
'Goals per match': 1.71,
'Shots': 9954.0, ... etc
(after using the website's filter button to select a single season)
my goal =
'Goals': 36,
'Goals per match': 1.71,
'Shots': 160, ... etc
CodePudding user response:
You don't get the filtered data because it is loaded by JavaScript through an XHR request, so it is not in the HTML that requests.get returns. But you can send this request directly and get all the needed data in JSON format, so you don't even need BeautifulSoup. Here is a code sample:
import requests
import json
headers = {
    'origin': 'https://www.premierleague.com',  # you get 403 Forbidden without this header
}
params = {
    "comps": 1,
    "compSeasons": 15  # number of season
}
chelsea_season_data = requests.get("https://footballapi.pulselive.com/football/stats/team/4",
                                   params=params, headers=headers)
data = json.loads(chelsea_season_data.text)
for stat in data['stats']:
    if stat['name'] == 'wins':
        print(f"Wins: {stat['value']}")
    elif stat['name'] == 'losses':
        print(f"Losses: {stat['value']}")
    elif stat['name'] == 'goals':
        print(f"Goals: {stat['value']}")
    elif stat['name'] == 'goals_conceded':
        print(f"Goals conceded: {stat['value']}")
    elif stat['name'] == 'clean_sheet':
        print(f"Clean sheets: {stat['value']}")
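If you would rather have everything in one dictionary, like the chelsea_db in the question, something along these lines might work. This is only a sketch: stats_to_dict and sample_payload are names made up for illustration, the sample values are placeholders rather than real API output, and it assumes the response keeps the {"stats": [{"name": ..., "value": ...}]} shape used in the loop above.

```python
def stats_to_dict(payload):
    """Map each stat name to its value from a decoded JSON response.

    Assumes the shape used in the answer above:
    {"stats": [{"name": ..., "value": ...}, ...]}.
    A payload without a "stats" key yields an empty dict.
    """
    return {stat["name"]: stat["value"] for stat in payload.get("stats", [])}


# Placeholder payload for illustration only (not real API output).
sample_payload = {
    "stats": [
        {"name": "goals", "value": 36.0},
        {"name": "wins", "value": 10.0},
    ]
}

season_stats = stats_to_dict(sample_payload)
print(season_stats)
```

With a real response you would pass the decoded JSON instead, e.g. stats_to_dict(chelsea_season_data.json()), and then look up season_stats['goals'] and so on.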