I am trying to scrape some data from FotMob (a football website), but when I fetch the page with requests and Beautiful Soup, the HTML contains a huge string of text that looks like JSON. An extract is shown below:
{"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},{"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}
The code I used to get this is shown here:
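import requests
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")
# The page data is embedded in the <script id="__NEXT_DATA__"> tag
for p in soup.find_all("script", attrs={"id": "__NEXT_DATA__"}):
    print(p.text)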
Specifically, I want to extract statValue, name and substatValue and put them into a pandas DataFrame. Does anyone know how to do this?
CodePudding user response:
Use json.loads to parse the data:
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")

# The <script id="__NEXT_DATA__"> tag holds the page data as one JSON document
data = json.loads(soup.find("script", attrs={"id": "__NEXT_DATA__"}).text)

# Walk down to the list of per-team stat records
d = data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]
df = pd.DataFrame(d)

# Expand the nested "nameAndSubstatValue" dicts into "name" and "substatValue" columns
df = pd.concat([df, df.pop("nameAndSubstatValue").apply(pd.Series)], axis=1)
print(df)
Prints:
       id  teamId  statValue  rank   type  statFormat  substatFormat                 name  substatValue
0    8462    8462        8.9     1  teams    fraction         number           Portsmouth            12
1    8451    8451        7.3     2  teams    fraction         number    Charlton Athletic             9
2    9792    9792        7.3     3  teams    fraction         number        Burton Albion             4
3    8671    8671        6.3     4  teams    fraction         number   Accrington Stanley             8
4    9833    9833        6.2     5  teams    fraction         number          Exeter City             9
5   10170   10170        6.1     6  teams    fraction         number         Derby County             3
6    8677    8677        5.9     7  teams    fraction         number  Peterborough United            12
7    8401    8401        5.8     8  teams    fraction         number      Plymouth Argyle             8
8    8559    8559        5.7     9  teams    fraction         number     Bolton Wanderers             5
9    8676    8676        5.3    10  teams    fraction         number    Wycombe Wanderers             8
10  10163   10163        5.3    11  teams    fraction         number  Sheffield Wednesday             7
11   8680    8680        5.3    12  teams    fraction         number      Cheltenham Town             3
12   9902    9902        5.2    13  teams    fraction         number         Ipswich Town            10
13   8283    8283        5.2    14  teams    fraction         number             Barnsley             5
14   8653    8653        5.0    15  teams    fraction         number        Oxford United             3
15   9799    9799        4.3    16  teams    fraction         number            Port Vale             5
16  45723   45723        4.3    17  teams    fraction         number       Fleetwood Town             4
17   9828    9828        4.0    18  teams    fraction         number  Forest Green Rovers             4
18   9896    9896        3.7    19  teams    fraction         number      Shrewsbury Town             2
19   9834    9834        3.5    20  teams    fraction         number     Cambridge United             5
20  10104   10104        3.2    21  teams    fraction         number       Bristol Rovers             7
21   8430    8430        2.9    22  teams    fraction         number         Lincoln City             4
22   8489    8489        2.6    23  teams    fraction         number            Morecambe             2
23   8645    8645        2.2    24  teams    fraction         number   Milton Keynes Dons             3
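
If only name, statValue and substatValue are needed, one possible variation is to flatten the records with pd.json_normalize and keep just those columns. This is a minimal sketch, not part of the original answer: the two records are copied from the extract in the question so it runs without a network request, and the names `records`, `flat` and `stats` are made up for illustration. In practice `records` would be the `statsData` list pulled out of the parsed JSON. Note that statValue arrives as a string ("5.2"), so the sketch converts it to a number.

```python
import pandas as pd

# Two records copied from the extract in the question. In practice this list
# would be the "statsData" value taken from the parsed __NEXT_DATA__ JSON.
records = [
    {"id": 9902, "teamId": 9902,
     "nameAndSubstatValue": {"name": "Ipswich Town", "substatValue": 10},
     "statValue": "5.2", "rank": 13, "type": "teams",
     "statFormat": "fraction", "substatFormat": "number"},
    {"id": 8283, "teamId": 8283,
     "nameAndSubstatValue": {"name": "Barnsley", "substatValue": 5},
     "statValue": "5.2", "rank": 14, "type": "teams",
     "statFormat": "fraction", "substatFormat": "number"},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. "nameAndSubstatValue.name".
flat = pd.json_normalize(records)

# Keep only the three requested fields and give them short names.
stats = flat[
    ["nameAndSubstatValue.name", "statValue", "nameAndSubstatValue.substatValue"]
].rename(columns={
    "nameAndSubstatValue.name": "name",
    "nameAndSubstatValue.substatValue": "substatValue",
})

# statValue is a string in the JSON, so convert it for numeric work.
stats = stats.assign(statValue=pd.to_numeric(stats["statValue"]))

print(stats)
```

The same three lines after `records` should work on the full `d` list from the answer above, assuming every record has the same shape as the two shown here.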