The code below runs without errors, but it is incomplete. At this point I am only trying to get all the fullgame values into a DataFrame.
import json
from bs4 import BeautifulSoup
import urllib.request
source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')
results = soup.find_all(class_ = "op-item op-spread op-opening")
for result in results:
    print(json.loads(result['data-op-info']).items())
I used print at the end because I was trying to extract only the line value and inspect it.
Note that there is a similar question on this site, but its solution only works for a single div. It fails when the variable holds multiple divs.
How to parse information between {} on web page using Beautifulsoup
CodePudding user response:
You were almost there. Use a list comprehension to capture the parsed results, then pass that list to pd.json_normalize():
import json
import urllib.request
import pandas as pd
from bs4 import BeautifulSoup
source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')
results = soup.find_all(class_="op-item op-spread op-opening")
# Parse the data-op-info JSON of every matching element into a dict
rlist = [json.loads(result['data-op-info']) for result in results]
# Expand the list of dicts into a DataFrame, one column per key
pd.json_normalize(rlist)
    fullgame  firsthalf  secondhalf  firstquarter  secondquarter  thirdquarter  fourthquarter
 0      -4.5       -2.5        -1.5          -0.5           -0.5          -0.5           -0.5
 1       4.5        2.5         1.5           0.5            0.5           0.5            0.5
 2         7          4         3.5             3              3           2.5              2
 3        -7         -4        -3.5            -3             -3          -2.5             -2
 4        -3         -3        -2.5          -0.5             -2          -0.5           -0.5
 5         3          3         2.5           0.5              2           0.5            0.5
 6         3        2.5         0.5           0.5            0.5           0.5            0.5
 7        -3       -2.5        -0.5          -0.5           -0.5          -0.5           -0.5
 8        -3       -0.5        -0.5          -0.5           -0.5          -0.5           -0.5
 9         3        0.5         0.5           0.5            0.5           0.5            0.5
10        -3       -2.5          -1          -0.5             -1          -0.5           -0.5
11         3        2.5           1           0.5              1           0.5            0.5
12        -1        0.5        -0.5           0.5           -0.5          -0.5           -0.5
13         1       -0.5         0.5          -0.5            0.5           0.5            0.5
14       2.5        3.5           3           0.5            2.5           0.5              1
15      -2.5       -3.5          -3          -0.5           -2.5          -0.5             -1
16         4          3           2           0.5              1           0.5            0.5
17        -4         -3          -2          -0.5             -1          -0.5           -0.5
18      -2.5       -0.5        -0.5           0.5           -0.5          -0.5           -0.5
19       2.5        0.5         0.5          -0.5            0.5           0.5            0.5
20      -2.5       -1.5        -0.5          -0.5           -0.5          -0.5           -0.5
21       2.5        1.5         0.5           0.5            0.5           0.5            0.5
22       2.5        1.5         0.5           0.5            0.5           0.5            0.5
23      -2.5       -1.5        -0.5          -0.5           -0.5          -0.5           -0.5
24       1.5        1.5          Ev           0.5           -0.5          -0.5           -0.5
25      -1.5       -1.5          Ev          -0.5            0.5           0.5            0.5
26       5.5          3         2.5           0.5            0.5           0.5            0.5
27      -5.5         -3        -2.5          -0.5           -0.5          -0.5           -0.5
28      -3.5       -0.5          Ev          -0.5            0.5           0.5            0.5
29       3.5        0.5          Ev           0.5           -0.5          -0.5           -0.5
30        -5
31         5
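To try the same pattern without hitting the live site, here is a minimal offline sketch. The HTML snippet is made up for illustration: it only assumes that each matching element carries a data-op-info attribute holding a JSON object with string values, which is what the output above suggests. The real page may differ or change over time.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup, shaped like the elements the selector is meant to match.
SAMPLE_HTML = """
<div class="op-item op-spread op-opening"
     data-op-info='{"fullgame": "-4.5", "firsthalf": "-2.5"}'></div>
<div class="op-item op-spread op-opening"
     data-op-info='{"fullgame": "4.5", "firsthalf": "2.5"}'></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string passed to class_ matches the exact class attribute value.
results = soup.find_all(class_="op-item op-spread op-opening")

# One dict per matching element, regardless of how many elements there are.
rlist = [json.loads(result["data-op-info"]) for result in results]

# One row per element, one column per JSON key.
df = pd.json_normalize(rlist)
print(df)
```

Because the list comprehension visits every element in results, the same code handles one div or many.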
Or, if you really just want one key from the dictionary:
rlist = [json.loads(result['data-op-info'])['fullgame'] for result in (results)]
pd.DataFrame({'fullgame': rlist})
fullgame
0 -4.5
1 4.5
2 7
3 -7
4 -3
5 3
6 3
7 -3
8 -3
9 3
10 -3
11 3
12 -1
13 1
14 2.5
15 -2.5
16 4
17 -4
18 -2.5
19 2.5
20 -2.5
21 2.5
22 2.5
23 -2.5
24 1.5
25 -1.5
26 5.5
27 -5.5
28 -3.5
29 3.5
30 -5
31 5
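If some elements might lack the fullgame key, or you want numbers rather than strings, something along these lines may help. This is only a sketch on made-up data: the rlist below is hypothetical, and treating non-numeric entries such as Ev as missing is an assumption that you may want to handle differently.

```python
import pandas as pd

# Hypothetical parsed payloads; the second one lacks 'fullgame' on purpose.
rlist = [
    {"fullgame": "-4.5"},
    {"firsthalf": "2.5"},
    {"fullgame": "Ev"},
    {"fullgame": "3"},
]

# dict.get returns None instead of raising KeyError when the key is absent.
fullgame = [d.get("fullgame") for d in rlist]
df = pd.DataFrame({"fullgame": fullgame})

# errors="coerce" turns anything that is not a number (None, "Ev") into NaN.
df["fullgame_num"] = pd.to_numeric(df["fullgame"], errors="coerce")
print(df)
```

The original string column is kept alongside the numeric one, so entries that were coerced to NaN can still be inspected.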