Beautiful soup to extract key value pairs from data-op-info

Time:11-25

The code below runs without errors, but it is incomplete. From this point, I am trying to get all the fullgame values into a dataframe.

import json
from bs4 import BeautifulSoup
import urllib.request

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')

results = soup.find_all(class_ = "op-item op-spread op-opening")

for result in (results):
    print(json.loads(result['data-op-info']).items())

I added the print call at the end because I was trying to extract only the line value and inspect it.

Note that there is a similar question on this site, but its solution only works for a single div and fails when the variable holds multiple divs:
How to parse information between {} on web page using Beautifulsoup

CodePudding user response:

You were almost there. Use a list comprehension to capture the parsed results, then pass the list to json_normalize():

import json
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

source = urllib.request.urlopen('https://www.oddsshark.com/nfl/odds').read()
soup = BeautifulSoup(source, 'html.parser')

results = soup.find_all(class_="op-item op-spread op-opening")

# Parse the JSON stored in each div's data-op-info attribute
rlist = [json.loads(result['data-op-info']) for result in results]
# Build a dataframe with one column per key in the parsed dictionaries
pd.json_normalize(rlist)

   fullgame firsthalf secondhalf firstquarter secondquarter thirdquarter fourthquarter
0      -4.5      -2.5       -1.5         -0.5          -0.5         -0.5          -0.5
1       4.5       2.5        1.5          0.5           0.5          0.5           0.5
2         7         4        3.5            3             3          2.5             2
3        -7        -4       -3.5           -3            -3         -2.5            -2
4        -3        -3       -2.5         -0.5            -2         -0.5          -0.5
5         3         3        2.5          0.5             2          0.5           0.5
6         3       2.5        0.5          0.5           0.5          0.5           0.5
7        -3      -2.5       -0.5         -0.5          -0.5         -0.5          -0.5
8        -3      -0.5       -0.5         -0.5          -0.5         -0.5          -0.5
9         3       0.5        0.5          0.5           0.5          0.5           0.5
10       -3      -2.5         -1         -0.5            -1         -0.5          -0.5
11        3       2.5          1          0.5             1          0.5           0.5
12       -1       0.5       -0.5          0.5          -0.5         -0.5          -0.5
13        1      -0.5        0.5         -0.5           0.5          0.5           0.5
14      2.5       3.5          3          0.5           2.5          0.5             1
15     -2.5      -3.5         -3         -0.5          -2.5         -0.5            -1
16        4         3          2          0.5             1          0.5           0.5
17       -4        -3         -2         -0.5            -1         -0.5          -0.5
18     -2.5      -0.5       -0.5          0.5          -0.5         -0.5          -0.5
19      2.5       0.5        0.5         -0.5           0.5          0.5           0.5
20     -2.5      -1.5       -0.5         -0.5          -0.5         -0.5          -0.5
21      2.5       1.5        0.5          0.5           0.5          0.5           0.5
22      2.5       1.5        0.5          0.5           0.5          0.5           0.5
23     -2.5      -1.5       -0.5         -0.5          -0.5         -0.5          -0.5
24      1.5       1.5         Ev          0.5          -0.5         -0.5          -0.5
25     -1.5      -1.5         Ev         -0.5           0.5          0.5           0.5
26      5.5         3        2.5          0.5           0.5          0.5           0.5
27     -5.5        -3       -2.5         -0.5          -0.5         -0.5          -0.5
28     -3.5      -0.5         Ev         -0.5           0.5          0.5           0.5
29      3.5       0.5         Ev          0.5          -0.5         -0.5          -0.5
30       -5
31        5
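Note that the values in this output appear to be strings rather than numbers: some cells hold "Ev" and the last two rows have blank cells. If you need numeric columns, something like the following sketch might help. The sample rows are hypothetical stand-ins for the parsed dictionaries, and treating "Ev" as a line of 0 is an assumption that you should check against what the site means by it.

```python
import pandas as pd

# Hypothetical sample rows shaped like the parsed data-op-info dictionaries.
rlist = [
    {"fullgame": "-4.5", "secondhalf": "-1.5"},
    {"fullgame": "1.5", "secondhalf": "Ev"},
    {"fullgame": "-5", "secondhalf": ""},
]

df = pd.json_normalize(rlist)

# Assumption: "Ev" means an even line, treated here as 0.
# Empty strings and any other non-numeric text become NaN.
numeric = df.replace("Ev", "0").apply(pd.to_numeric, errors="coerce")
print(numeric)
```

With errors="coerce", cells that cannot be parsed become NaN instead of raising an error, so incomplete rows stay in the dataframe.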

Or, if you really just want one key from the dictionary:

rlist = [json.loads(result['data-op-info'])['fullgame'] for result in (results)]
pd.DataFrame({'fullgame': rlist})

   fullgame
0      -4.5
1       4.5
2         7
3        -7
4        -3
5         3
6         3
7        -3
8        -3
9         3
10       -3
11        3
12       -1
13        1
14      2.5
15     -2.5
16        4
17       -4
18     -2.5
19      2.5
20     -2.5
21      2.5
22      2.5
23     -2.5
24      1.5
25     -1.5
26      5.5
27     -5.5
28     -3.5
29      3.5
30       -5
31        5
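
If you want to try the approach without hitting the live site, here is a minimal offline sketch. The HTML string is made up to imitate the page structure (it is not copied from the site). The CSS selector is a swapped-in alternative to find_all(class_=...): it only matches divs that actually carry a data-op-info attribute, so a div without one does not raise a KeyError.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup imitating the page structure; not copied from the site.
html = """
<div class="op-item op-spread op-opening" data-op-info='{"fullgame": "-4.5", "firsthalf": "-2.5"}'></div>
<div class="op-item op-spread op-opening" data-op-info='{"fullgame": "4.5", "firsthalf": "2.5"}'></div>
<div class="op-item op-spread op-opening"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match divs that have all three classes and a data-op-info attribute.
divs = soup.select("div.op-item.op-spread.op-opening[data-op-info]")

# Parse each attribute value as JSON; every div becomes one row.
rows = [json.loads(div["data-op-info"]) for div in divs]
df = pd.DataFrame(rows)
print(df)
```

Passing a multi-class string to class_ matches the exact value of the class attribute, so it stops matching if the site reorders the classes. The selector form above matches the classes in any order.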