Need help scraping data from the PrizePicks API

Time:10-23

I am trying to scrape data from the PrizePicks API endpoint (https://api.prizepicks.com/projections), but the request comes back with a 403 (Forbidden) status. Is there a way around it?

Here is my code:

```python
import pandas as pd
import requests

# Query filters the request is meant to apply
params = (
    ('league_id', '7'),
    ('per_page', '250'),
    ('projection_type_id', '1'),
    ('single_stat', 'true'),
)

session = requests.Session()
response = session.get('https://api.prizepicks.com/projections', data=params)
print(response.status_code)  # prints 403 (the error described above)

# pd.json_normalize replaces the removed pandas.io.json.json_normalize import
#df1 = pd.json_normalize(response.json()['included'])
#df1 = df1[df1['type'] == 'new_player']

#df2 = pd.json_normalize(response.json()['data'])

#df = pd.DataFrame(zip(df1['attributes.name'], df2['attributes.line_score']), columns=['name', 'points'])
```
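A side note on the `session.get(...)` call above: with `requests`, `data=` sends the pairs as a form-encoded request body, while `params=` appends them to the URL's query string. The thread never confirms whether this endpoint honours `league_id` and the other filters; if it does, they would need to go in `params=`. This minimal sketch builds both variants with `requests.Request(...).prepare()` without sending anything, so the difference can be inspected offline.

```python
import requests

URL = "https://api.prizepicks.com/projections"

# Same filter tuples as in the question
params = (
    ("league_id", "7"),
    ("per_page", "250"),
    ("projection_type_id", "1"),
    ("single_stat", "true"),
)

# Build (but do not send) both variants so they can be compared offline
as_body = requests.Request("GET", URL, data=params).prepare()
as_query = requests.Request("GET", URL, params=params).prepare()

print(as_body.url)    # bare URL: the filters travel in the request body
print(as_body.body)   # form-encoded filter string
print(as_query.url)   # filters appended after "?" in the URL
print(as_query.body)  # no body at all
```

Both answers' printed results are consistent with the filters being ignored (the second answer's names span several sports rather than one league), but that is an inference from the printed rows, not something the thread states.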

Answer:

One way to access that data is to send a browser-like User-Agent header with the request:

```python
import requests
import pandas as pd

# Show every column and the full cell contents when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Browser-like User-Agent so the API does not answer with 403
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

url = 'https://api.prizepicks.com/projections'

r = requests.get(url, headers=headers)
# Flatten each projection's nested attributes/relationships into columns
df = pd.json_normalize(r.json()['data'])
print(df)
```

Result printed in terminal:

```
    type    id  attributes.board_time   attributes.custom_image attributes.description  attributes.end_time attributes.flash_sale_line_score    attributes.is_promo attributes.line_score   attributes.projection_type  attributes.rank attributes.refundable   attributes.start_time   attributes.stat_type    attributes.status   attributes.tv_channel   attributes.updated_at   relationships.duration.data relationships.league.data.type  relationships.league.data.id    relationships.new_player.data.type  relationships.new_player.data.id    relationships.projection_type.data.type relationships.projection_type.data.id   relationships.stat_type.data.type   relationships.stat_type.data.id relationships.duration.data.type    relationships.duration.data.id
0   projection  812524  2022-10-21T00:00:00-04:00   None    RD 4 - PAR 71   None    None    False   4.0 Single Stat 1   True    2022-10-23T05:50:00-04:00   Birdies Or Better   pre_game    None    2022-10-22T20:04:02-04:00   NaN league  131 new_player  87385   projection_type 2   stat_type   32  NaN NaN
1   projection  812317  2022-10-22T22:11:00-04:00   None    LAL None    None    False   6.0 Single Stat 1   True    2022-10-23T15:40:00-04:00   Assists pre_game    None    2022-10-22T22:19:25-04:00   NaN league  7   new_player  1738    projection_type 2   stat_type   20  NaN NaN
2   projection  812975  2020-04-23T12:30:00-04:00   None    NRG (Maps 1-4)  None    None    False   2.0 Single Stat 1   True    2022-10-23T13:00:00-04:00   Goals   pre_game    https://www.twitch.tv/rocketleague  2022-10-23T00:37:14-04:00   NaN league  161 new_player  37461   projection_type 2   stat_type   29  NaN NaN
3   projection  802798  2021-09-01T10:00:00-04:00   None    United States GP Full   None    None    False   2.7 Single Stat 1   True    2022-10-23T15:00:00-04:00   1st Pit Stop Time (sec) pre_game    None    2022-10-23T01:14:10-04:00   NaN league  125 new_player  16369   projection_type 2   stat_type   188 duration    11
4   projection  812467  2022-10-21T00:00:00-04:00   None    RD 4 - 14 Fairways  None    None    False   11.5    Single Stat 1   True    2022-10-23T11:23:00-04:00   Fairways Hit    pre_game    None    2022-10-22T20:22:05-04:00   NaN league  1   new_player  10919   projection_type 2   stat_type   96  NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1534    projection  813017  2021-03-05T19:00:00-05:00   None    Napoli  None    None    False   3.0 Single Stat 1163    True    2022-10-23T14:45:00-04:00   Shots   pre_game    None    2022-10-23T01:20:53-04:00   NaN league  82  new_player  47513   projection_type 2   stat_type   50  NaN NaN
1535    projection  813018  2021-03-05T19:00:00-05:00   None    Roma    None    None    False   2.0 Single Stat 1164    True    2022-10-23T14:45:00-04:00   Shots On Target pre_game    None    2022-10-23T01:20:53-04:00   NaN league  82  new_player  47910   projection_type 2   stat_type   104 NaN NaN
1536    projection  813019  2021-03-05T19:00:00-05:00   None    Napoli  None    None    False   2.0 Single Stat 1165    True    2022-10-23T14:45:00-04:00   Shots   pre_game    None    2022-10-23T01:20:53-04:00   NaN league  82  new_player  47512   projection_type 2   stat_type   50  NaN NaN
1537    projection  812997  2021-03-05T19:00:00-05:00   None    Atalanta    None    None    False   1.5 Single Stat 2013    True    2022-10-23T12:00:00-04:00   Shots   pre_game    None    2022-10-23T00:45:53-04:00   NaN league  82  new_player  47710   projection_type 2   stat_type   50  NaN NaN
1538    projection  812998  2021-03-05T19:00:00-05:00   None    Lazio   None    None    False   1.5 Single Stat 2013    True    2022-10-23T12:00:00-04:00   Shots   pre_game    None    2022-10-23T00:45:53-04:00   NaN league  82  new_player  60433   projection_type 2   stat_type   50  NaN NaN
1539 rows × 28 columns
```

Answer:

Your code is essentially fine; the 403 comes from a security check on the site's side, which appears to inspect the User-Agent of incoming requests and reject ones that don't look like a browser. Adding a User-Agent header that mimics a browser is enough to get past it. After that you can uncomment the rest of your code and it runs.
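To see what the header changes, it helps to know what `requests` sends when you don't set one: `requests.utils.default_user_agent()` returns the library's default string, of the form `python-requests/<version>`, which a server can easily recognise. That this particular API blocks on that string is the answer's claim, not something verified here. This sketch prepares the same request with and without the browser User-Agent used in the code below, without sending either.

```python
import requests

URL = "https://api.prizepicks.com/projections"

# Same browser-like User-Agent string as in the answer's code
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
)

session = requests.Session()

# What the question's code sends: the Session's default headers only
default_req = session.prepare_request(requests.Request("GET", URL))

# What the fixed code sends: the per-request header overrides the default
browser_req = session.prepare_request(
    requests.Request("GET", URL, headers={"User-Agent": BROWSER_UA})
)

print(default_req.headers["User-Agent"])  # python-requests/<installed version>
print(browser_req.headers["User-Agent"])  # the browser string above
```

Setting the header per request (as here) or once via `session.headers.update(...)` both work; the session route avoids repeating it on every call.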

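```python
import pandas as pd
import requests

params = (
    ('league_id', '7'),
    ('per_page', '250'),
    ('projection_type_id', '1'),
    ('single_stat', 'true'),
)

# Browser-like User-Agent; without it the API answers 403
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

session = requests.Session()
# Note: data= sends params as a request body; params= would put them in the query string
response = session.get('https://api.prizepicks.com/projections', data=params, headers=headers)
print(response.status_code)
payload = response.json()  # parse the JSON once and reuse it

# Players are listed in the "included" section of the payload
df1 = pd.json_normalize(payload['included'])
df1 = df1[df1['type'] == 'new_player']

# Projections (line scores) are listed in the "data" section
df2 = pd.json_normalize(payload['data'])

# Caution: zip() pairs rows by position and stops at the shorter frame;
# it does not use the player IDs to match projections to players
df = pd.DataFrame(zip(df1['attributes.name'], df2['attributes.line_score']), columns=['name', 'points'])
print(df)
```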

Output:

```
                      name  points
0               Jared Goff     0.0
1    Pierre-Emile Højbjerg    13.5
2           Mecole Hardman    11.5
3            Merih Demiral    12.5
4             Ashley Young     2.7
..                     ...     ...
682             Nick Chubb     6.5
683             Derek Carr     0.5
684         Darnell Mooney     1.5
685          Daniel Suarez     2.0
686     Alexander Schwolow     2.5

[687 rows x 2 columns]
```
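The `zip()` step pairs the i-th player with the i-th projection purely by position, so a name can end up next to another player's line. In the payload this thread works with, each projection carries `relationships.new_player.data.id` (a column visible in the first answer's output) and each `included` entry has its own `id`, so merging on those IDs keeps each pair together. This is a minimal sketch on a small hand-made payload: the field names mirror the outputs above, the values are invented, and the helper `projections_with_names` is a hypothetical name. It casts IDs to `str` on both sides as a precaution, since their exact JSON type is an assumption.

```python
import pandas as pd


def projections_with_names(payload: dict) -> pd.DataFrame:
    """Hypothetical helper: pair each projection in payload['data'] with its
    player in payload['included'] via relationships.new_player.data.id."""
    projections = pd.json_normalize(payload["data"])
    players = pd.json_normalize(payload["included"])
    players = players[players["type"] == "new_player"]

    # Cast IDs to str on both sides in case one is numeric and the other text
    left = pd.DataFrame({
        "player_id": projections["relationships.new_player.data.id"].astype(str),
        "stat": projections["attributes.stat_type"],
        "points": projections["attributes.line_score"],
    })
    right = pd.DataFrame({
        "player_id": players["id"].astype(str),
        "name": players["attributes.name"],
    })

    # Inner join on the ID: each projection gets its own player's name
    merged = left.merge(right, on="player_id", how="inner")
    return merged[["name", "stat", "points"]]


# Hypothetical miniature payload; values are made up, field names follow the thread
sample = {
    "data": [
        {"type": "projection", "id": "1",
         "attributes": {"line_score": 6.0, "stat_type": "Assists"},
         "relationships": {"new_player": {"data": {"type": "new_player", "id": "20"}}}},
        {"type": "projection", "id": "2",
         "attributes": {"line_score": 2.0, "stat_type": "Goals"},
         "relationships": {"new_player": {"data": {"type": "new_player", "id": "10"}}}},
    ],
    "included": [
        {"type": "league", "id": "7", "attributes": {"name": "League X"}},
        {"type": "new_player", "id": "10", "attributes": {"name": "Player A"}},
        {"type": "new_player", "id": "20", "attributes": {"name": "Player B"}},
    ],
}

result = projections_with_names(sample)
print(result)
```

On this sample the merge pairs Player B with 6.0 and Player A with 2.0, whereas the `zip()` approach would pair Player A with 6.0 simply because both come first in their lists.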
