I am trying to scrape data from the API at https://api.prizepicks.com/projections, but the request returns a 403 error. Is there a way around it?
Here is my code:
'''
import pandas as pd
import requests
from pandas.io.json import json_normalize
params = (
('league_id', '7'),
('per_page', '250'),
('projection_type_id', '1'),
('single_stat', 'true'),
)
session = requests.Session()
response = session.get('https://api.prizepicks.com/projections', data=params)
print(response.status_code)
#df1 = json_normalize(response.json()['included'])
#df1 = df1[df1['type'] == 'new_player']
#df2 = json_normalize(response.json()['data'])
#df = pd.DataFrame(zip(df1['attributes.name'], df2['attributes.line_score']), columns=['name', 'points'])
'''
CodePudding user response:
This is one way to access that data:
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
url = 'https://api.prizepicks.com/projections'
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data'])
print(df)
Result printed in terminal:
type id attributes.board_time attributes.custom_image attributes.description attributes.end_time attributes.flash_sale_line_score attributes.is_promo attributes.line_score attributes.projection_type attributes.rank attributes.refundable attributes.start_time attributes.stat_type attributes.status attributes.tv_channel attributes.updated_at relationships.duration.data relationships.league.data.type relationships.league.data.id relationships.new_player.data.type relationships.new_player.data.id relationships.projection_type.data.type relationships.projection_type.data.id relationships.stat_type.data.type relationships.stat_type.data.id relationships.duration.data.type relationships.duration.data.id
0 projection 812524 2022-10-21T00:00:00-04:00 None RD 4 - PAR 71 None None False 4.0 Single Stat 1 True 2022-10-23T05:50:00-04:00 Birdies Or Better pre_game None 2022-10-22T20:04:02-04:00 NaN league 131 new_player 87385 projection_type 2 stat_type 32 NaN NaN
1 projection 812317 2022-10-22T22:11:00-04:00 None LAL None None False 6.0 Single Stat 1 True 2022-10-23T15:40:00-04:00 Assists pre_game None 2022-10-22T22:19:25-04:00 NaN league 7 new_player 1738 projection_type 2 stat_type 20 NaN NaN
2 projection 812975 2020-04-23T12:30:00-04:00 None NRG (Maps 1-4) None None False 2.0 Single Stat 1 True 2022-10-23T13:00:00-04:00 Goals pre_game https://www.twitch.tv/rocketleague 2022-10-23T00:37:14-04:00 NaN league 161 new_player 37461 projection_type 2 stat_type 29 NaN NaN
3 projection 802798 2021-09-01T10:00:00-04:00 None United States GP Full None None False 2.7 Single Stat 1 True 2022-10-23T15:00:00-04:00 1st Pit Stop Time (sec) pre_game None 2022-10-23T01:14:10-04:00 NaN league 125 new_player 16369 projection_type 2 stat_type 188 duration 11
4 projection 812467 2022-10-21T00:00:00-04:00 None RD 4 - 14 Fairways None None False 11.5 Single Stat 1 True 2022-10-23T11:23:00-04:00 Fairways Hit pre_game None 2022-10-22T20:22:05-04:00 NaN league 1 new_player 10919 projection_type 2 stat_type 96 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1534 projection 813017 2021-03-05T19:00:00-05:00 None Napoli None None False 3.0 Single Stat 1163 True 2022-10-23T14:45:00-04:00 Shots pre_game None 2022-10-23T01:20:53-04:00 NaN league 82 new_player 47513 projection_type 2 stat_type 50 NaN NaN
1535 projection 813018 2021-03-05T19:00:00-05:00 None Roma None None False 2.0 Single Stat 1164 True 2022-10-23T14:45:00-04:00 Shots On Target pre_game None 2022-10-23T01:20:53-04:00 NaN league 82 new_player 47910 projection_type 2 stat_type 104 NaN NaN
1536 projection 813019 2021-03-05T19:00:00-05:00 None Napoli None None False 2.0 Single Stat 1165 True 2022-10-23T14:45:00-04:00 Shots pre_game None 2022-10-23T01:20:53-04:00 NaN league 82 new_player 47512 projection_type 2 stat_type 50 NaN NaN
1537 projection 812997 2021-03-05T19:00:00-05:00 None Atalanta None None False 1.5 Single Stat 2013 True 2022-10-23T12:00:00-04:00 Shots pre_game None 2022-10-23T00:45:53-04:00 NaN league 82 new_player 47710 projection_type 2 stat_type 50 NaN NaN
1538 projection 812998 2021-03-05T19:00:00-05:00 None Lazio None None False 1.5 Single Stat 2013 True 2022-10-23T12:00:00-04:00 Shots pre_game None 2022-10-23T00:45:53-04:00 NaN league 82 new_player 60433 projection_type 2 stat_type 50 NaN NaN
1539 rows × 28 columns
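The frame above identifies players only by `relationships.new_player.data.id`. Judging by the commented-out lines in the question, the names sit in the response's `included` list as records of type `new_player`. A minimal sketch of joining the two on that id, rather than pairing rows by position, might look like the following. The sample payload and the helper name `projections_with_names` are made up for illustration and only mimic the shape of the real response.

```python
import pandas as pd

# Hypothetical payload: it only mimics the shape of the real response.
SAMPLE_PAYLOAD = {
    "data": [
        {
            "type": "projection",
            "id": "1",
            "attributes": {"stat_type": "Assists", "line_score": 6.0},
            "relationships": {"new_player": {"data": {"type": "new_player", "id": "20"}}},
        },
        {
            "type": "projection",
            "id": "2",
            "attributes": {"stat_type": "Points", "line_score": 24.5},
            "relationships": {"new_player": {"data": {"type": "new_player", "id": "10"}}},
        },
        {
            "type": "projection",
            "id": "3",
            "attributes": {"stat_type": "Rebounds", "line_score": 4.5},
            "relationships": {"new_player": {"data": {"type": "new_player", "id": "20"}}},
        },
    ],
    "included": [
        {"type": "new_player", "id": "10", "attributes": {"name": "Player A"}},
        {"type": "league", "id": "7", "attributes": {"name": "NBA"}},
        {"type": "new_player", "id": "20", "attributes": {"name": "Player B"}},
    ],
}


def projections_with_names(payload):
    """Attach a player name to every projection by matching on the player id."""
    projections = pd.json_normalize(payload["data"])
    included = pd.json_normalize(payload["included"])

    # Keep only player records and give the key columns unambiguous names.
    players = included.loc[included["type"] == "new_player", ["id", "attributes.name"]]
    players = players.rename(columns={"id": "player_id", "attributes.name": "name"})

    # A left merge keeps every projection, in its original order.
    merged = projections.merge(
        players,
        left_on="relationships.new_player.data.id",
        right_on="player_id",
        how="left",
    )
    result = merged[["name", "attributes.stat_type", "attributes.line_score"]]
    return result.rename(
        columns={"attributes.stat_type": "stat_type", "attributes.line_score": "line_score"}
    )


df = projections_with_names(SAMPLE_PAYLOAD)
print(df)
```

With a live response, the equivalent call would be `projections_with_names(r.json())`. Merging on the id matters because one player can have several projections and `included` is not guaranteed to be in the same order as `data`.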
CodePudding user response:
As far as the 403 is concerned, your code is fine the way it is. The problem is that the site checks the User-Agent header of incoming requests, and requests identifies itself as `python-requests/<version>` by default. All you need to do is add a User-Agent header that mimics a browser. Then you can uncomment the rest of your code and it will run.
import pandas as pd
import requests
params = (
    ('league_id', '7'),
    ('per_page', '250'),
    ('projection_type_id', '1'),
    ('single_stat', 'true'),
)
# Browser-like User-Agent; the question's request without it returned 403
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}
session = requests.Session()
# Note: data= puts these filters in the request body; params= would send them as a query string
response = session.get('https://api.prizepicks.com/projections', data=params, headers=headers)
print(response.status_code)
payload = response.json()  # parse the body once
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize
df1 = pd.json_normalize(payload['included'])
df1 = df1[df1['type'] == 'new_player']
df2 = pd.json_normalize(payload['data'])
# zip pairs names and line scores by position and stops at the shorter column
df = pd.DataFrame(zip(df1['attributes.name'], df2['attributes.line_score']), columns=['name', 'points'])
print(df)
Output:
name points
0 Jared Goff 0.0
1 Pierre-Emile Højbjerg 13.5
2 Mecole Hardman 11.5
3 Merih Demiral 12.5
4 Ashley Young 2.7
.. ... ...
682 Nick Chubb 6.5
683 Derek Carr 0.5
684 Darnell Mooney 1.5
685 Daniel Suarez 2.0
686 Alexander Schwolow 2.5
[687 rows x 2 columns]
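The printed frame has more than 250 rows and mixes players from several sports, which suggests the `league_id` and `per_page` filters were not applied. One likely reason is that the request passes them with `data=`, which places them in the request body, whereas `params=` puts them in the query string. The sketch below builds both requests without sending them, so the difference can be inspected offline. Whether the API honors these particular filter names is an assumption carried over from the question, not something this thread confirms.

```python
import requests

URL = "https://api.prizepicks.com/projections"

# Filter names and values copied from the question; their effect on the
# server side is assumed, not verified here.
FILTERS = {
    "league_id": "7",
    "per_page": "250",
    "projection_type_id": "1",
    "single_stat": "true",
}

# Prepare the requests without sending anything over the network.
as_params = requests.Request("GET", URL, params=FILTERS).prepare()
as_data = requests.Request("GET", URL, data=FILTERS).prepare()

print(as_params.url)   # filters are appended to the URL as a query string
print(as_params.body)  # no request body
print(as_data.url)     # bare URL without a query string
print(as_data.body)    # filters are form-encoded into the body instead
```

In the answer's code, the corresponding change would be `session.get(url, params=params, headers=headers)`.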