I am trying to streamline my projection model process by scraping the projections from PrizePicks. I keep running into an error that says Invalid Syntax. Any help would be greatly appreciated. Here is my code:
import requests
import pandas as pd
pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
'Connection': 'keep-alive',
'Accept': 'application/json; charset=UTF-8',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
'Access-Control-Allow-Credentials': 'true',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Referer': 'https://app.prizepicks.com/',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9'
}
response = requests.get(url=pp_props_url, headers=headers).json()
player_prop = response
player_prop
columns_list = [
"League",
"League_Id",
"Market",
"Name",
"Position",
"Team",
"Team_Name"
"Stat_Type"
"Line_Score",
"Points",
"Rebounds",
"Assists",
"Pts Rebs Asts",
"3-Pt Made"
]
pp_df = pd.DataFrame(player_prop, columns = columns_list)
pp_df.to_csv('player_props_20221030.csv', index=False)
When I open the CSV, the only thing written is the column headers. I am brand new to web scraping, so I am truly grateful for any help.
CodePudding user response:
Your question was tricky for me. First of all, I got errors when retrieving the .json file. These were solved after removing 'Accept-Encoding': 'gzip, deflate, br' from your headers. After this, a valid .json file is indeed returned, but perhaps this was already the case for you.
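As a rough sketch of the header change described above, the request could look like the following. One plausible explanation for the errors (an assumption, not something confirmed in this thread) is that 'br' advertises Brotli support, which requests can only decode when an optional Brotli package is installed. The helper name fetch_projections and the timeout value are hypothetical choices for illustration.

```python
import requests

pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'

# Same headers as in the question, minus 'Accept-Encoding',
# so requests negotiates only encodings it can decode itself.
headers = {
    'Connection': 'keep-alive',
    'Accept': 'application/json; charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'Access-Control-Allow-Credentials': 'true',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://app.prizepicks.com/',
    'Accept-Language': 'en-US,en;q=0.9',
}


def fetch_projections(url=pp_props_url):
    """Hypothetical helper: request the projections and return the parsed JSON."""
    response = requests.get(url, headers=headers, timeout=30)
    # Fail loudly on HTTP errors instead of trying to parse an error page.
    response.raise_for_status()
    return response.json()
```

Calling player_prop = fetch_projections() would then give the same dictionary that the rest of this answer works with, assuming the endpoint still responds as it did when this was written.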
Then, after retrieving this .json file, I can see that many of your columns_list headers are not in the actual .json file. Take for example League, which is defined as league in the .json file. Capitalization matters.
Furthermore, the .json file lists all players of interest under the section included, so you can retrieve them with player_prop["included"]. For every player, you are interested in the attributes section, so we need to extract it like this: [i["attributes"] for i in player_prop["included"]].
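To make the two points above concrete without hitting the API, here is a minimal sketch on a hand-written payload. The sample dictionary is hypothetical and heavily trimmed; its values and the exact set of attribute keys are assumptions that only mimic the shape described above.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the real response (assumed shape).
sample = {
    "data": [],
    "included": [
        {"id": "1", "attributes": {"league": "NBA", "league_id": 7,
                                   "market": "Houston", "name": "Jalen Green"}},
        # Not every entry under "included" is a player; some only carry a name.
        {"id": "19", "attributes": {"name": "Points"}},
    ],
}

# Lowercase names match the keys in the payload; "League" would not.
columns_list = ["league", "league_id", "market", "name"]

# Keep only the nested "attributes" dictionary of each included entry.
rows = [item["attributes"] for item in sample["included"]]
df = pd.DataFrame(rows, columns=columns_list)
print(df)
```

With capitalized names such as "League" in columns_list, pandas would find no matching keys and fill those columns with NaN, which is consistent with the mostly empty result in the question.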
Now you can create your pd.DataFrame, and you will see that a row is created for every player. I have only changed a couple of the column headers, and some that you listed do not seem to be in the .json file, so it would be good to check (3-PT Made seems to be a value of name, for example), but it returns the data as expected:
>>> pd.DataFrame([i["attributes"] for i in player_prop["included"]], columns=columns_list)
league league_id market name ... rebounds assists pts rebs asts 3-pt made
0 NBA 7.0 Houston Jalen Green ... NaN NaN NaN NaN
1 NBA 7.0 Phoenix Chris Paul ... NaN NaN NaN NaN
2 NBA 7.0 Phoenix Cameron Johnson ... NaN NaN NaN NaN
3 NaN NaN NaN Blocked Shots ... NaN NaN NaN NaN
4 NaN NaN NaN NBA ... NaN NaN NaN NaN
5 NaN NaN NaN Fantasy Score ... NaN NaN NaN NaN
6 NBA 7.0 Houston Kevin Porter Jr. ... NaN NaN NaN NaN
7 NBA 7.0 Phoenix Mikal Bridges ... NaN NaN NaN NaN
8 NaN NaN NaN Single Stat ... NaN NaN NaN NaN
9 NaN NaN NaN 3-PT Made ... NaN NaN NaN NaN
10 NaN NaN NaN Assists ... NaN NaN NaN NaN
11 NaN NaN NaN Points ... NaN NaN NaN NaN
12 NBA 7.0 Denver Nikola Jokic ... NaN NaN NaN NaN
13 NBA 7.0 Denver Jamal Murray ... NaN NaN NaN NaN
14 NBA 7.0 Los Angeles Lonnie Walker IV ... NaN NaN NaN NaN
15 NBA 7.0 Los Angeles Patrick Beverley ... NaN NaN NaN NaN
16 NaN NaN NaN Steals ... NaN NaN NaN NaN
17 NaN NaN NaN Pts Rebs Asts ... NaN NaN NaN NaN
18 NaN NaN NaN Pts Asts ... NaN NaN NaN NaN
19 NaN NaN NaN Fantasy Score ... NaN NaN NaN NaN
20 NBA 7.0 Houston Alperen Sengun ... NaN NaN NaN NaN
21 NaN NaN NaN Blks Stls ... NaN NaN NaN NaN
22 NBA 7.0 Los Angeles LeBron James\t ... NaN NaN NaN NaN
23 NBA 7.0 Denver Michael Porter Jr. ... NaN NaN NaN NaN
24 NaN NaN NaN Turnovers ... NaN NaN NaN NaN
25 NaN NaN NaN Pts Rebs ... NaN NaN NaN NaN
26 NaN NaN NaN Rebs Asts ... NaN NaN NaN NaN
27 NBA 7.0 Phoenix Devin Booker ... NaN NaN NaN NaN
28 NaN NaN NaN Rebounds ... NaN NaN NaN NaN
29 NBA 7.0 Denver Aaron Gordon ... NaN NaN NaN NaN
30 NBA 7.0 Los Angeles Anthony Davis ... NaN NaN NaN NaN
[31 rows x 12 columns]
Writing this to .csv in the same way should now be successful.
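The output above also contains rows that are not players (for example Blocked Shots or Fantasy Score, where league is NaN). One possible way to drop them before writing the CSV is sketched below on made-up rows; the assumption that every real player has a non-null league is mine and should be checked against the actual data.

```python
import pandas as pd

# Made-up rows in the same shape as the output above.
rows = [
    {"league": "NBA", "league_id": 7, "market": "Houston", "name": "Jalen Green"},
    {"name": "Blocked Shots"},  # a stat label, not a player
    {"league": "NBA", "league_id": 7, "market": "Phoenix", "name": "Chris Paul"},
]
df = pd.DataFrame(rows, columns=["league", "league_id", "market", "name"])

# Assumption: only player rows have a league, so drop the rest.
players = df.dropna(subset=["league"]).reset_index(drop=True)

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = players.to_csv(index=False)
print(csv_text)
```

Replacing the last call with players.to_csv('player_props_20221030.csv', index=False) would write the same content to disk.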
CodePudding user response:
This is a tricky one for a newbie. I'm not sure what data you actually want here, but you need to pull out the nested data within the JSON. Pandas can do that with json_normalize:
import requests
import pandas as pd
pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
'Connection': 'keep-alive',
'Accept': 'application/json; charset=UTF-8',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
'Access-Control-Allow-Credentials': 'true',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Referer': 'https://app.prizepicks.com/',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9'}
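# Fetch the projections and parse the JSON body into a dictionary.
response = requests.get(url=pp_props_url, headers=headers).json()
# Flatten each record under the top-level 'data' key into one row.
df = pd.json_normalize(response, record_path=['data'])
df.to_csv('player_props_20221030.csv')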
Output:
print(df)
type ... relationships.stat_type.data.id
0 projection ... 14
1 projection ... 106
2 projection ... 22
3 projection ... 19
4 projection ... 245
.. ... ... ...
269 projection ... 23
270 projection ... 24
271 projection ... 21
272 projection ... 23
273 projection ... 23
[274 rows x 26 columns]
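The flattened frame above holds the projections but only references players by id. If the goal is one row per player and line, a possible next step is to normalize included as well and merge the two frames. The sketch below runs on a hypothetical payload: the relationship key new_player, the type value "new_player", and the attribute names line_score and stat_type are all assumptions that should be verified against the real response.

```python
import pandas as pd

# Hypothetical, trimmed payload; key names are assumptions (see above).
sample = {
    "data": [
        {"type": "projection", "id": "100",
         "attributes": {"line_score": 22.5, "stat_type": "Points"},
         "relationships": {"new_player": {"data": {"type": "new_player", "id": "1"}}}},
        {"type": "projection", "id": "101",
         "attributes": {"line_score": 9.5, "stat_type": "Assists"},
         "relationships": {"new_player": {"data": {"type": "new_player", "id": "2"}}}},
    ],
    "included": [
        {"type": "new_player", "id": "1", "attributes": {"name": "Jalen Green", "team": "HOU"}},
        {"type": "new_player", "id": "2", "attributes": {"name": "Chris Paul", "team": "PHX"}},
        {"type": "stat_type", "id": "19", "attributes": {"name": "Points"}},
    ],
}

# One row per projection, nested keys flattened with dots.
projections = pd.json_normalize(sample, record_path=["data"])

# One row per included entry, then keep only the player entries.
players = pd.json_normalize(sample, record_path=["included"])
players = players[players["type"] == "new_player"]

# Attach each projection to its player through the referenced id.
merged = projections.merge(
    players,
    left_on="relationships.new_player.data.id",
    right_on="id",
    suffixes=("", "_player"),
)
result = merged[["attributes.name", "attributes.stat_type", "attributes.line_score"]]
print(result)
```

The selected columns could then be renamed and written out with result.to_csv(...) in the same way as above.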