Trying to scrape projections from PrizePicks

I am trying to streamline my projection modeling process by scraping the projections from PrizePicks. I keep running into an error that says Invalid Syntax. Any help would be greatly appreciated. Here is my code:

import requests
import pandas as pd
pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
'Connection': 'keep-alive',
'Accept': 'application/json; charset=UTF-8',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
'Access-Control-Allow-Credentials': 'true',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Referer': 'https://app.prizepicks.com/',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9'

}

response = requests.get(url=pp_props_url, headers=headers).json()
player_prop = response
player_prop
columns_list = [
"League",
"League_Id",
"Market",
"Name",
"Position",
"Team",
"Team_Name"
"Stat_Type"
"Line_Score",
"Points",
"Rebounds",
"Assists",
"Pts Rebs Asts",
"3-Pt Made"

]

pp_df = pd.DataFrame(player_prop, columns = columns_list)
pp_df.to_csv('player_props_20221030.csv', index=False)

When I open the CSV, the only thing in it is the column headers. I am BRAND NEW to web scraping, so I am truly grateful for any help.

CodePudding user response：

Your question was tricky for me. First of all, I got errors when retrieving the JSON response. They went away after I removed 'Accept-Encoding': 'gzip, deflate, br' from your headers. After that a valid JSON response is indeed returned, but perhaps this was already the case for you.
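One plausible reason dropping that header helps (an assumption, not something confirmed in this thread): advertising br asks the server for Brotli compression, which requests can only decode when a Brotli package is installed, so .json() may otherwise be handed compressed bytes. Below is a minimal sketch that leaves Accept-Encoding to requests and checks the HTTP status before parsing. The names build_headers and fetch_projections and the 30-second timeout are illustrative choices, not part of the original code.

```python
import requests

PP_PROPS_URL = (
    "https://api.prizepicks.com/projections"
    "?league_id=7&per_page=250&single_stat=true"
)


def build_headers():
    """Headers from the question, without a hand-written Accept-Encoding."""
    return {
        "Accept": "application/json; charset=UTF-8",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
        ),
        "Referer": "https://app.prizepicks.com/",
        "Accept-Language": "en-US,en;q=0.9",
    }


def fetch_projections(url=PP_PROPS_URL):
    """Download the projections and return the parsed JSON as a dict."""
    response = requests.get(url, headers=build_headers(), timeout=30)
    # Fail with a clear HTTP error instead of a confusing JSON decode error.
    response.raise_for_status()
    return response.json()


# Usage (performs a real network request, so it is left commented out):
# player_prop = fetch_projections()
```

Whether the site still accepts these headers may change over time, so treat the header set as a starting point.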

Then, after retrieving the JSON, I can see that many of the names in your columns_list are not in the actual JSON. Take for example League, which is spelled league in the JSON. Key names are case-sensitive.
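To see why the CSV ended up with headers only, here is a small sketch using a made-up miniature of the response (the sample values are hypothetical). When pd.DataFrame gets a dict, it matches columns against the dict's top-level keys; when it gets a list of records, it matches against each record's keys. The last lines also show that the question's list, which omits commas after "Team_Name" and "Stat_Type", silently merges adjacent string literals into one name.

```python
import pandas as pd

# Made-up miniature of the response: two top-level keys holding lists.
payload = {
    "data": [{"id": "1", "type": "projection"}],
    "included": [{"attributes": {"league": "NBA", "name": "Chris Paul"}}],
}

# A dict is matched to `columns` by its top-level keys ("data", "included").
# Neither key is in the list, so the frame has headers and zero rows.
headers_only = pd.DataFrame(payload, columns=["League", "Name"])

# A list of records is matched by each record's keys, and case matters.
records = [item["attributes"] for item in payload["included"]]
mismatched = pd.DataFrame(records, columns=["League", "Name"])  # all NaN
matched = pd.DataFrame(records, columns=["league", "name"])     # filled in

# Adjacent string literals without a comma are concatenated into one name.
merged_names = ["Team_Name" "Stat_Type", "Line_Score"]

print(headers_only.shape, mismatched.shape, matched.shape)
print(merged_names)
```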

Furthermore, the JSON lists all players of interest under the key included, so you can retrieve them with player_prop["included"]. For every player you are interested in the attributes entry, so extract it like this: [i["attributes"] for i in player_prop["included"]].

Now you can create your pd.DataFrame, and you will see that a row is created for every player. I have only changed a couple of column headers, and some of the ones you listed do not seem to be in the JSON, so it would be good to check (3-PT Made seems to be a value of name, for example), but it returns the data as expected:

>>> pd.DataFrame([i["attributes"] for i in player_prop["included"]], columns=columns_list)
   league  league_id       market                name  ... rebounds assists  pts rebs asts  3-pt made
0     NBA        7.0      Houston         Jalen Green  ...      NaN     NaN            NaN        NaN
1     NBA        7.0      Phoenix          Chris Paul  ...      NaN     NaN            NaN        NaN
2     NBA        7.0      Phoenix     Cameron Johnson  ...      NaN     NaN            NaN        NaN
3     NaN        NaN          NaN       Blocked Shots  ...      NaN     NaN            NaN        NaN
4     NaN        NaN          NaN                 NBA  ...      NaN     NaN            NaN        NaN
5     NaN        NaN          NaN       Fantasy Score  ...      NaN     NaN            NaN        NaN
6     NBA        7.0      Houston    Kevin Porter Jr.  ...      NaN     NaN            NaN        NaN
7     NBA        7.0      Phoenix       Mikal Bridges  ...      NaN     NaN            NaN        NaN
8     NaN        NaN          NaN         Single Stat  ...      NaN     NaN            NaN        NaN
9     NaN        NaN          NaN           3-PT Made  ...      NaN     NaN            NaN        NaN
10    NaN        NaN          NaN             Assists  ...      NaN     NaN            NaN        NaN
11    NaN        NaN          NaN              Points  ...      NaN     NaN            NaN        NaN
12    NBA        7.0       Denver        Nikola Jokic  ...      NaN     NaN            NaN        NaN
13    NBA        7.0       Denver        Jamal Murray  ...      NaN     NaN            NaN        NaN
14    NBA        7.0  Los Angeles    Lonnie Walker IV  ...      NaN     NaN            NaN        NaN
15    NBA        7.0  Los Angeles    Patrick Beverley  ...      NaN     NaN            NaN        NaN
16    NaN        NaN          NaN              Steals  ...      NaN     NaN            NaN        NaN
17    NaN        NaN          NaN       Pts Rebs Asts  ...      NaN     NaN            NaN        NaN
18    NaN        NaN          NaN            Pts Asts  ...      NaN     NaN            NaN        NaN
19    NaN        NaN          NaN       Fantasy Score  ...      NaN     NaN            NaN        NaN
20    NBA        7.0      Houston      Alperen Sengun  ...      NaN     NaN            NaN        NaN
21    NaN        NaN          NaN           Blks Stls  ...      NaN     NaN            NaN        NaN
22    NBA        7.0  Los Angeles      LeBron James\t  ...      NaN     NaN            NaN        NaN
23    NBA        7.0       Denver  Michael Porter Jr.  ...      NaN     NaN            NaN        NaN
24    NaN        NaN          NaN           Turnovers  ...      NaN     NaN            NaN        NaN
25    NaN        NaN          NaN            Pts Rebs  ...      NaN     NaN            NaN        NaN
26    NaN        NaN          NaN           Rebs Asts  ...      NaN     NaN            NaN        NaN
27    NBA        7.0      Phoenix        Devin Booker  ...      NaN     NaN            NaN        NaN
28    NaN        NaN          NaN            Rebounds  ...      NaN     NaN            NaN        NaN
29    NBA        7.0       Denver        Aaron Gordon  ...      NaN     NaN            NaN        NaN
30    NBA        7.0  Los Angeles       Anthony Davis  ...      NaN     NaN            NaN        NaN

[31 rows x 12 columns]

Writing this DataFrame to .csv in the same way should now be successful.
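Putting the steps above together, a minimal sketch might look like the following. It runs on a hand-written sample shaped like the output above rather than on the live response, and the helper name included_to_frame is hypothetical. Dropping rows without a league is one possible way to keep only players, based on the observation that the stat-name rows show NaN there.

```python
import pandas as pd

# Hand-written sample mimicking the "included" section shown above.
sample = {
    "included": [
        {"attributes": {"league": "NBA", "league_id": 7,
                        "market": "Phoenix", "name": "Chris Paul"}},
        {"attributes": {"name": "Points"}},  # a stat name, not a player
        {"attributes": {"league": "NBA", "league_id": 7,
                        "market": "Houston", "name": "Jalen Green"}},
    ]
}


def included_to_frame(payload, columns):
    """Build one row per entry under "included" from its attributes."""
    rows = [item["attributes"] for item in payload["included"]]
    return pd.DataFrame(rows, columns=columns)


columns_list = ["league", "league_id", "market", "name"]
df = included_to_frame(sample, columns_list)

# Rows that describe stat names have no league, so dropping them keeps players.
players = df.dropna(subset=["league"])

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = players.to_csv(index=False)
print(csv_text)
```

With the real response, sample would be replaced by player_prop and the filename passed as the first argument to to_csv.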

CodePudding user response：

This is a tricky one for a newbie. I am not sure what data you actually want here, but you need to pull out the nested data within the JSON. pandas can do that with json_normalize:

import requests
import pandas as pd

pp_props_url = 'https://api.prizepicks.com/projections?league_id=7&per_page=250&single_stat=true'
headers = {
    'Connection': 'keep-alive',
    'Accept': 'application/json; charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'Access-Control-Allow-Credentials': 'true',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://app.prizepicks.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
}

# Fetch the projections and parse the JSON body into a dict
response = requests.get(url=pp_props_url, headers=headers).json()

# Flatten each record under "data" into one row; nested keys become dotted column names
df = pd.json_normalize(response, record_path=['data'])

df.to_csv('player_props_20221030.csv')

Output:

print(df)
           type  ... relationships.stat_type.data.id
0    projection  ...                              14
1    projection  ...                             106
2    projection  ...                              22
3    projection  ...                              19
4    projection  ...                             245
..          ...  ...                             ...
269  projection  ...                              23
270  projection  ...                              24
271  projection  ...                              21
272  projection  ...                              23
273  projection  ...                              23

[274 rows x 26 columns]
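The flattened frame holds ids rather than names, so a possible next step is to join it with the entries under included. The sketch below does this on a made-up sample. Only the type value "projection" and the column relationships.stat_type.data.id are visible in this thread; the relationships.new_player key, the type values "new_player" and "stat_type", the line_score attribute, and all sample values are assumptions that should be checked against the real response.

```python
import pandas as pd

# Made-up sample; field names marked above as assumptions may differ in reality.
sample = {
    "data": [
        {"id": "101", "type": "projection",
         "attributes": {"line_score": 24.5},
         "relationships": {
             "new_player": {"data": {"id": "1", "type": "new_player"}},
             "stat_type": {"data": {"id": "19", "type": "stat_type"}}}},
        {"id": "102", "type": "projection",
         "attributes": {"line_score": 7.5},
         "relationships": {
             "new_player": {"data": {"id": "2", "type": "new_player"}},
             "stat_type": {"data": {"id": "22", "type": "stat_type"}}}},
    ],
    "included": [
        {"id": "1", "type": "new_player", "attributes": {"name": "Player A"}},
        {"id": "2", "type": "new_player", "attributes": {"name": "Player B"}},
        {"id": "19", "type": "stat_type", "attributes": {"name": "Points"}},
        {"id": "22", "type": "stat_type", "attributes": {"name": "Rebounds"}},
    ],
}

# One row per projection and one row per included entry, with dotted columns.
projections = pd.json_normalize(sample, record_path=["data"])
included = pd.json_normalize(sample, record_path=["included"])


def lookup(kind, label):
    """Return an id-to-name table for one kind of included entry."""
    subset = included.loc[included["type"] == kind, ["id", "attributes.name"]]
    return subset.rename(columns={"id": label + "_id", "attributes.name": label})


# Replace the relationship ids with readable player and stat names.
merged = (
    projections
    .merge(lookup("new_player", "player"),
           left_on="relationships.new_player.data.id", right_on="player_id")
    .merge(lookup("stat_type", "stat"),
           left_on="relationships.stat_type.data.id", right_on="stat_id")
)
result = merged[["player", "stat", "attributes.line_score"]]
print(result)
```

Both merges use the default inner join, so a projection whose player or stat id is missing from included would be dropped; how="left" keeps such rows instead.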