I have scraped data from a webpage and now I want to visualise it. When I try to draw a scatter plot, I get the error "NameError: name 'x' is not defined" at the line plt.scatter(data[x],data[y]). I've looked over the data I'm scraping from the website and over my own code, but I'm not sure why x and y won't work. Any solutions?
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
from pandas.core.indexes.base import Index
text_color = 'w'
data = pd.read_csv(#filename)
fig, ax = plt.subplots(figsize=(13,8.5)) # creates the figure
fig.set_facecolor('#22312b')
ax.patch.set_facecolor('#22312b')
pitch = Pitch(pitch_color='#aabb97', line_color='white')
pitch.draw(ax=ax)
plt.scatter(data[x],data[y])
The CSV file I'm reading my data from is produced by this script:
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
base_url = 'https://understat.com/match/'
match = input('Please enter the match id: ')
url = base_url + match
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
scripts = soup.find_all('script')
strings = scripts[1].string
ind_start = strings.index("('") + 2
ind_end = strings.index("')")
json_data = strings[ind_start:ind_end]
json_data = json_data.encode('utf8').decode('unicode_escape')
data = json.loads(json_data)
team = []
minute = []
xg = []
result = []
x = []
y = []
situation = []
player = []
data_away = data['a']
data_home = data['h']
for index in range(len(data_home)):
    for key in data_home[index]:
        if key == 'X':
            x.append(data_home[index][key])
        if key == 'Y':
            y.append(data_home[index][key])
        if key == 'xG':
            xg.append(data_home[index][key])
        if key == 'h_team':
            team.append(data_home[index][key])
        if key == 'result':
            result.append(data_home[index][key])
        if key == 'situation':
            situation.append(data_home[index][key])
        if key == 'minute':
            minute.append(data_home[index][key])
        if key == 'player':
            player.append(data_home[index][key])
for index in range(len(data_away)):
    for key in data_away[index]:
        if key == 'X':
            x.append(data_away[index][key])
        if key == 'Y':
            y.append(data_away[index][key])
        if key == 'xG':
            xg.append(data_away[index][key])
        if key == 'a_team':
            team.append(data_away[index][key])
        if key == 'result':
            result.append(data_away[index][key])
        if key == 'situation':
            situation.append(data_away[index][key])
        if key == 'minute':
            minute.append(data_away[index][key])
        if key == 'player':
            player.append(data_away[index][key])
col_names = ['Minute','Player','Situation','Team','xG','Result','x-coordinate','y-coordinate']
df = pd.DataFrame([minute,player,situation,team,xg,result,x,y], index=col_names)
df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T
Here is my dataframe
Unnamed: 0 0 1 ... 30 31 32
0 Minute 8 10 ... 78 79 86
1 Player Cristiano Ronaldo Cristiano Ronaldo ... Allan Saint-Maximin Joe Willock Joelinton
2 Situation OpenPlay OpenPlay ... OpenPlay FromCorner OpenPlay
3 Team Manchester United Manchester United ... Newcastle United Newcastle United Newcastle United
4 xG 0.05710771679878235 0.03967716544866562 ... 0.020885728299617767 0.013165773823857307 0.05987533554434776
5 Result MissedShots MissedShots ... BlockedShot SavedShot MissedShots
6 x-coordinate 0.9780000305175781 0.9719999694824218 ... 0.7390000152587891 0.705999984741211 0.9119999694824219
7 y-coordinate 0.33799999237060546 0.72 ... 0.47900001525878905 0.4640000152587891 0.5929999923706055
Error message
File "C:\Users\#name\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'x-coordinate'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "#filename", line 19, in <module>
plt.scatter(data['x-coordinate'],data['y-coordinate'])
File "#filename", line 2899, in __getitem__
indexer = self.columns.get_loc(key)
File "#filename", line 2891, in get_loc
raise KeyError(key) from err
KeyError: 'x-coordinate'
CodePudding user response:
Just because a variable is defined in a file that you run doesn't mean it's automatically available in another file that you run later. You need to pass the values somehow, for example by having the second file import and call a function in the first file that returns the values you're looking for.
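As a rough illustration of that idea, here is a minimal sketch. The module names scraper.py and plotter.py, the function name get_shot_coordinates, and the hard-coded values are all made up for the example; in the real script the function body would hold the scraping code.

```python
# scraper.py (hypothetical module name): wrap the work in a function
def get_shot_coordinates():
    # Placeholder values standing in for the scraped lists
    x = [0.978, 0.972]
    y = [0.338, 0.72]
    return x, y


# plotter.py (hypothetical) would begin with:
#     from scraper import get_shot_coordinates
# and then call the function to receive the values:
x, y = get_shot_coordinates()
print(x, y)
```

The point is only that the second file gets x and y as return values rather than expecting them to exist already.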
However, the solution to this particular issue is a lot easier. In your plotter file, simply change
plt.scatter(data[x],data[y])
to
plt.scatter(data["x-coordinate"],data["y-coordinate"])
This uses the data in the named columns of the dataframe, which is really what you want.
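To make the difference concrete, here is a small sketch with a made-up two-row dataframe that reuses the column names from the scraper. The name undefined_name is deliberately left undefined to reproduce the kind of error from the question.

```python
import pandas as pd

# Made-up rows, using the column names written by the scraper
data = pd.DataFrame({
    "x-coordinate": [0.978, 0.972],
    "y-coordinate": [0.338, 0.72],
})

# A quoted string is looked up among the dataframe's column labels
xs = data["x-coordinate"]
ys = data["y-coordinate"]

# A bare name is resolved as a Python variable before pandas sees it,
# so an undefined one fails with NameError
try:
    data[undefined_name]  # hypothetical name, intentionally not defined
except NameError as err:
    message = str(err)

print(message)
```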
EDIT
The fix above would work, but for one simple problem at the end of the scraping code:
df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T
You are saving the df to CSV, and then transposing it. Switch those two lines, use my code above in the plotting file, and you should be all set. I don't have mplsoccer installed, so I just commented out those lines.
df should look like the following example, created using id 14620
# display(df.head())
Minute Player Situation Team xG Result x-coordinate y-coordinate
0 13 Roberto Firmino OpenPlay Liverpool 0.03234297037124634 BlockedShot 0.774000015258789 0.43
1 13 Andrew Robertson OpenPlay Liverpool 0.03856334835290909 MissedShots 0.8830000305175781 0.6880000305175781
2 16 Roberto Firmino OpenPlay Liverpool 0.07978218793869019 MissedShots 0.835 0.509000015258789
3 20 Xherdan Shaqiri OpenPlay Liverpool 0.04507734999060631 BlockedShot 0.7919999694824219 0.48900001525878906
4 21 Roberto Firmino OpenPlay Liverpool 0.09094344824552536 BlockedShot 0.9009999847412109 0.639000015258789
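The effect of the line order can be checked without touching the website. The sketch below uses made-up values and an in-memory buffer (io.StringIO) in place of shotmaps.csv, and passes index=False when saving, which is my own addition to avoid an extra unnamed column.

```python
import io
import pandas as pd

col_names = ["Minute", "x-coordinate", "y-coordinate"]
rows = [[8, 10], [0.978, 0.972], [0.338, 0.72]]  # one inner list per field
df = pd.DataFrame(rows, index=col_names)

# Saved before transposing: the field names end up as row labels,
# so they are not available as column names after reading back
before = pd.read_csv(io.StringIO(df.to_csv()))
has_x_before = "x-coordinate" in before.columns

# Transposed first: each field becomes a column of the saved file
after = pd.read_csv(io.StringIO(df.T.to_csv(index=False)))
has_x_after = "x-coordinate" in after.columns

print(has_x_before, has_x_after)
```

With the original order, the lookup data["x-coordinate"] has no matching column, which is consistent with the KeyError in the traceback.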