Pyplot scatter name not defined

Time:09-17

I have scraped data from a webpage and now want to visualise it. When I try to draw a scatter plot, I get the error "NameError: name 'x' is not defined" at the line plt.scatter(data[x],data[y]). I've looked over the scraping code, the data it collects from the website, and my plotting code, but I can't see why x and y won't work. Any solutions?

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from mplsoccer.pitch import Pitch
from pandas.core.indexes.base import Index 

text_color = 'w'

data = pd.read_csv(#filename)

fig, ax = plt.subplots(figsize=(13,8.5)) # creates the figure

fig.set_facecolor('#22312b')

ax.patch.set_facecolor('#22312b')

pitch = Pitch(pitch_color='#aabb97', line_color='white')

pitch.draw(ax=ax)

plt.scatter(data[x],data[y])

The CSV file I'm reading my data from is created by this script:

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

base_url = 'https://understat.com/match/'
match = input('Please enter the match id: ')
url = base_url + match

res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
scripts = soup.find_all('script')

strings = scripts[1].string

ind_start = strings.index("('") + 2

ind_end = strings.index("')")

json_data = strings[ind_start:ind_end]
json_data = json_data.encode('utf8').decode('unicode_escape')

data = json.loads(json_data)

team = []
minute = []
xg = []
result = []
x = []
y = []
situation = []
player = []

data_away = data['a']
data_home = data['h']

for index in range(len(data_home)):
    for key in data_home[index]:
        if key == 'X':
            x.append(data_home[index][key])
        if key == 'Y':
            y.append(data_home[index][key])
        if key == 'xG':
            xg.append(data_home[index][key])
        if key == 'h_team':
            team.append(data_home[index][key])
        if key == 'result':
            result.append(data_home[index][key])
        if key == 'situation':
            situation.append(data_home[index][key])
        if key == 'minute':
            minute.append(data_home[index][key])
        if key == 'player':
            player.append(data_home[index][key])

for index in range(len(data_away)):
    for key in data_away[index]:
        if key == 'X':
            x.append(data_away[index][key])
        if key == 'Y':
            y.append(data_away[index][key])
        if key == 'xG':
            xg.append(data_away[index][key])
        if key == 'a_team':
            team.append(data_away[index][key])
        if key == 'result':
            result.append(data_away[index][key])
        if key == 'situation':
            situation.append(data_away[index][key])
        if key == 'minute':
            minute.append(data_away[index][key])
        if key == 'player':
            player.append(data_away[index][key])

col_names = ['Minute','Player','Situation','Team','xG','Result','x-coordinate','y-coordinate']
df = pd.DataFrame([minute,player,situation,team,xg,result,x,y], index=col_names)
df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T

Here is my dataframe

         Unnamed: 0                    0                    1  ...                    30                    31                   32
0        Minute                    8                   10  ...                    78                    79                   86
1        Player    Cristiano Ronaldo    Cristiano Ronaldo  ...   Allan Saint-Maximin           Joe Willock            Joelinton
2     Situation             OpenPlay             OpenPlay  ...              OpenPlay            FromCorner             OpenPlay
3          Team    Manchester United    Manchester United  ...      Newcastle United      Newcastle United     Newcastle United
4            xG  0.05710771679878235  0.03967716544866562  ...  0.020885728299617767  0.013165773823857307  0.05987533554434776
5        Result          MissedShots          MissedShots  ...           BlockedShot             SavedShot          MissedShots
6  x-coordinate   0.9780000305175781   0.9719999694824218  ...    0.7390000152587891     0.705999984741211   0.9119999694824219
7  y-coordinate  0.33799999237060546                 0.72  ...   0.47900001525878905    0.4640000152587891   0.5929999923706055

Error message

File "C:\Users\#name\AppData\Local\Programs\PythonCodingPack\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'x-coordinate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "#filename", line 19, in <module>
    plt.scatter(data['x-coordinate'],data['y-coordinate'])
  File "#filename", line 2899, in __getitem__
    indexer = self.columns.get_loc(key)
  File "#filename", line 2891, in get_loc
    raise KeyError(key) from err
KeyError: 'x-coordinate'

CodePudding user response:

Just because a variable is defined in a file that you run doesn't mean it's automatically available in another file that you run later. You need to pass them somehow, such as the second file importing and calling a function in the first file that returns the values you're looking for.
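As a rough sketch of that idea: the scraping file could wrap its work in a function that returns the table, and the plotting file could import and call it. The function name load_shots and the module name scraper are hypothetical, and the coordinates are placeholders rather than real scraped data.

```python
import pandas as pd


# Hypothetical helper. In practice it would live in the scraping file
# (say scraper.py) and wrap the real scraping logic.
def load_shots():
    """Return the shot table instead of leaving x and y as module globals."""
    x = [0.978, 0.972]  # placeholder values, not real scraped data
    y = [0.338, 0.72]
    return pd.DataFrame({"x-coordinate": x, "y-coordinate": y})


# In the plotting file this would be: from scraper import load_shots
shots = load_shots()
print(shots["x-coordinate"].tolist())
```

The plotting file then only sees what the function returns, so nothing depends on which script happened to run first.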

However, the solution to this particular issue is a lot easier. In your plotter file, simply change

plt.scatter(data[x],data[y])

to

plt.scatter(data["x-coordinate"],data["y-coordinate"])

This uses the data in the named columns of the dataframe, which is really what you want.
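To make the difference concrete, here is a minimal sketch with a tiny made-up dataframe (the values are placeholders). A bare x is a variable lookup, which fails because nothing in the plotting file assigns it, while the quoted string is a column label.

```python
import pandas as pd

# Placeholder table with the same column names as the scraped data.
data = pd.DataFrame({"x-coordinate": [0.978, 0.972],
                     "y-coordinate": [0.338, 0.72]})

try:
    data[x]  # x is a bare name that was never assigned in this file
except NameError as err:
    message = str(err)
    print(message)

# A string key selects the column by its label instead.
xs = data["x-coordinate"]
ys = data["y-coordinate"]
print(len(xs), len(ys))
```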


EDIT

The fix above would work, but for one simple problem at the end of the scraping code:

df.to_csv('shotmaps.csv', encoding='utf-8')
df = df.T

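You are saving the df to CSV, and then transposing it, so the CSV still has the field names as row labels rather than as column headers. That is why data['x-coordinate'] raises a KeyError. Switch those two lines, use my code above in the plotting file, and you should be all set. I don't have mplsoccer installed, so I just commented out those lines.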
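A small sketch of the corrected order, using an in-memory buffer in place of shotmaps.csv and a cut-down set of placeholder fields. Passing index=False is my own assumption, not something the original code did; it just keeps the row numbers from being written as an extra unnamed column.

```python
import io

import pandas as pd

col_names = ["Minute", "x-coordinate", "y-coordinate"]
# One inner list per field, as in the scraping code; placeholder values.
rows = [[8, 10], [0.978, 0.972], [0.338, 0.72]]

df = pd.DataFrame(rows, index=col_names)  # fields are rows at this point
df = df.T                                 # transpose first: fields become columns

buffer = io.StringIO()                    # stands in for shotmaps.csv
df.to_csv(buffer, index=False)            # then save
buffer.seek(0)

data = pd.read_csv(buffer)
print(list(data.columns))
```

Read back this way, the column labels are the field names, so data["x-coordinate"] and data["y-coordinate"] can be passed straight to plt.scatter.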

  • df should look like the following example, created using id 14620
# display(df.head())

  Minute            Player Situation       Team                   xG       Result        x-coordinate         y-coordinate
0     13   Roberto Firmino  OpenPlay  Liverpool  0.03234297037124634  BlockedShot   0.774000015258789                 0.43
1     13  Andrew Robertson  OpenPlay  Liverpool  0.03856334835290909  MissedShots  0.8830000305175781   0.6880000305175781
2     16   Roberto Firmino  OpenPlay  Liverpool  0.07978218793869019  MissedShots               0.835    0.509000015258789
3     20   Xherdan Shaqiri  OpenPlay  Liverpool  0.04507734999060631  BlockedShot  0.7919999694824219  0.48900001525878906
4     21   Roberto Firmino  OpenPlay  Liverpool  0.09094344824552536  BlockedShot  0.9009999847412109    0.639000015258789