Using: Python in Google Colab
Thanks in Advance:
I have run this code on other data I scraped from FBref, so I am unsure why the error is happening now. The only difference is the way I scraped it.
The first time I scraped it:
url_link = 'https://fbref.com/en/comps/Big5/gca/players/Big-5-European-Leagues-Stats'
The second time I scraped it:
url = 'https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats'
html_content = requests.get(url).text.replace('<!--', '').replace('-->', '')
df = pd.read_html(html_content)
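For context, pd.read_html returns a list of DataFrames, and a table with grouped headers generally comes back with two-level (MultiIndex) columns. The sketch below uses made-up data rather than the real FBref table, and the group names are assumptions chosen for illustration. It shows how keeping only the bottom header level can leave two columns with the same name, while joining both levels keeps them distinct.

```python
import pandas as pd

# Toy stand-in for one table returned by pd.read_html (values are made up).
# "Gls" appears under two different header groups.
columns = pd.MultiIndex.from_tuples([
    ("Playing Time", "90s"),
    ("Performance", "Gls"),
    ("Per 90 Minutes", "Gls"),
])
table = pd.DataFrame([[10.0, 5, 0.5], [4.0, 1, 0.25]], columns=columns)

# Keeping only the bottom level produces duplicate column names.
flat = table.copy()
flat.columns = flat.columns.get_level_values(-1)
has_duplicates = flat.columns.duplicated().any()

# Joining both levels keeps every column name unique.
unique = table.copy()
unique.columns = ["_".join(levels) for levels in table.columns]

print(list(flat.columns))
print(list(unique.columns))
```

Which flattening is appropriate depends on how the columns are referenced later; the joined names are longer but unambiguous.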
After pulling the data into my dataframe, I convert the columns from object to float so I can do a calculation:
dfstandard['90s'] = dfstandard['90s'].astype(float)
dfstandard['Gls'] = dfstandard['Gls'].astype(float)
I look and it shows they are both floats:
10 90s 743 non-null float64
11 Gls 743 non-null float64
But when I run the code that has worked previously:
dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']
I get the error message "TypeError: '<' not supported between instances of 'str' and 'int'"
I am fairly new to scraping. I'm stuck and don't know what to do next.
The full error message is below:
<ipython-input-152-e0ab76715b7d> in <module>()
1 #turn data into p 90
----> 2 dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']
3 dfstandard['Ast'] = dfstandard['Ast'] / dfstandard['90s']
4 dfstandard['G-PK'] = dfstandard['G-PK'] / dfstandard['90s']
5 dfstandard['PK'] = dfstandard['PK'] / dfstandard['90s']
8 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _outer_indexer(self, left, right)
261
262 def _outer_indexer(self, left, right):
--> 263 return libjoin.outer_join_indexer(left, right)
264
265 _typ = "index"
pandas/_libs/join.pyx in pandas._libs.join.outer_join_indexer()
TypeError: '<' not supported between instances of 'str' and 'int'
CodePudding user response:
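There are two Gls columns in your dataframe. I think you converted only one "Gls" column to float, and when you do dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s'], the other "Gls" column is being picked up as well. With a repeated name, dfstandard['Gls'] returns a DataFrame rather than a single column, and dividing that by dfstandard['90s'] makes pandas align the string column labels against the integer row index, which would explain the str/int comparison in the traceback. Thus the error.
Try stripping whitespace from the column names too:
df = df.rename(columns=lambda x: x.strip())
df['90s'] = pd.to_numeric(df['90s'], errors='coerce')
df['Gls'] = pd.to_numeric(df['Gls'], errors='coerce')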
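A quick way to check whether duplicate column names are the culprit is sketched below on a toy frame with made-up values. The name dfstandard is reused from the question, and keeping the first "Gls" column is an assumption; you may want to rename the second one instead of dropping it.

```python
import pandas as pd

# Toy frame with a repeated "Gls" column (values are made up).
dfstandard = pd.DataFrame(
    [[10.0, 5.0, 0.5], [4.0, 1.0, 0.25]],
    columns=["90s", "Gls", "Gls"],
)

# With a repeated name, selecting "Gls" returns a DataFrame, not a Series.
selected = dfstandard["Gls"]
print(type(selected).__name__, selected.shape)

# List every column name that occurs more than once.
duplicated_names = dfstandard.columns[dfstandard.columns.duplicated()].tolist()
print(duplicated_names)

# Keep only the first occurrence of each name, then do the per-90 division.
deduped = dfstandard.loc[:, ~dfstandard.columns.duplicated()].copy()
deduped["Gls"] = deduped["Gls"] / deduped["90s"]
print(deduped)
```

If duplicated_names comes back empty on your real data, the cause is something else and the column names are worth printing in full.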