TypeError: '<' not supported between instances of 'str' and 'int'

Using: Python in Google Colab. Thanks in advance.

I have run this code on other data I scraped from FBref, so I am unsure why it is failing now. The only difference is the way I scraped it. The first time I scraped it:

url_link = 'https://fbref.com/en/comps/Big5/gca/players/Big-5-European-Leagues-Stats'

The second time I scraped it:

url = 'https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats'

html_content = requests.get(url).text.replace('<!--', '').replace('-->', '')

df = pd.read_html(html_content)

After pulling the data into my dataframe, I convert the columns from object to float so I can do a calculation:

dfstandard['90s'] = dfstandard['90s'].astype(float)
dfstandard['Gls'] = dfstandard['Gls'].astype(float)

I look and it shows they are both floats:

10 90s 743 non-null float64

11 Gls 743 non-null float64

But when I run the code that has worked previously:

dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']

I get the error message "TypeError: '<' not supported between instances of 'str' and 'int'"

I am fairly new to scraping. I'm stuck and don't know what to do next.

The full error message is below:

<ipython-input-152-e0ab76715b7d> in <module>()
      1 #turn data into p 90
----> 2 dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']
      3 dfstandard['Ast'] = dfstandard['Ast'] / dfstandard['90s']
      4 dfstandard['G-PK'] = dfstandard['G-PK'] / dfstandard['90s']
      5 dfstandard['PK'] = dfstandard['PK'] / dfstandard['90s']

8 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _outer_indexer(self, left, right)
    261 
    262     def _outer_indexer(self, left, right):
--> 263         return libjoin.outer_join_indexer(left, right)
    264 
    265     _typ = "index"

pandas/_libs/join.pyx in pandas._libs.join.outer_join_indexer()

TypeError: '<' not supported between instances of 'str' and 'int'

CodePudding user response：

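There are two "Gls" columns in your dataframe. I think you converted only one "Gls" column to float, and when you run dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s'], the other "Gls" column is being picked up as well. With a duplicated label, dfstandard['Gls'] returns a two-column DataFrame rather than a Series. Dividing that by a Series makes pandas align the string column labels against the integer row index, which matches the outer_join_indexer frame in your traceback.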
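A quick way to check this is to look for repeated column labels. The sketch below uses a small made-up frame (the name `demo` and its values are illustrative, not taken from the FBref table) to show that selecting a duplicated label returns a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: two columns share the label
# 'Gls' (for example a season total and a per-90 value).
demo = pd.DataFrame(
    [[10.0, 5.0, 0.5], [20.0, 4.0, 0.2]],
    columns=["90s", "Gls", "Gls"],
)

# List every label that occurs more than once.
dupes = demo.columns[demo.columns.duplicated()].tolist()
print(dupes)  # ['Gls']

# With a duplicated label, [] returns a two-column DataFrame, not a Series.
print(type(demo["Gls"]).__name__)  # DataFrame
print(type(demo["90s"]).__name__)  # Series
```

Running the same check on dfstandard.columns would show whether "Gls" really appears twice in the scraped data.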

Try stripping whitespace from the column names too, and convert the columns with pd.to_numeric:

df = df.rename(columns=lambda x: x.strip())
df['90s'] = pd.to_numeric(df['90s'], errors='coerce')
df['Gls'] = pd.to_numeric(df['Gls'], errors='coerce')
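As an illustration of what errors='coerce' does, here is a small sketch with invented sample values. Whether the scraped table actually contains stray text such as repeated header cells is an assumption; in the question astype(float) succeeded, so this is more of a safeguard than the fix itself.

```python
import pandas as pd

# Invented sample: a numeric column polluted by a repeated header cell.
raw = pd.Series(["1.5", "2.0", "90s", "3.0"], name="90s")

# astype(float) would raise ValueError on '90s'; to_numeric with
# errors='coerce' converts what it can and marks the rest as NaN.
clean = pd.to_numeric(raw, errors="coerce")
print(clean.isna().sum())  # 1

# Drop the entries that could not be parsed.
clean = clean.dropna()
print(clean.tolist())  # [1.5, 2.0, 3.0]
```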

That duplicated "Gls" label is the likely cause of the error.
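One possible way to put this together is to make the labels unique before dividing. This is a sketch rather than the only approach: the helper `make_unique` and its `_1` suffix scheme are hypothetical, and the sample numbers are invented.

```python
import pandas as pd


def make_unique(labels):
    """Return stripped labels with a numeric suffix on repeats (hypothetical helper).

    Note: a suffixed name could still collide with an existing label
    such as 'Gls_1'; this sketch does not guard against that.
    """
    seen = {}
    out = []
    for label in labels:
        label = str(label).strip()
        count = seen.get(label, 0)
        out.append(label if count == 0 else f"{label}_{count}")
        seen[label] = count + 1
    return out


# Invented data standing in for dfstandard.
df = pd.DataFrame(
    [[10.0, 5.0, 0.5], [20.0, 4.0, 0.2]],
    columns=["90s", " Gls", "Gls"],
)

df.columns = make_unique(df.columns)
print(df.columns.tolist())  # ['90s', 'Gls', 'Gls_1']

# Each selection is now a Series, so the division aligns row by row.
for col in ["90s", "Gls"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
df["Gls"] = df["Gls"] / df["90s"]
print(df["Gls"].tolist())  # [0.5, 0.2]
```

Which of the duplicated columns should keep the plain "Gls" name depends on the table, so it is worth inspecting the values before overwriting either one.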