Inconsistency with defined function

Time:08-03

I was trying to define a new function and I found that it doesn't work as intended.

This is what I wrote:

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

def fbref(stats):
    base_url = "https://fbref.com/en/comps/12/"
    iterate = stats
    end_url = "/La-Liga-Stats"
    response = requests.get(base_url + iterate + end_url)

    soup = BeautifulSoup(response.text, 'html.parser')

    # The stats tables are embedded in HTML comments, so collect those
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))

    df = []
    for each in comments:
        if 'table' in each:
            try:
                df.append(pd.read_html(each, header=1)[0])
            except Exception:
                continue

    df = df[0]  # raises IndexError if no table was parsed
    df = df[df.Player != "Player"]  # drop repeated header rows
    df = df.fillna(0)
    # Convert the stat columns to numbers
    df.iloc[:, 5:-1] = df.iloc[:, 5:-1].apply(pd.to_numeric, axis=1)
    return df

It works fine on its own, but when the function is called two or more times in succession it raises "list index out of range". For example, if I write gca = fbref("gca"), defense = fbref("defense"), possession = fbref("possession"), passing = fbref("passing"), stats = fbref("stats"), shooting = fbref("shooting"), misc = fbref("misc"), I get "gca", "defense" and sometimes also "possession", but after that it gives me the error. I tried several combinations and got the same behaviour, so it's not about the order.

Does anyone have a clue what might be happening? Thank you for reading this.

I use Spyder and Python 3.9.

Answer:

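As I wrote in the comments, your issue is caused by a rate-limiting mechanism. Once you exceed the allowed request rate, the server presumably stops serving the stats page, so no comment contains a table, df stays an empty list, and df = df[0] raises "list index out of range".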
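To make that failure mode visible, here is a minimal sketch, assuming you keep the list-of-tables approach from the question. The helper first_table is hypothetical: it replaces df = df[0] and reports the HTTP status code when the list is empty, so a rate-limited response produces a descriptive error instead of a bare IndexError.

```python
def first_table(tables, status_code):
    """Return the first parsed table, or explain why there is none.

    `tables` is the list built from the HTML comments; `status_code` is
    the status code of the response that produced the page.
    """
    if not tables:
        # An empty list is exactly what makes df[0] raise
        # "IndexError: list index out of range" in the original function.
        raise RuntimeError(
            f"No tables found in the page (HTTP status {status_code}); "
            "the server may be rate limiting your requests."
        )
    return tables[0]
```

In fbref you would call it as df = first_table(df, response.status_code).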

Since you're scraping a website rather than calling an API, there may be no documentation of the allowed rate. If there is, read the target website's documentation and look for the rate limit.
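One place a site may state an allowed request rate is the Crawl-delay directive in its robots.txt. Whether fbref.com sets one is an assumption not confirmed here, so treat this as a sketch: it uses the standard library's urllib.robotparser on made-up sample robots.txt lines (no network needed) and reads the delay for a user agent. The helper name crawl_delay_from_lines is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_lines(lines, user_agent="*"):
    """Return the Crawl-delay (in seconds) that robots.txt lines give user_agent, or None."""
    parser = RobotFileParser()
    # parse() accepts an iterable of robots.txt lines, so nothing is downloaded here.
    parser.parse(lines)
    return parser.crawl_delay(user_agent)


# Hypothetical robots.txt content, for illustration only.
sample = [
    "User-agent: *",
    "Crawl-delay: 3",
    "Disallow: /private/",
]
print(crawl_delay_from_lines(sample))
```

For a live site you would instead call set_url() with the site's robots.txt URL, then read(), then crawl_delay("*"); None means the file does not declare a delay.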

Websites sometimes report the rate limit or other useful information in the response headers, so check the response headers and the status code (a rate-limited request typically gets 429 Too Many Requests).
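As a sketch of what to look for, the hypothetical helper below takes a status code and a headers mapping (for example response.status_code and response.headers from requests) and returns how many seconds to wait when the server answers 429 with a numeric Retry-After header. Whether fbref sends Retry-After at all is an assumption; servers may also send an HTTP date there, which this sketch deliberately ignores.

```python
def retry_after_seconds(status_code, headers):
    """Return the wait in seconds requested by a 429 response, else None.

    Assumes Retry-After, if present, holds an integer number of seconds;
    the HTTP-date form of the header is not handled here.
    """
    if status_code != 429:
        return None
    value = headers.get("Retry-After")
    if value is not None and value.strip().isdigit():
        return int(value.strip())
    return None
```

While debugging, printing response.status_code and response.headers after each request is often enough to see the limit kick in.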

Sleeping between requests is not always the best solution. Once you know the rate, I suggest using a rate-limiting library, perhaps together with asyncio or threading.
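As a stand-in for a rate-limiting library, here is a minimal hand-rolled limiter (not any particular library's API) that spaces calls at least a fixed interval apart and can be shared between threads. The class name MinIntervalLimiter and the interval are hypothetical; use whatever rate you find for the site. The clock and sleep functions are injectable only so the logic can be tested without actually waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        # Holding the lock while sleeping serializes threads, so each
        # caller gets its own slot spaced `interval` apart.
        with self._lock:
            now = self._clock()
            if self._next_allowed is not None and now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._clock()
            self._next_allowed = now + self.interval
```

Usage: create one limiter, e.g. limiter = MinIntervalLimiter(3.0), and call limiter.wait() right before requests.get inside fbref; the 3 seconds is a placeholder, not fbref's actual limit.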

Also, print the exception in that try/except block instead of silently continuing; it can only help. Good luck!
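Applied to the question's loop, that advice might look like the sketch below. To keep it runnable without network access or pandas, the parser is passed in as a function; in the original code you would pass something like lambda html: pd.read_html(html, header=1)[0]. The names collect_tables and read_table are hypothetical.

```python
def collect_tables(comments, read_table):
    """Parse every comment that mentions 'table', printing failures instead of hiding them."""
    tables = []
    for each in comments:
        if "table" in each:
            try:
                tables.append(read_table(each))
            except Exception as exc:
                # Report what went wrong rather than silently continuing.
                print(f"Could not parse a comment: {exc!r}")
    return tables
```

If every comment fails (or none contains a table), the printed messages and the empty result point straight at the cause of the later IndexError.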
