How to add the scrape iterator to a pandas dataframe for each row?

Time:07-12

I am scraping data from a website using this code and loading the data to a pandas dataframe. I get multiple entries per iteration:

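import pandas as pd
import requests

data = []
for i in range(0, 24):
    for j in range(1, 15):
        if i < 9:
            # Season URL, e.g. .../bundesliga-2000-2001 for i = 0
            URL = 'https://www.weltfussball.de/spielerliste/bundesliga-200' + str(i) + '-' + '200' + str(i + 1)
            # Append the name-sorted list path and the number j
            URL_ = URL + '/nach-name/' + str(j) + '/'
            response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
            # Keep the second table parsed from the page
            data.append(pd.read_html(response.text)[1])
df = pd.concat(data).reset_index()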

To identify the iteration, I want to add a column to the dataframe that holds the corresponding iterator i for each row: 0 for the entries from the first iteration, 1 for the next, and so on. How do I have to amend my code?

CodePudding user response:

Instead of appending the dataframes to a list, store them in a dictionary keyed by the iteration (i, j). pd.concat will then build a MultiIndex from the dictionary keys for you.

Update your code:

data = {}
for i in range(0, 2):
    for j in range(1, 3):
        if i < 9:
            ...
            data[(i, j)] = pd.read_html(response.text)[1]

# Drop the original row index (level 2); i and j remain as index levels
df = pd.concat(data).reset_index(level=2, drop=True)
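
The result above keeps i and j as index levels rather than as ordinary columns. If a regular column is what is needed, the sketch below shows one possible way to get it. It uses small made-up tables instead of the scraped ones so that it runs without network access; `fake_table` and the `player` column are hypothetical stand-ins, not part of the original code.

```python
import pandas as pd


def fake_table(i, j):
    # Hypothetical stand-in for pd.read_html(response.text)[1]
    return pd.DataFrame({"player": [f"p{i}{j}a", f"p{i}{j}b"]})


# One dataframe per (i, j) iteration, keyed by the iterators
data = {(i, j): fake_table(i, j) for i in range(2) for j in range(1, 3)}

# The dict keys become the outer index levels; `names` labels them
df = pd.concat(data, names=["i", "j"])

# Move "i" and "j" into columns, then discard the per-table row index
df = df.reset_index(level=["i", "j"]).reset_index(drop=True)

print(df)
```

Each row now carries the i and j of the iteration that produced it. If only i is needed, the j column can be dropped afterwards with `df.drop(columns="j")`.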