I am scraping data from a website using the code below and loading it into a pandas dataframe. I get multiple entries per iteration:
import pandas as pd
import requests

data = []
for i in range(0, 24):
    for j in range(1, 15):
        if i < 9:
            # Season URL, e.g. bundesliga-2000-2001 for i = 0
            URL = 'https://www.weltfussball.de/spielerliste/bundesliga-200' + str(i) + '-' + '200' + str(i + 1)
            # Page j of the player list sorted by name
            URL_ = URL + '/nach-name/' + str(j) + '/'
            response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
            data.append(pd.read_html(response.text)[1])
df = pd.concat(data).reset_index()
To identify the iteration, I want to add a column to each row of the dataframe holding the corresponding value of the iterator i: 0 for the entries of the first iteration, then 1, and so on. How should I amend my code?
CodePudding user response:
Instead of appending the dataframes to a list, store them in a dictionary keyed by the iteration (i, j). pd.concat will then automatically turn those keys into a MultiIndex for you.
Update your code:
data = {}
for i in range(0, 2):
    for j in range(1, 3):
        if i < 9:
            ...
            data[(i, j)] = pd.read_html(response.text)[1]
df = pd.concat(data).reset_index(level=2, drop=True)
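If you want i (and j) as ordinary columns rather than index levels, one possible follow-up is to name the key levels and reset the index once more. The sketch below is only an illustration: it uses small made-up dataframes in place of the scraped tables, and the helper fake_table and the column name player are hypothetical, not part of the original code.

```python
import pandas as pd


def fake_table(i, j):
    # Hypothetical stand-in for pd.read_html(response.text)[1]
    return pd.DataFrame({"player": [f"p{i}{j}a", f"p{i}{j}b"]})


data = {}
for i in range(0, 2):
    for j in range(1, 3):
        data[(i, j)] = fake_table(i, j)

# The tuple keys become the two outer index levels, named "i" and "j" here
df = pd.concat(data, names=["i", "j"])

# Drop each table's own row index (level 2), as in the answer above
df = df.reset_index(level=2, drop=True)

# Move the remaining index levels i and j into regular columns
flat = df.reset_index()
print(flat)
```

Each row of flat then carries the i and j of the iteration that produced it, so you can filter with something like flat[flat["i"] == 0].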