Using the code below, I am able to scrape data about Bundesliga football players from a specific website:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

URL = 'https://www.weltfussball.de/spielerliste/bundesliga-2021-2022/nach-name/'
data = []
for i in range(1, 12):  # pages 1 to 11 of the player list
    URL_ = URL + str(i) + '/'
    req = Request(URL_, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    #r = urllib.request.urlopen(URL_).read()
    soup = BeautifulSoup(webpage, 'lxml')
What I want is a pandas DataFrame that contains information about the players on this website, i.e. name, team, birth date, height and position. Which elements of the website do I have to scrape specifically?
Answer:
Assuming you use pandas to create your DataFrame, simply scrape the table with pandas.read_html(), append it to your data list and concatenate the DataFrames:
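import io

import pandas as pd
import requests

URL = 'https://www.weltfussball.de/spielerliste/bundesliga-2021-2022/nach-name/'
data = []
for i in range(1, 12):
    URL_ = URL + str(i) + '/'
    response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
    # read_html returns a list of every table on the page; index 1 is the player table.
    # Wrapping the HTML in StringIO avoids the literal-string deprecation in pandas >= 2.1.
    data.append(pd.read_html(io.StringIO(response.text))[1])
df = pd.concat(data).reset_index(drop=True)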
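If you would rather stay with BeautifulSoup as in the question, the elements to target are the <table> on each page, its <tr> rows, and the <th>/<td> cells inside them. Below is a minimal sketch under stated assumptions: the helper names table_to_frame and players_from_pages are hypothetical, the first row of the table is assumed to hold the headers, and table_index=1 is only assumed to match the [1] used with read_html above (find_all("table") also counts nested tables, so the numbering may differ; check it against the live page). colspan/rowspan cells are not handled.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_frame(table):
    """Convert a BeautifulSoup <table> Tag into a DataFrame.

    Assumes the first <tr> holds the column headers and every other
    <tr> holds one record. colspan/rowspan cells are not handled.
    """
    rows = table.find_all("tr")
    # Header cells may be <th> or <td>, depending on the page
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Skip rows whose cell count does not match the header (e.g. spacer rows)
        if len(cells) == len(header):
            records.append(cells)
    return pd.DataFrame(records, columns=header)


def players_from_pages(pages, table_index=1):
    """Parse one table per HTML page and stack them into one DataFrame.

    table_index is an assumption about the page layout (hypothetical default).
    """
    frames = []
    for html in pages:
        # html.parser ships with Python, so lxml is not required here
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find_all("table")[table_index]
        frames.append(table_to_frame(table))
    # ignore_index gives one continuous 0..n-1 index across all pages
    return pd.concat(frames, ignore_index=True)
```

To use it with the live site, collect requests.get(...).text for each page URL into a list and pass that list to players_from_pages; then inspect df.columns to pick out name, team, birth date, height and position, since the site's actual column labels are not verified here.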