How to scrape specific information on a website with BeautifulSoup?

Time:07-11

Using the code below, I can download and parse the pages of a website that lists Bundesliga football players:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

data = []
URL = 'https://www.weltfussball.de/spielerliste/bundesliga-2021-2022/nach-name/'
for i in range(1, 12):
    URL_ = URL + str(i) + '/'  # address of page i of the player list
    req = Request(URL_, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    #r = urllib.request.urlopen(URL_).read()
    soup = BeautifulSoup(webpage, 'lxml')

What I want is a pandas DataFrame containing the information about the players on this website, i.e. name, team, birth date, height and position. Which elements of the page do I have to scrape to get it?

CodePudding user response:

Assuming you use pandas to create your DataFrame, simply scrape the table with pandas.read_html(), append it to your data list, and concatenate the DataFrames:

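from io import StringIO

import pandas as pd
import requests

data = []
URL = 'https://www.weltfussball.de/spielerliste/bundesliga-2021-2022/nach-name/'
for i in range(1, 12):
    URL_ = URL + str(i) + '/'  # address of page i of the player list
    response = requests.get(URL_, headers={'User-Agent': 'Mozilla/5.0'})
    # read_html returns a list of all tables on the page; [1] picks the second one.
    # StringIO is used because newer pandas versions deprecate passing raw HTML strings.
    data.append(pd.read_html(StringIO(response.text))[1])
# Combine the per-page tables into one DataFrame with a fresh 0..n-1 index
df = pd.concat(data).reset_index(drop=True)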
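
If you would rather stay with BeautifulSoup, as in the question, the elements to look at are the table rows (tr) and their cells (td). Below is a minimal sketch of that idea. It runs on a small inline HTML snippet rather than the live site, so SAMPLE_HTML, the placeholder players, the column order in COLUMNS and the helper name parse_players are all assumptions made for illustration; the real page may have more columns or a different layout, so inspect it in your browser first.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for one downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Team</th><th>Born</th><th>Height</th><th>Position</th></tr>
  <tr><td>Player A</td><td>Team X</td><td>01.01.1995</td><td>180 cm</td><td>Defender</td></tr>
  <tr><td>Player B</td><td>Team Y</td><td>02.02.1998</td><td>175 cm</td><td>Forward</td></tr>
</table>
"""

# Assumed column order; adjust it to what the real table contains.
COLUMNS = ["name", "team", "birth_date", "height", "position"]


def parse_players(html):
    """Return one dict per table row that has exactly len(COLUMNS) <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, cells)))
    return rows


# In the real loop, each element would be the HTML of one fetched page.
pages = [SAMPLE_HTML, SAMPLE_HTML]
records = [row for html in pages for row in parse_players(html)]
df = pd.DataFrame(records, columns=COLUMNS)
print(df)
```

To use it with the loop above, you would pass response.text (or webpage from the urllib version) to parse_players instead of SAMPLE_HTML. Compared with pandas.read_html() this is more code, but it gives you control over which rows and cells end up in the DataFrame.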