Scraping web Football Data

Time:01-01

I would like to scrape the data from a website "https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm"

I tried to extract this data in Python but couldn't get it to work. I would like to create a table in Python with this data and the same fields. Can someone help me write the data-extraction script using pandas, BeautifulSoup, etc.?

I already tried this:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm'
r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')
print(soup.prettify())
for i in soup.find_all("tr"):
    print(i.find_all("td"))
    print ("")
    colonnes = ["Nom","Equipe","Buts","Matchs joués"]
    df = pd.DataFrame(columns = colonnes)
df

for i in soup.find_all("tr")[1:]:
    href = i.find_all("td")
    df = df.append({'Nom': href}, ignore_index=True)

print(df.head())
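In the attempt above, the DataFrame is re-created on every `tr`, the whole list of `td` tags ends up in the 'Nom' column, and rows are added with `df.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. Below is a minimal sketch of the usual pattern. It runs against a hypothetical, simplified copy of the scorers table, not the real page markup. It collects one list per row, reads each `<td>` as a single value, and builds the DataFrame once at the end. It uses the stdlib "html.parser" instead of lxml only to avoid an extra dependency.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified copy of the scorers table (not the real page markup).
html = """
<table>
  <tr><th>Pos</th><th>Joueur</th><th>Equipe</th><th>Buts</th><th>Matchs</th></tr>
  <tr><td>1.</td><td><a href="/joueur/robert-lewandowski-13191.htm">R. LEWANDOWSKI</a></td>
      <td>Bayern Munich</td><td>58 <i>(11 p.)</i></td><td>47 <i>(1,23 b/m)</i></td></tr>
  <tr><td>2.</td><td><a href="/joueur/erling-haland-190157.htm">E. HAALAND</a></td>
      <td>Borussia Dortmund</td><td>43 <i>(6 p.)</i></td><td>43 <i>(1,00 b/m)</i></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr")[1:]:  # skip the header row
    # get_text(" ", strip=True) merges every text fragment inside one cell,
    # so "58" and "(11 p.)" come back as the single value "58 (11 p.)".
    rows.append([td.get_text(" ", strip=True) for td in tr.find_all("td")])

# Build the DataFrame once, after the loop, from the collected rows.
colonnes = ["Pos", "Nom", "Equipe", "Buts", "Matchs joués"]
df = pd.DataFrame(rows, columns=colonnes)
print(df)
```

Reading whole cells avoids re-joining `stripped_strings` fragments by index. The trade-off is that it depends on each value sitting in its own `<td>` on the real page, which this sketch does not verify.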

Answer:

There is a much simpler way to grab the data and put it into a DataFrame: use pandas.read_html and, if needed, make your adjustments with pandas.

df = pd.read_html('https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm', match = 'min/but*')[1]
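The sketch below shows how `pd.read_html` and its `match` parameter behave, run offline on a hypothetical two-table page. `read_html` returns a list of DataFrames, one per `<table>` whose text matches `match`, and `match` is treated as a regular expression. It assumes lxml is installed, since the question's code already uses it. On the real page the answer indexes `[1]`, so apparently more than one table there matches. This sketch cannot confirm that.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second one is a scorers table.
html = """
<table><tr><th>Menu</th></tr><tr><td>Accueil</td></tr></table>
<table>
  <tr><th>Pos</th><th>Joueur</th><th>Buts</th><th>min/but*</th></tr>
  <tr><td>1.</td><td>R. LEWANDOWSKI</td><td>58</td><td>68'</td></tr>
  <tr><td>2.</td><td>E. HAALAND</td><td>43</td><td>85'</td></tr>
</table>
"""

# Only tables whose text matches the regex are returned. The plain substring
# "min/but" sidesteps the regex meaning of the literal "*" in the header.
tables = pd.read_html(StringIO(html), match="min/but")
df = tables[0]
print(len(tables), list(df.columns))
```

Wrapping literal HTML in `StringIO` avoids the deprecation warning that recent pandas versions emit when `read_html` receives a raw HTML string.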

# soup is the BeautifulSoup object built in the question's code
df['href'] = df["Joueur"].apply(lambda x: 'https://www.maxifoot.fr' + soup.select_one(f'a:-soup-contains("{x}")')['href'])
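The one-liner above raises a TypeError when no link contains the player's name, because `select_one` then returns None. Below is a slightly more defensive sketch of the same lookup. It uses `urllib.parse.urljoin` instead of string concatenation, so hrefs that are already absolute are left untouched. The HTML snippet, the `BASE_URL` constant and the `player_url` helper are hypothetical names for illustration.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.maxifoot.fr"

# Hypothetical markup; relative hrefs like the ones the answer prefixes.
html = """
<table class="butd1">
  <tr><td><a href="/joueur/robert-lewandowski-13191.htm">R. LEWANDOWSKI</a></td></tr>
  <tr><td><a href="/joueur/karim-benzema-10476.htm">K. BENZEMA</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
df = pd.DataFrame({"Joueur": ["R. LEWANDOWSKI", "K. BENZEMA"]})


def player_url(name):
    # :-soup-contains() is a soupsieve pseudo-class matching elements whose
    # text contains the given substring.
    link = soup.select_one(f'a:-soup-contains("{name}")')
    # Return None instead of raising when no link matches the name.
    return urljoin(BASE_URL, link["href"]) if link else None


df["href"] = df["Joueur"].apply(player_url)
print(df)
```

Substring matching can still pick the wrong link if one player's name is contained in another's. Reading `row.a['href']` while iterating over the rows, as in the BeautifulSoup version, avoids that lookup entirely.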

If you'd rather solve it with BeautifulSoup, you need to make a few adjustments:

...
data = []
# skip the header row of the scorers table
for row in soup.select('.butd1 tr')[1:]:
    strings = list(row.stripped_strings)
    # re-join text fragments that belong to the same cell
    strings[3:5] = [''.join(strings[3:5])]
    strings[6:8] = [''.join(strings[6:8])]
    # build the absolute URL of the player's page
    a = 'https://www.maxifoot.fr' + row.a['href']
    strings.append(a)
    data.append(strings)
colonnes = ['Pos','Nom','Equipe','Buts','dontchamp.','Matchs joués','min/but*','href']
pd.DataFrame(data,columns = colonnes)

Output

Pos | Nom | Equipe | Buts | dontchamp. | Matchs joués | min/but* | href
1. | R. LEWANDOWSKI | Bayern Munich | 58 (11 p.) | 43 | 47 (1,23 b/m) | 68' | https://www.maxifoot.fr/joueur/robert-lewandowski-13191.htm
2. | E. HAALAND | Borussia Dortmund | 43 (6 p.) | 30 | 43 (1,00 b/m) | 85' | https://www.maxifoot.fr/joueur/erling-haland-190157.htm
. | K. MBAPPÉ | Paris SG | 43 (7 p.) | 24 | 53 (0,81 b/m) | 104' | https://www.maxifoot.fr/joueur/kylian-mbappe-lottin-183802.htm
4. | K. BENZEMA | Real Madrid | 38 (3 p.) | 30 | 50 (0,76 b/m) | 109' | https://www.maxifoot.fr/joueur/karim-benzema-10476.htm
5. | MOHAMED SALAH | Liverpool | 37 (4 p.) | 24 | 53 (0,70 b/m) | 122' | https://www.maxifoot.fr/joueur/mohamed-salah-59580.htm

...
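The columns above are still text, such as "58 (11 p.)", "47 (1,23 b/m)" and "68'". The answer suggests making adjustments with pandas. A minimal sketch of one such adjustment splits these strings into numeric columns with `str.extract`. The sample values are copied from the output rows. The new column names (goals, penalty_goals, matches, goals_per_match, minutes_per_goal) are hypothetical choices.

```python
import pandas as pd

# Sample values in the same text format as the output rows above.
df = pd.DataFrame({
    "Buts": ["58 (11 p.)", "43 (6 p.)"],
    "Matchs joués": ["47 (1,23 b/m)", "43 (1,00 b/m)"],
    "min/but*": ["68'", "85'"],
})

# "58 (11 p.)" -> total goals and penalty goals (\s* also accepts "58(11 p.)").
goals = df["Buts"].str.extract(r"(\d+)\s*\((\d+) p\.\)")
df["goals"] = goals[0].astype(int)
df["penalty_goals"] = goals[1].astype(int)

# "47 (1,23 b/m)" -> matches played and goals per match (French decimal comma).
matches = df["Matchs joués"].str.extract(r"(\d+)\s*\((\d+,\d+) b/m\)")
df["matches"] = matches[0].astype(int)
df["goals_per_match"] = matches[1].str.replace(",", ".", regex=False).astype(float)

# "68'" -> 68 minutes per goal
df["minutes_per_goal"] = df["min/but*"].str.rstrip("'").astype(int)
print(df)
```

Any cell that does not fit these patterns becomes NaN after `str.extract`, and `astype(int)` would then fail. That failure is a quick signal that the scraped format differs from what the regexes assume.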
