I would like to scrape the data from the website "https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm".
I tried to extract this data in Python but couldn't get it to work. I would like to build a table in Python with this data and the same fields. Can someone help me with a data-extraction script using pandas, BeautifulSoup, etc.?
This is what I have tried so far:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
print(soup.prettify())
for i in soup.find_all("tr"):
    print(i.find_all("td"))
    print("")
colonnes = ["Nom","Equipe","Buts","Matchs joués"]
df = pd.DataFrame(columns = colonnes)
df
for i in soup.find_all("tr")[1:]:
    href = i.find_all("td")
    df = df.append({'Nom': href}, ignore_index=True)
print(df.head())
Answer:
There is a much simpler way to grab the data and put it in a DataFrame: use pandas.read_html, then make any adjustments you need with pandas.
df = pd.read_html('https://www.maxifoot.fr/classement-buteur-europe-annee-civile-2021.htm', match = 'min/but*')[1]
df['href'] = df["Joueur"].apply(lambda x: 'https://www.maxifoot.fr' + soup.select_one(f'a:-soup-contains("{x}")')['href'])
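To see what `match` does without hitting the site, here is a minimal offline sketch. The HTML string is a made-up stand-in for the real page (only the player names, teams and minutes come from the output shown in this thread), and it assumes a parser backend for `read_html` such as lxml is installed. `read_html` returns a list of DataFrames, and `match` keeps only the tables whose text matches the given pattern.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the real page: two tables, only the second one
# contains the "min/but*" header that the answer filters on.
html = """
<table>
  <thead><tr><th>Menu</th></tr></thead>
  <tbody><tr><td>Accueil</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Joueur</th><th>Equipe</th><th>min/but*</th></tr></thead>
  <tbody>
    <tr><td>R. LEWANDOWSKI</td><td>Bayern Munich</td><td>68'</td></tr>
    <tr><td>E. HAALAND</td><td>Borussia Dortmund</td><td>85'</td></tr>
  </tbody>
</table>
"""

# match is treated as a regular expression; only tables containing
# matching text are returned, so the "Menu" table is dropped.
tables = pd.read_html(StringIO(html), match="min/but")
df = tables[0]
print(df)
```

On the real page the answer picks index `[1]` from the returned list, which suggests more than one table matches there, so it is worth checking `len(tables)` before choosing an index.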
If you would rather solve it with BeautifulSoup, a few adjustments are needed:
...
data = []
for row in soup.select('.butd1 tr')[1:]:
    strings = list(row.stripped_strings)
    # merge cells whose text is split across two fragments
    strings[3:5] = [''.join(strings[3:5])]
    strings[6:8] = [''.join(strings[6:8])]
    # build the absolute link to the player's page
    a = 'https://www.maxifoot.fr' + row.a['href']
    strings.append(a)
    data.append(strings)
colonnes = ['Pos','Nom','Equipe','Buts','dontchamp.','Matchs joués','min/but*','href']
pd.DataFrame(data,columns = colonnes)
Output
Pos | Nom | Equipe | Buts | dontchamp. | Matchs joués | min/but* | href |
---|---|---|---|---|---|---|---|
1. | R. LEWANDOWSKI | Bayern Munich | 58 (11 p.) | 43 | 47 (1,23 b/m) | 68' | https://www.maxifoot.fr/joueur/robert-lewandowski-13191.htm |
2. | E. HAALAND | Borussia Dortmund | 43 (6 p.) | 30 | 43 (1,00 b/m) | 85' | https://www.maxifoot.fr/joueur/erling-haland-190157.htm |
. | K. MBAPPÉ | Paris SG | 43 (7 p.) | 24 | 53 (0,81 b/m) | 104' | https://www.maxifoot.fr/joueur/kylian-mbappe-lottin-183802.htm |
4. | K. BENZEMA | Real Madrid | 38 (3 p.) | 30 | 50 (0,76 b/m) | 109' | https://www.maxifoot.fr/joueur/karim-benzema-10476.htm |
5. | MOHAMED SALAH | Liverpool | 37 (4 p.) | 24 | 53 (0,70 b/m) | 122' | https://www.maxifoot.fr/joueur/mohamed-salah-59580.htm |
...
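The two slice assignments in the BeautifulSoup loop are the least obvious part, so here is a small standalone sketch of what they do. The fragment list is an assumption: its boundaries are guessed so that the indices line up with the code above, and the real markup may split the text differently. `merge_fragments` is a hypothetical helper name used only for illustration.

```python
def merge_fragments(strings):
    """Collapse text fragments that belong to the same table cell.

    Assigning a one-element list to a two-element slice shrinks the list
    by one, so the second slice is indexed against the shortened list.
    """
    merged = list(strings)                    # work on a copy
    merged[3:5] = ["".join(merged[3:5])]      # two fragments -> one "Buts" cell
    merged[6:8] = ["".join(merged[6:8])]      # two fragments -> one "min/but*" cell
    return merged


# Hypothetical fragments for the first row (assumed split, see note above).
fragments = ["1.", "R. LEWANDOWSKI", "Bayern Munich",
             "58 (", "11 p.)", "43", "47 (1,23 b/m)", "68", "'"]

row = merge_fragments(fragments)
print(row)
```

If a row on the real page has a different number of fragments (a player with no penalties, for example), the fixed indices would merge the wrong cells, so comparing `len(strings)` across rows is a cheap sanity check.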