I am trying to scrape data (names, ages, teams) from this website: https://sofifa.com/players?offset=0. When I try to find the relevant data using soup.findAll(), I get an empty list.
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
k=[]
url="https://sofifa.com/players?offset=0"
resp=requests.get(url)
soup=BeautifulSoup(resp.content,'lxml')
for omk in soup.find_all('><div class="bp3-text-overflow-ellipsis">'):
    k.append(str(omk))
print(k)
I read some answers that mentioned tags and classes, but I don't know what these are.
CodePudding user response:
Based on your question, here is an example of a working solution:
Code:
import pandas as pd
import re
import requests
from bs4 import BeautifulSoup
k = []
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
url = "https://sofifa.com/players?offset=0"
resp = requests.get(url, headers = headers)
soup = BeautifulSoup(resp.content, 'lxml')
for omk in soup.select('table.table.table-hover.persist-area tbody tr'):
    name = omk.select_one('td.col-name a:nth-child(1) div').get_text(strip=True)
    print(name)
Output:
M. Sarr
É. Mendy
P. Daka
F. Wirtz
J. Timber
C. De Ketelaere
Cristiano Ronaldo
D. Maldini
J. Bellingham
Lucas Paquetá
Gavi
Antony
A. Spörle
K. Adeyemi
E. Haaland
D. Kamada
M. Salah
N. Madueke
A. Tchouaméni
M. Greenwood
M. Lacroix
R. Gravenberch
Pedri
J. Gvardiol
N. Lang
Raphinha
A. Hložek
J. Musiala
F. Chiesa
L. Messi
B. Brereton Díaz
R. Cherki
D. Vlahović
Ansu Fati
Pedro Benito
G. Raspadori
Yeremy Pino
Y. Tielemans
K. Mbappé
E. Camavinga
D. Scarlett
A. Bastoni
J. Sancho
T. Hernández
A. Davies
J. Koundé
A. Saint-Maximin
H. Elliott
S. Tonali
A. Broja
A. Isak
M. Vandevoordt
P. Foden
F. Kessié
J. Doku
E. Tapsoba
K. Mitoma
Luiz Felipe
Nuno Mendes
S. Dest
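Since the question also asks for ages and teams, the same row-by-row pattern can be extended to several cells. The sketch below runs on a small inline table rather than the live site. Only the td.col-name selector comes from the code above; the col-age and col-team class names, the cell_text and parse_rows helpers, and the sample rows are assumptions made up for illustration, so check the real page's markup before reusing them.

```python
from bs4 import BeautifulSoup

# Made-up markup: col-age and col-team are hypothetical class names,
# and the rows are placeholder data, not real players.
SAMPLE_HTML = """
<table class="table table-hover persist-area">
  <tbody>
    <tr>
      <td class="col-name"><a href="#"><div>Player A</div></a></td>
      <td class="col-age">20</td>
      <td class="col-team">Team X</td>
    </tr>
    <tr>
      <td class="col-name"><a href="#"><div>Player B</div></a></td>
      <td class="col-age">31</td>
    </tr>
  </tbody>
</table>
"""


def cell_text(row, selector):
    """Return the stripped text of the first match, or None if absent."""
    cell = row.select_one(selector)
    return cell.get_text(strip=True) if cell is not None else None


def parse_rows(html):
    """Collect one dict per table row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table.table.table-hover.persist-area tbody tr"):
        rows.append({
            "name": cell_text(tr, "td.col-name a:nth-child(1) div"),
            "age": cell_text(tr, "td.col-age"),
            "team": cell_text(tr, "td.col-team"),
        })
    return rows


players = parse_rows(SAMPLE_HTML)
for player in players:
    print(player)
```

Returning None for a missing cell avoids the AttributeError that select_one(...).get_text() raises when a row lacks the expected element. The resulting list of dicts can be passed to pd.DataFrame(players) if a table is wanted.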
CodePudding user response:
There are a couple of issues with your code snippet.
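The first is the parser you pass when instantiating your BeautifulSoup instance. Your code uses 'lxml', which works only if the lxml package is installed; Python's built-in parser needs no extra install: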
soup=BeautifulSoup(resp.content,'html.parser')
Then, when searching for a div element with a class of bp3-text-overflow-ellipsis, the proper syntax is the following:
soup.find_all("div", class_="bp3-text-overflow-ellipsis")
Here is the documentation related to find_all: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
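To see why the original call returned an empty list, here is a minimal sketch that compares both forms on a small inline snippet. The HTML is made up for illustration and is not the site's real markup.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's structure may differ.
SAMPLE_HTML = """
<div class="bp3-text-overflow-ellipsis">M. Sarr</div>
<div class="other">ignored</div>
<div class="bp3-text-overflow-ellipsis">P. Daka</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A string passed as the first argument is treated as a tag name,
# so raw markup matches no tag at all.
wrong = soup.find_all('><div class="bp3-text-overflow-ellipsis">')

# Pass the tag name and the class as separate arguments instead.
right = soup.find_all("div", class_="bp3-text-overflow-ellipsis")
names = [div.get_text(strip=True) for div in right]

print(wrong)
print(names)
```

In short, a tag is the element name (div, a, td) and a class is the value of its class attribute. find_all takes them as separate arguments, and class_ has a trailing underscore because class is a reserved word in Python.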