I need to retrieve the country name from each rider's page. Sometimes this code works and sometimes it doesn't (soup.find() returns None). Why?
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
names = ['Fabio Di Giannantonio', 'Francesco Bagnaia']
for name in names:
    driver = webdriver.Chrome("/usr/bin/chromedriver")
    driver.get(f"https://www.motogp.com/en/riders/profile/{name}")
    soup = BeautifulSoup(driver.page_source)
    print(soup.find("p", "card-text c-rider-country").get_text())
    time.sleep(30)
    driver.close()
CodePudding user response:
I'm not familiar with BeautifulSoup, so I'll give a Selenium solution.
Your Selenium code is missing a wait: you need to wait for the element to be fully rendered before extracting its text.
The best practice for that in Selenium is to use WebDriverWait, as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
# Raw string so the backslashes in the Windows path are not read as escapes
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
# Explicit wait with a 10-second timeout
wait = WebDriverWait(driver, 10)
names = ['Fabio Di Giannantonio', 'Francesco Bagnaia']
for name in names:
    driver.get(f"https://www.motogp.com/en/riders/profile/{name}")
    # Block until the country element is visible, then read its text
    title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.card-text.c-rider-country"))).text
    print(title)
The output is stable across multiple runs:
ITALY
ITALY
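To see why a single read of page_source is flaky while an explicit wait is stable, here is a minimal pure-Python sketch of the retry-until-timeout idea behind an explicit wait. The names wait_until and fake_find_country are made up for illustration, and this is not Selenium's actual implementation; it only assumes that the element shows up some time after the page starts loading.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Illustrative only: this mimics the idea of an explicit wait,
    it is not how Selenium implements WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Hypothetical stand-in for a page whose country element only
# becomes available on the third lookup.
attempts = {"count": 0}


def fake_find_country():
    attempts["count"] += 1
    return "ITALY" if attempts["count"] >= 3 else None


country = wait_until(fake_find_country, timeout=2.0, poll_frequency=0.01)
print(country, attempts["count"])
```

A one-shot lookup corresponds to calling fake_find_country() once: it returns None whenever the content has not arrived yet, which matches the intermittent soup.find() result in the question.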
CodePudding user response:
I suspect you are approaching this problem the wrong way. Selenium is a tool for browser testing: for a web scraping task it should be the last resort, used only if everything else fails.
The information on those pages is hydrated from an API endpoint via JavaScript XHR calls. You can query that API directly, as long as you provide the correct information, namely the name/surname.
Here is an example where I look up all drivers named Fabio and Francesco:
import requests
import pandas as pd
drivers_df = pd.DataFrame()
s = requests.Session()
drivers = ['Fabio', 'Francesco']
for d in drivers:
    r = s.get(f'https://api.motogp.com/riders-api/riders?name={d}')
    df = pd.json_normalize(r.json())
    drivers_df = pd.concat([drivers_df, df], axis=0, ignore_index=True)
print(drivers_df)
Result in terminal:
id name surname nickname current_career_step birth_city birth_date years_old published legacy_id country.iso country.name current_career_step.season current_career_step.number current_career_step.sponsored_team current_career_step.team current_career_step.category.id current_career_step.category.name current_career_step.category.legacy_id current_career_step.in_grid current_career_step.short_nickname current_career_step.current current_career_step.pictures.profile.main current_career_step.pictures.profile.secondary current_career_step.pictures.bike.main current_career_step.pictures.bike.secondary current_career_step.pictures.helmet.main current_career_step.pictures.helmet.secondary current_career_step.pictures.number current_career_step.pictures.portrait current_career_step.team.id current_career_step.team.constructor.id current_career_step.team.constructor.name current_career_step.team.constructor.legacy_id current_career_step.team.name current_career_step.team.legacy_id current_career_step.team.color current_career_step.team.text_color current_career_step.team.picture current_career_step.team.published
0 1ea1f811-7505-43b0-8225-e1a325a2d1e1 Fabio Meozzi None NaN None None NaN True 6639 IT Italy NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2d7c92c8-6ec6-4760-978e-5a64023fa811 Fabio Nucci None NaN Arenzzo None NaN True 5559 IT Italy NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 35a3128b-80f8-4349-aa3d-3411c155fa9a Fabio Biliotti None NaN None None NaN True 726 IT Italy 1989.0 NaN None NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370 500cc -1.0 True None False None NaN None NaN None NaN None https://www.motogp.com/en/api/rider/photo/grid/old/726 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 3b489143-ee09-4be2-a224-5a43a0a93e4f Fabio Spiranelli None NaN Lodi 1999-12-05 22.0 True 8825 IT Italy 2016.0 3.0 CIP-Unicom Starker NaN 1ab203aa-e292-4842-8bed-971911357af1 Moto3 1.0 True None False None NaN https://photos.motogp.com/2016/riders/moto3/bike/original/rider_8825_1458823989.jpg NaN https://photos.motogp.com/2016/riders/moto3/helmet/original/rider_8825_1458821857.jpg NaN None https://photos.motogp.com/2016/riders/moto3/grid/original/rider_8825_1458821693.jpg 5a88fee3-0f04-45f9-ba64-3fd6d2a62d73 5ecd8db7-d87b-4b3e-87b6-1f72ee457ede KTM 298.0 CIP-Unicom Starker 110.0 None None https://photos.motogp.com/2020/teams/moto3/original/team_bike_110_1584101124.jpg True
4 3de48564-be7d-4c34-8f85-4edeea23a313 Fabio Carpani None NaN Padenghe Garda 1975-08-23 47.0 True 2420 IT Italy 1998.0 NaN None NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370 500cc -1.0 True None False None NaN None NaN None NaN None https://www.motogp.com/en/api/rider/photo/grid/old/2420 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 3f845fa3-693f-47ae-b6ce-d8804b2b4909 Fabio Barchitta None NaN None None NaN True 717 IT Italy 1988.0 NaN None NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370 500cc -1.0 True None False None NaN None NaN None NaN None https://www.motogp.com/en/api/rider/photo/grid/old/717 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 525b1551-f10b-4cfd-9b43-59af6fca654b Fabio Di Giannantonio None NaN Roma 1998-10-10 23.0 True 8539 IT Italy 2022.0 49.0 Gresini Racing MotoGP™ NaN 737ab122-76e1-4081-bedb-334caaa18c70 MotoGP 3.0 True FD49 False https://photos.motogp.com/riders/a/0/a04438ea-4e12-47f6-bb08-ba4589ea3665/profile/main/49-Fabio-DiGiananntonioRider_DS_5200.png NaN https://photos.motogp.com/riders/a/1/a12c0f24-05ee-4983-9407-f05fbdf7c67c/bike/main/49_Fabio_Di_Giannantonio.png NaN https://photos.motogp.com/riders/a/d/ad1917dc-aaf4-4038-b5e8-82c6b83ebb23/helmet/main/49-Fabio-Diggianantonio.jpg NaN None https://photos.motogp.com/riders/a/0/a04438ea-4e12-47f6-bb08-ba4589ea3665/portrait/49-Fabio-DiGiannantonio-Rider_DS_5192.jpg 11729e67-d2cb-41ad-b3a8-4a0ac5768a5f 38af1078-e2f1-4399-811c-1e98cf6f6150 Ducati 110.0 Gresini Racing MotoGP™ 10.0 #a1b7e5 #323232 https://photos.motogp.com/teams/6/6/66af5d2c-8d52-4099-988c-981983046476/GresiniRacing_.png True
7 68c035ae-4d49-4f2c-a342-c0f0be21d964 Fabio Bitocchi None NaN None None NaN True 9948 IT Italy NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 9042979e-cca9-42cb-a17e-56da52a5fb3b Fabio Frankenberger None NaN None None NaN True 7842 DE Germany NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 bf95d959-6a60-44f1-84b5-ded861e62578 Fabio Quartararo None NaN Nice 1999-04-20 23.0 True 8520 FR France 2022.0 20.0 Monster Energy Yamaha MotoGP™ NaN 737ab122-76e1-4081-bedb-334caaa18c70 MotoGP 3.0 True FQ20 False https://photos.motogp.com/riders/6/9/69b5c348-2840-4dc1-bf7b-457c0683222c/profile/main/20-Fabio-Quartararo.png NaN https://photos.motogp.com/riders/d/0/d088b244-b3c1-4f06-af3c-e122613e2b8b/bike/main/_0003_20-Fabio-Quartararo-Bike-MotoGPDSC04216.png NaN https://photos.motogp.com/riders/b/b/bbe8044a-2fbf-448c-bfad-6d65e596b06c/helmet/main/20-Fabio-Quartararo.jpg NaN https://photos.motogp.com/riders/9/a/9ac4314f-fd8d-433f-876d-515ee1631c28/number/20_Fabio_Quartararo.png https://photos.motogp.com/riders/6/9/69b5c348-2840-4dc1-bf7b-457c0683222c/portrait/20_Fabio_Quartararo.jpg 141b6f0f-7e53-4d27-9bdb-0ea8fba7e842 f2e835be-7fab-4782-a26b-de3d583d132c Yamaha 3.0 Monster Energy Yamaha MotoGP™ 19.0 #183dc7 #ffffff https://photos.motogp.com/teams/9/1/91699bb4-f33d-40de-b995-4f5e120ff74d/Yamaha.png True
10 2e893359-7f93-4b55-9c54-6016337c8e80 Francesco Pellegrino None NaN None 1964-06-09 58.0 True 2567 VE Venezuela (Bolivarian Republic of) NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 66b78301-5826-4986-b11e-fa68a7bd77a7 Francesco Bagnaia None NaN Torino 1997-01-14 25.0 True 8273 IT Italy 2022.0 63.0 Ducati Lenovo Team NaN 737ab122-76e1-4081-bedb-334caaa18c70 MotoGP 3.0 True FB63 False https://photos.motogp.com/riders/e/a/eac63974-aeee-4f62-81a4-f9588a47009d/profile/main/63_Francesco_Bagnaia.png NaN https://photos.motogp.com/riders/4/e/4e947398-047a-44c5-acc5-3971b2a14b09/bike/main/_0002_63-Francesco-Bagnaia_Bike45.png NaN https://photos.motogp.com/riders/2/5/25778ccd-c018-4b6e-8fa1-9325d2bc0f74/helmet/main/63-Francesco-Bagnaia-Helmet.jpg NaN https://photos.motogp.com/riders/2/5/2585fdb8-5fb5-43b6-a12d-ccb5ba31c0a4/number/63_Francesco_Bagnaia.png https://photos.motogp.com/riders/4/e/4e947398-047a-44c5-acc5-3971b2a14b09/portrait/63-Francesco-Bagnaia-Rider_DS_4948.jpg 892fff2f-7402-4fbd-99fb-5fd567d8a80c 38af1078-e2f1-4399-811c-1e98cf6f6150 Ducati 110.0 Ducati Lenovo Team 15.0 #f92515 #ffffff https://photos.motogp.com/teams/7/d/7da82702-139c-4a2c-8ee3-a1478cb43c37/ducatilenovo.png True
12 88c48f39-d7dc-4e15-8ac4-45142afc3e8c Francesco Monaco None NaN None 1970-07-11 52.0 True 2415 IT Italy NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 8ea7300c-263d-4216-9438-1ef3beb55c3c Francesco Mauriello None NaN Napoli 1993-11-28 28.0 True 7948 IT Italy NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
14 ff98e773-384c-447c-a993-81fb07b58c47 Francesco Villa None NaN None None NaN True 1574 IT Italy 1977.0 NaN None NaN f4c00279-2ae2-42fa-8bce-01c5eaedf392 250cc 5.0 True None False None NaN None NaN None NaN None https://www.motogp.com/en/api/rider/photo/grid/old/1574 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Of course, you can filter that dataframe to keep only what you need. You can also create a dictionary with name/surname and send both to the API: https://api.motogp.com/riders-api/riders?name={name}&surname={surname}
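As a rough sketch of the name/surname idea, the query string can be built from a dictionary with the standard library, which also takes care of encoding the space in a surname like "Di Giannantonio". The riders list and the build_rider_url helper are hypothetical names for illustration; only the endpoint and its two parameters come from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL above; only the two query parameters
# shown there (name, surname) are used.
API_URL = "https://api.motogp.com/riders-api/riders"

# Hypothetical input: one dict per rider.
riders = [
    {"name": "Fabio", "surname": "Di Giannantonio"},
    {"name": "Francesco", "surname": "Bagnaia"},
]


def build_rider_url(rider):
    """Return the API URL for one rider, with the query string encoded."""
    return f"{API_URL}?{urlencode(rider)}"


urls = [build_rider_url(r) for r in riders]
for url in urls:
    print(url)
```

With requests, passing the same dictionary as params= (for example s.get(API_URL, params=rider)) should give an equivalent query string without building the URL by hand.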
Requests documentation: https://requests.readthedocs.io/en/latest/
Also, pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
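To get back to the original goal of one country per rider, the filtering step can also be done on the raw JSON records without pandas. This is only a sketch: the sample records are cut down from the output above, the nested "country" layout is inferred from the country.iso / country.name column names, and country_of is a made-up helper.

```python
# Sample records shaped like the API's JSON response, reduced to the
# fields needed here. The nested "country" layout is an assumption
# inferred from the country.iso / country.name columns in the output.
records = [
    {"name": "Fabio", "surname": "Quartararo",
     "country": {"iso": "FR", "name": "France"}},
    {"name": "Fabio", "surname": "Di Giannantonio",
     "country": {"iso": "IT", "name": "Italy"}},
    {"name": "Francesco", "surname": "Bagnaia",
     "country": {"iso": "IT", "name": "Italy"}},
]


def country_of(records, name, surname):
    """Return the country name of the first matching rider, or None."""
    for rec in records:
        if rec.get("name") == name and rec.get("surname") == surname:
            # Tolerate records where the country is missing or null.
            return (rec.get("country") or {}).get("name")
    return None


print(country_of(records, "Fabio", "Di Giannantonio"))
print(country_of(records, "Francesco", "Bagnaia"))
print(country_of(records, "Fabio", "Unknown"))
```

In the real script, records would be the list returned by r.json() for each request instead of the hard-coded sample.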