Home > Blockchain >  Selenium and bs4 can't retrieve element on web page
Selenium and bs4 can't retrieve element on web page

Time:10-02

I need to retrieve the country name in each rider's page. Sometimes this code works and sometimes it doesn't (soup.find() return None). Why?

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

names = ['Fabio Di Giannantonio', 'Francesco Bagnaia']
for name in names:
    driver = webdriver.Chrome("/usr/bin/chromedriver")

    driver.get(f"https://www.motogp.com/en/riders/profile/{name}")
    soup = BeautifulSoup(driver.page_source)
    print(soup.find("p", "card-text c-rider-country").get_text())
    time.sleep(30)
    driver.close()

CodePudding user response:

I'm not familiar with BeautifulSoup so I'll give a Selenium solution.
With Selenium your code is missing a wait - you need to wait for element to be fully rendered before extracting it's text.
The best practice to do that with Selenium is to use WebDriverWait, as following:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=webdriver_service, options=options)
wait = WebDriverWait(driver, 10)

names = ['Fabio Di Giannantonio', 'Francesco Bagnaia']
for name in names:
    driver.get(f"https://www.motogp.com/en/riders/profile/{name}")
    title =wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.card-text.c-rider-country"))).text
    print(title)

The output is stable:

ITALY
ITALY

I run this code multiple times

CodePudding user response:

I suspect you are approaching this problem the wrong way. Selenium is a tool used for testing: it should be the last call for a web scraping task, if all else would fail.

The information in those pages is being hydrated from an API enddpoint, via javascript XHR calls. You can scrape directly that API, as long as you provide the correct information, that being name/surname. Here is an example where I'm looking for all drivers with names Fabio and Francesco:

import requests
import pandas as pd

drivers_df = pd.DataFrame()
s = requests.Session()

drivers = ['Fabio', 'Francesco']

for d in drivers:
    r = s.get(f'https://api.motogp.com/riders-api/riders?name={d}')
    df = pd.json_normalize(r.json())
    drivers_df = pd.concat([drivers_df, df], axis=0, ignore_index=True)
print(drivers_df) 

Result in terminal:

    id  name    surname nickname    current_career_step birth_city  birth_date  years_old   published   legacy_id   country.iso country.name    current_career_step.season  current_career_step.number  current_career_step.sponsored_team  current_career_step.team    current_career_step.category.id current_career_step.category.name   current_career_step.category.legacy_id  current_career_step.in_grid current_career_step.short_nickname  current_career_step.current current_career_step.pictures.profile.main   current_career_step.pictures.profile.secondary  current_career_step.pictures.bike.main  current_career_step.pictures.bike.secondary current_career_step.pictures.helmet.main    current_career_step.pictures.helmet.secondary   current_career_step.pictures.number current_career_step.pictures.portrait   current_career_step.team.id current_career_step.team.constructor.id current_career_step.team.constructor.name   current_career_step.team.constructor.legacy_id  current_career_step.team.name   current_career_step.team.legacy_id  current_career_step.team.color  current_career_step.team.text_color current_career_step.team.picture    current_career_step.team.published
0   1ea1f811-7505-43b0-8225-e1a325a2d1e1    Fabio   Meozzi  None    NaN None    None    NaN True    6639    IT  Italy   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   2d7c92c8-6ec6-4760-978e-5a64023fa811    Fabio   Nucci   None    NaN Arenzzo None    NaN True    5559    IT  Italy   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   35a3128b-80f8-4349-aa3d-3411c155fa9a    Fabio   Biliotti    None    NaN None    None    NaN True    726 IT  Italy   1989.0  NaN None    NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370    500cc   -1.0    True    None    False   None    NaN None    NaN None    NaN None    https://www.motogp.com/en/api/rider/photo/grid/old/726  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   3b489143-ee09-4be2-a224-5a43a0a93e4f    Fabio   Spiranelli  None    NaN Lodi    1999-12-05  22.0    True    8825    IT  Italy   2016.0  3.0 CIP-Unicom Starker  NaN 1ab203aa-e292-4842-8bed-971911357af1    Moto3   1.0 True    None    False   None    NaN https://photos.motogp.com/2016/riders/moto3/bike/original/rider_8825_1458823989.jpg NaN https://photos.motogp.com/2016/riders/moto3/helmet/original/rider_8825_1458821857.jpg   NaN None    https://photos.motogp.com/2016/riders/moto3/grid/original/rider_8825_1458821693.jpg 5a88fee3-0f04-45f9-ba64-3fd6d2a62d73    5ecd8db7-d87b-4b3e-87b6-1f72ee457ede    KTM 298.0   CIP-Unicom Starker  110.0   None    None    https://photos.motogp.com/2020/teams/moto3/original/team_bike_110_1584101124.jpg    True
4   3de48564-be7d-4c34-8f85-4edeea23a313    Fabio   Carpani None    NaN Padenghe Garda  1975-08-23  47.0    True    2420    IT  Italy   1998.0  NaN None    NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370    500cc   -1.0    True    None    False   None    NaN None    NaN None    NaN None    https://www.motogp.com/en/api/rider/photo/grid/old/2420 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5   3f845fa3-693f-47ae-b6ce-d8804b2b4909    Fabio   Barchitta   None    NaN None    None    NaN True    717 IT  Italy   1988.0  NaN None    NaN 5a2a0bae-2060-475e-867d-3b6a34dbe370    500cc   -1.0    True    None    False   None    NaN None    NaN None    NaN None    https://www.motogp.com/en/api/rider/photo/grid/old/717  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6   525b1551-f10b-4cfd-9b43-59af6fca654b    Fabio   Di Giannantonio None    NaN Roma    1998-10-10  23.0    True    8539    IT  Italy   2022.0  49.0    Gresini Racing MotoGP™  NaN 737ab122-76e1-4081-bedb-334caaa18c70    MotoGP  3.0 True    FD49    False   https://photos.motogp.com/riders/a/0/a04438ea-4e12-47f6-bb08-ba4589ea3665/profile/main/49-Fabio-DiGiananntonioRider_DS_5200.png NaN https://photos.motogp.com/riders/a/1/a12c0f24-05ee-4983-9407-f05fbdf7c67c/bike/main/49_Fabio_Di_Giannantonio.png    NaN https://photos.motogp.com/riders/a/d/ad1917dc-aaf4-4038-b5e8-82c6b83ebb23/helmet/main/49-Fabio-Diggianantonio.jpg   NaN None    https://photos.motogp.com/riders/a/0/a04438ea-4e12-47f6-bb08-ba4589ea3665/portrait/49-Fabio-DiGiannantonio-Rider_DS_5192.jpg    11729e67-d2cb-41ad-b3a8-4a0ac5768a5f    38af1078-e2f1-4399-811c-1e98cf6f6150    Ducati  110.0   Gresini Racing MotoGP™  10.0    #a1b7e5 #323232 https://photos.motogp.com/teams/6/6/66af5d2c-8d52-4099-988c-981983046476/GresiniRacing_.png True
7   68c035ae-4d49-4f2c-a342-c0f0be21d964    Fabio   Bitocchi    None    NaN None    None    NaN True    9948    IT  Italy   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8   9042979e-cca9-42cb-a17e-56da52a5fb3b    Fabio   Frankenberger   None    NaN None    None    NaN True    7842    DE  Germany NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9   bf95d959-6a60-44f1-84b5-ded861e62578    Fabio   Quartararo  None    NaN Nice    1999-04-20  23.0    True    8520    FR  France  2022.0  20.0    Monster Energy Yamaha MotoGP™   NaN 737ab122-76e1-4081-bedb-334caaa18c70    MotoGP  3.0 True    FQ20    False   https://photos.motogp.com/riders/6/9/69b5c348-2840-4dc1-bf7b-457c0683222c/profile/main/20-Fabio-Quartararo.png  NaN https://photos.motogp.com/riders/d/0/d088b244-b3c1-4f06-af3c-e122613e2b8b/bike/main/_0003_20-Fabio-Quartararo-Bike-MotoGPDSC04216.png   NaN https://photos.motogp.com/riders/b/b/bbe8044a-2fbf-448c-bfad-6d65e596b06c/helmet/main/20-Fabio-Quartararo.jpg   NaN https://photos.motogp.com/riders/9/a/9ac4314f-fd8d-433f-876d-515ee1631c28/number/20_Fabio_Quartararo.png    https://photos.motogp.com/riders/6/9/69b5c348-2840-4dc1-bf7b-457c0683222c/portrait/20_Fabio_Quartararo.jpg  141b6f0f-7e53-4d27-9bdb-0ea8fba7e842    f2e835be-7fab-4782-a26b-de3d583d132c    Yamaha  3.0 Monster Energy Yamaha MotoGP™   19.0    #183dc7 #ffffff https://photos.motogp.com/teams/9/1/91699bb4-f33d-40de-b995-4f5e120ff74d/Yamaha.png True
10  2e893359-7f93-4b55-9c54-6016337c8e80    Francesco   Pellegrino  None    NaN None    1964-06-09  58.0    True    2567    VE  Venezuela (Bolivarian Republic of)  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11  66b78301-5826-4986-b11e-fa68a7bd77a7    Francesco   Bagnaia None    NaN Torino  1997-01-14  25.0    True    8273    IT  Italy   2022.0  63.0    Ducati Lenovo Team  NaN 737ab122-76e1-4081-bedb-334caaa18c70    MotoGP  3.0 True    FB63    False   https://photos.motogp.com/riders/e/a/eac63974-aeee-4f62-81a4-f9588a47009d/profile/main/63_Francesco_Bagnaia.png NaN https://photos.motogp.com/riders/4/e/4e947398-047a-44c5-acc5-3971b2a14b09/bike/main/_0002_63-Francesco-Bagnaia_Bike45.png   NaN https://photos.motogp.com/riders/2/5/25778ccd-c018-4b6e-8fa1-9325d2bc0f74/helmet/main/63-Francesco-Bagnaia-Helmet.jpg   NaN https://photos.motogp.com/riders/2/5/2585fdb8-5fb5-43b6-a12d-ccb5ba31c0a4/number/63_Francesco_Bagnaia.png   https://photos.motogp.com/riders/4/e/4e947398-047a-44c5-acc5-3971b2a14b09/portrait/63-Francesco-Bagnaia-Rider_DS_4948.jpg   892fff2f-7402-4fbd-99fb-5fd567d8a80c    38af1078-e2f1-4399-811c-1e98cf6f6150    Ducati  110.0   Ducati Lenovo Team  15.0    #f92515 #ffffff https://photos.motogp.com/teams/7/d/7da82702-139c-4a2c-8ee3-a1478cb43c37/ducatilenovo.png   True
12  88c48f39-d7dc-4e15-8ac4-45142afc3e8c    Francesco   Monaco  None    NaN None    1970-07-11  52.0    True    2415    IT  Italy   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13  8ea7300c-263d-4216-9438-1ef3beb55c3c    Francesco   Mauriello   None    NaN Napoli  1993-11-28  28.0    True    7948    IT  Italy   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
14  ff98e773-384c-447c-a993-81fb07b58c47    Francesco   Villa   None    NaN None    None    NaN True    1574    IT  Italy   1977.0  NaN None    NaN f4c00279-2ae2-42fa-8bce-01c5eaedf392    250cc   5.0 True    None    False   None    NaN None    NaN None    NaN None    https://www.motogp.com/en/api/rider/photo/grid/old/1574 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Of course, you can filter out that dataframe to get only what you need from it. Also, you can create a dictionary with name/surname, and send that to the API: https://api.motogp.com/riders-api/riders?name={name}&surname={surname}

Requests documentation: https://requests.readthedocs.io/en/latest/

Also, pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

  • Related