Why can't I scrape all data?

Time:03-04

With this script I'm trying to scrape all the data from a specific website. The problem is the output: instead of the list of all home teams, I only get the name of the home team from the first match. What can I do to get all the data from the website?

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.get('https://www.diretta.it')
html = driver.page_source 
soup = BeautifulSoup(html,'lxml')
games = soup.find_all('div', class_ = 'event__match event__match--live event__match--last event__match--twoLine')
for game in games:
    home = soup.find('div', class_ = 'event__participant event__participant--home').text
    away = soup.find('div', class_ = 'event__participant event__participant--away').text
    time = soup.find('div', class_ = 'event__time').text
    print(home)

CodePudding user response:

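You are looping over games, but inside the loop you still call find on soup instead of on game. soup.find always searches the whole page and returns the first matching element, so every iteration gives you the first match again. Call find on the loop variable instead: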

home = game.find('div', class_ = 'event__participant event__participant--home').text
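
The difference can be checked without a browser. The following is a minimal sketch, assuming a simplified stand-in for the page: the HTML string and the team names are made up for illustration, and only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the rendered page.
HTML = """
<div class="event__match">
  <div class="event__participant event__participant--home">Alpha</div>
  <div class="event__participant event__participant--away">Beta</div>
</div>
<div class="event__match">
  <div class="event__participant event__participant--home">Gamma</div>
  <div class="event__participant event__participant--away">Delta</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
games = soup.find_all("div", class_="event__match")

# Searching from soup restarts at the top of the document on every iteration.
from_soup = [soup.find("div", class_="event__participant--home").text for _ in games]

# Searching from game only looks inside that match's own <div>.
from_game = [game.find("div", class_="event__participant--home").text for game in games]

print(from_soup)  # ['Alpha', 'Alpha']
print(from_game)  # ['Alpha', 'Gamma']
```

The same rule applies to away and time: any lookup that should be specific to one match has to start from game.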

CodePudding user response:

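First of all, when using Selenium you don't need Beautiful Soup, because you can use the find_element_by_* methods to find a single tag and the find_elements_by_* methods (elements with an s, plural) to get a list of all tags that match the same selector. Note that these helpers belong to Selenium 3; Selenium 4 replaced them with find_element(By.CSS_SELECTOR, ...) and find_elements(By.CSS_SELECTOR, ...).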

Your code would be:

from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.get('https://www.diretta.it')

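# find_elements_* (plural) returns a list; the singular form returns one element and cannot be looped over
games = driver.find_elements_by_css_selector('div[class = "event__match event__match--live event__match--last event__match--twoLine"]')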

for game in games:
    home = game.find_element_by_css_selector('div[class = "event__participant event__participant--home"]').text
    away = game.find_element_by_css_selector('div[class = "event__participant event__participant--away"]').text
    time = game.find_element_by_css_selector('div[class = "event__time"]').text
    
    print(home)