Search for class element underneath another element

Time:03-21

I scrape daily lineups and need to find out whether a team does not have its lineup posted. When that happens, the page contains an element with the class lineup__no. I'd like to look at each team and check whether their lineup is posted, and if not, add that team's index to a list. For example, if there are 4 teams playing and the first and third teams do not have a lineup posted, I want to return the list [0, 2]. I'm guessing a list comprehension of some sort may get me there, but I'm struggling to come up with what I need. For now I've tried a for loop to get each of the li items under the main header. I've also tried adding each li item's text to a list and searching for "Unknown Lineup", but that was unsuccessful.

from selenium import webdriver

from selenium.common.exceptions import NoSuchElementException

from bs4 import BeautifulSoup
import requests
import pandas as pd

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

games = soup.select('.lineup.is-mlb')
for game in games:
    initial_list = game.find_all('li')
    print(initial_list)

CodePudding user response:

Since I'm more familiar with Selenium I'll give you Selenium solution.
Please see my explanations inside the code given as comments.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.rotowire.com/baseball/daily-lineups.php")
#wait for at least 1 game element to be visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lineup.is-mlb")))
#add a short delay so that all the other games are loaded
time.sleep(0.5)
#get all the games blocks
games = driver.find_elements(By.CSS_SELECTOR,".lineup.is-mlb")
#iterate over the games elements with their indexes in a list comprehension
no_lineup = [j for idx, game in enumerate(games) for j in [idx*2, idx*2 + 1] if game.find_elements(By.XPATH, ".//li[@class='lineup__no']")]

#print the collected results
print(no_lineup)
#quit the driver
driver.quit()
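
A note on the index arithmetic in that comprehension: each game block holds two teams, so game idx maps to team indexes idx*2 and idx*2 + 1, and both are added whenever the game contains any lineup__no element. The sketch below isolates just that mapping with plain booleans instead of Selenium elements. The function name and the sample flags are made up for illustration.

```python
def team_indexes_for_flagged_games(game_has_missing_lineup):
    """Expand per-game flags into team indexes, two teams per game.

    game_has_missing_lineup: list of bool, one entry per game block,
    True when the game contains at least one lineup__no element.
    """
    return [
        team_idx
        for game_idx, flagged in enumerate(game_has_missing_lineup)
        if flagged
        for team_idx in (game_idx * 2, game_idx * 2 + 1)
    ]


# Hypothetical slate of three games where the first and third are flagged.
flagged_teams = team_indexes_for_flagged_games([True, False, True])
print(flagged_teams)  # [0, 1, 4, 5]
```

So if only one of the two teams in a game is missing its lineup, this approach still reports both indexes. Checking each team separately would need a per-team lookup rather than a per-game one.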

CodePudding user response:

Simply look under the <li> tags with the class lineup__no. By the way,

The Guardians lineup has not been posted yet.

threw me off there for a second...totally forgot about that!

from bs4 import BeautifulSoup
import requests

#Scraping lineups for updates
url = 'https://www.rotowire.com/baseball/daily-lineups.php'

##Requests rotowire HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

lineupStatuses = soup.find_all('li', {'class':'lineup__status'})

for lineupStatus in lineupStatuses:
    if lineupStatus.parent.find('li', {'class':'lineup__no'}):
        print(lineupStatus.parent.find('li', {'class':'lineup__no'}).text)

Output:

The Orioles lineup has not been posted yet.
The Red Sox lineup has not been posted yet.
The Rays lineup has not been posted yet.
The Twins lineup has not been posted yet.
The Tigers lineup has not been posted yet.
The Yankees lineup has not been posted yet.
The Phillies lineup has not been posted yet.
The Braves lineup has not been posted yet.
The Nationals lineup has not been posted yet.
The Astros lineup has not been posted yet.
The Pirates lineup has not been posted yet.
The Blue Jays lineup has not been posted yet.
The Cardinals lineup has not been posted yet.
The Mets lineup has not been posted yet.
The Diamondbacks lineup has not been posted yet.
The Royals lineup has not been posted yet.
The Guardians lineup has not been posted yet.
The Athletics lineup has not been posted yet.
The Giants lineup has not been posted yet.
The Reds lineup has not been posted yet.
The Cubs lineup has not been posted yet.
The Dodgers lineup has not been posted yet.
The Padres lineup has not been posted yet.
The Brewers lineup has not been posted yet.
The Angels lineup has not been posted yet.
The Mariners lineup has not been posted yet.
The White Sox lineup has not been posted yet.
The Rockies lineup has not been posted yet.
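
To get the list of indexes the question asks for, rather than the printed messages, the same lookup can be wrapped in enumerate. This is only a sketch: it runs against a small hand-written HTML sample instead of the live page, and the sample assumes each team has its own list containing one lineup__status item, which is what the parent lookup above relies on. The function name, the sample markup, and the lineup__list / lineup__player class names in it are assumptions.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page; the real markup may differ.
SAMPLE_HTML = """
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit">
    <li class="lineup__status">Unknown Lineup</li>
    <li class="lineup__no">The Orioles lineup has not been posted yet.</li>
  </ul>
  <ul class="lineup__list is-home">
    <li class="lineup__status">Confirmed Lineup</li>
    <li class="lineup__player">Player A</li>
  </ul>
</div>
<div class="lineup is-mlb">
  <ul class="lineup__list is-visit">
    <li class="lineup__status">Unknown Lineup</li>
    <li class="lineup__no">The Rays lineup has not been posted yet.</li>
  </ul>
  <ul class="lineup__list is-home">
    <li class="lineup__status">Expected Lineup</li>
    <li class="lineup__player">Player B</li>
  </ul>
</div>
"""


def teams_without_lineup(html):
    """Return the page-order index of every team whose list has a lineup__no item."""
    soup = BeautifulSoup(html, "html.parser")
    # One lineup__status item per team, in the order the teams appear.
    statuses = soup.find_all("li", class_="lineup__status")
    return [
        idx
        for idx, status in enumerate(statuses)
        if status.parent.find("li", class_="lineup__no")
    ]


missing = teams_without_lineup(SAMPLE_HTML)
print(missing)  # [0, 2]
```

Against the live site you would pass r.text instead of SAMPLE_HTML. The indexes are per team, so they only line up with your own data if it lists teams in the same order as the page.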