Storing the id attribute of all elements using CssSelector in a list using Selenium

The Python Selenium code below opens a page with some datasets I'm interested in scraping. Here's my code.

import time, os
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

options = webdriver.ChromeOptions()
preferences= {"download.default_directory": os.getcwd(), "directory_upgrade": True}
options.add_experimental_option("prefs", preferences)
#options.headless = True
options.add_experimental_option('excludeSwitches', ['enable-logging'])

url = "http://www.ssp.sp.gov.br/transparenciassp/"

# Path of my WebDriver
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

wait = WebDriverWait(driver, 10)


# to maximize the browser window
driver.maximize_window()

#get method to launch the URL
driver.get(url)

wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#cphBody_btnHomicicio"))).click()

for x in range(18, 19):
    year = '#cphBody_lkAno' + str(x)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, year))).click()
    
    for x in range(1, 2):
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#cphBody_lkMes' + str(x)))).click()

So far this works and does exactly what I want, but I'm unhappy with the last loop. What I'd like to do is find all of the elements on the page whose CSS selector contains "cphBody_lkMes" and put those selectors in a list like

['#cphBody_lkMes1','#cphBody_lkMes2', ...]

so that I can loop through them accordingly. The current code appears messy.

When I try

for x in range(18, 19):
    year = '#cphBody_lkAno' + str(x)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, year))).click()
    
    meses = driver.find_elements_by_css_selector("//*[contains(text(),'cphBody_lkMes')]")
    
    meses
    
    for x in range(1, 2):
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#cphBody_lkMes' + str(x)))).click()

the list comes up empty, and I get this error:

An invalid or illegal selector was specified.

How might I solve this?

CodePudding user response:

To collect every element on the page whose id attribute starts with cphBody_lkMes, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), then read the id of each one. Either of the following locator strategies works:

Using CSS_SELECTOR:

print([my_elem.get_attribute("id") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.nav.nav-pills.mesNav li > a[id^='cphBody_lkMes']")))])

Using XPATH:

print([my_elem.get_attribute("id") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='nav nav-pills mesNav']//li/a[starts-with(@id, 'cphBody_lkMes')]")))])
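Both locators above match on the id attribute rather than on the element text. As a rough sketch, the two forms can be generated side by side. The helper name id_prefix_locator is hypothetical and not part of Selenium; the strings "css selector" and "xpath" are the values of By.CSS_SELECTOR and By.XPATH, so the returned tuples can be used wherever a (By, value) locator is expected:

```python
def id_prefix_locator(prefix, use_xpath=False):
    """Return a (strategy, expression) locator for elements whose id starts with prefix.

    Hypothetical helper for illustration only; it is not part of Selenium.
    """
    if use_xpath:
        # XPath syntax is only valid with the "xpath" strategy (By.XPATH).
        return ("xpath", f"//*[starts-with(@id, '{prefix}')]")
    # CSS attribute selector; "css selector" is the value of By.CSS_SELECTOR.
    return ("css selector", f"[id^='{prefix}']")


print(id_prefix_locator("cphBody_lkMes"))
print(id_prefix_locator("cphBody_lkMes", use_xpath=True))
```

Pairing an XPath expression with the CSS strategy is what produces "An invalid or illegal selector was specified."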

Update

To iterate the list:

months = [my_elem.get_attribute("id") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.nav.nav-pills.mesNav li > a[id^='cphBody_lkMes']")))]
for month in months:
    print(month)
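The question asks for selectors such as '#cphBody_lkMes1' rather than bare ids. A minimal sketch of that last step, assuming months already holds the collected ids; the sample values below are placeholders rather than scraped data, and to_css_selectors is a hypothetical helper name:

```python
def to_css_selectors(ids):
    """Turn element ids into CSS id selectors, skipping empty or missing ids."""
    return ["#" + element_id for element_id in ids if element_id]


# Placeholder ids standing in for the values read from the page.
months = ["cphBody_lkMes1", "cphBody_lkMes2", "cphBody_lkMes3"]
selectors = to_css_selectors(months)
print(selectors)  # ['#cphBody_lkMes1', '#cphBody_lkMes2', '#cphBody_lkMes3']
```

Each selector can then be passed to wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, selector))).click(), as in the question. Looking elements up again by selector, instead of keeping the element objects, may also help if the page re-renders after each click.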

CodePudding user response:

I see several issues here:

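The outer and the inner for loops both use x as the loop variable, so the inner loop rebinds the outer one, which is confusing and error-prone.
The string "//*[contains(text(),'cphBody_lkMes')]" is an XPath expression, so it should be meses = driver.find_elements_by_xpath there instead of meses = driver.find_elements_by_css_selector.
And finally, cphBody_lkMes is a substring of the elements' id values, not of their text, so the XPath has to test @id rather than text().
With those changes, I get a non-empty meses list: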

for x in range(18, 19):
    year = '#cphBody_lkAno' + str(x)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, year))).click()
    meses = driver.find_elements_by_xpath("//*[contains(@id,'cphBody_lkMes')]")

    for y in range(1, 2):
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#cphBody_lkMes' + str(y)))).click()
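The find_elements_by_* helpers were removed in Selenium 4.3, so on a current install the same lookup would go through driver.find_elements(by, value). A minimal sketch under that assumption; collect_month_ids is a hypothetical name, and the literal "xpath" is the value of By.XPATH, used here only so the snippet has no imports:

```python
def collect_month_ids(driver, fragment="cphBody_lkMes"):
    """Return the ids of all elements whose id contains `fragment`.

    Assumes `driver` exposes Selenium 4's find_elements(by, value).
    """
    # "xpath" is the value of By.XPATH; the expression tests @id, not text().
    elements = driver.find_elements("xpath", f"//*[contains(@id, '{fragment}')]")
    return [element.get_attribute("id") for element in elements]
```

After clicking the year link, meses = collect_month_ids(driver) would give the list of ids to loop over instead of a hard-coded range.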