It's my understanding that when the find_elements method is called on a selenium.webdriver instance, it returns references to all the elements in the DOM that match the provided locator.
When I run the following code, I get fewer than 30 elements, even though there are many more on the page in question.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import selenium.webdriver.support.expected_conditions as EC
DRIVER_PATH = '/chromedriver'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
wait = WebDriverWait(driver, 30)
URL = 'https://labs.actionnetwork.com/markets'
driver.get(URL)
PGA_button = wait.until(EC.presence_of_element_located((By.XPATH, ".//button[@class='btn btn-light' and text()='PGA']")))
driver.execute_script("arguments[0].click();", PGA_button)
logos = driver.find_elements(By.XPATH, ".//*[@class='odds-logo ml-1']")
How do I get all the logos? When I debug and scroll the page down, I get a list of different elements, so I assume it has something to do with the viewport.
Is there a way to get all the elements on the page (even if they are outside the viewport)?
CodePudding user response:
A good strategy is to find all the rows currently visible on the page and then scroll down to the last one so that new rows are loaded into the HTML. Meanwhile, you add the image URLs to a set. I chose a set instead of a list so that duplicate URLs are not added, but if you want all the URLs, replace img = set() with img = [] and img.add with img.append.
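import time

img = set()
number_of_loops = 4
# The attribute predicates ([@]) in the two XPaths are incomplete as given and must be filled in.
for i in range(number_of_loops):
    print('loop', i + 1)
    # Wait for the rows that are currently rendered.
    rows = WebDriverWait(driver, 9).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@]/div[@role="row"]')))
    row_idx = []
    for row in rows:
        img.add(row.find_element(By.XPATH, './/descendant::div[@]/img').get_attribute('src'))
        row_idx.append(int(row.get_attribute('row-index')))
    # Scroll to the row with the highest index so that the next rows are loaded.
    idx_of_max = row_idx.index(max(row_idx))
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', rows[idx_of_max])
    time.sleep(2)  # give the page time to load the new rows
img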
Output
{'https://assets.actionnetwork.com/[email protected]',
'https://assets.actionnetwork.com/[email protected]'}
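
The loop above runs a fixed number of times (number_of_loops = 4), which may stop too early or too late depending on how many rows the page has. One possible variation, sketched below, is to keep scrolling until a few consecutive rounds add nothing new. The names collect_until_stable, read_visible, scroll_down, patience and FakeGrid are hypothetical and not part of Selenium; the fake grid only stands in for a page that renders a small window of rows at a time, so that the idea can be run without a browser.

```python
from typing import Callable, Iterable, Set


def collect_until_stable(
    read_visible: Callable[[], Iterable[str]],
    scroll_down: Callable[[], None],
    max_rounds: int = 50,
    patience: int = 2,
) -> Set[str]:
    """Accumulate values across scroll steps until nothing new shows up.

    Stops after `patience` consecutive rounds without new values,
    or after `max_rounds` rounds, whichever comes first.
    """
    seen: Set[str] = set()
    rounds_without_new = 0
    for _ in range(max_rounds):
        before = len(seen)
        seen.update(read_visible())
        if len(seen) == before:
            rounds_without_new += 1
            if rounds_without_new >= patience:
                break
        else:
            rounds_without_new = 0
        scroll_down()
    return seen


class FakeGrid:
    """Stand-in for a grid that only renders a small window of its rows."""

    def __init__(self, items, window=3):
        self.items = items
        self.window = window
        self.top = 0

    def visible(self):
        return self.items[self.top:self.top + self.window]

    def scroll(self):
        # Move the window down by (window - 1) rows, stopping at the end of the data.
        last_top = max(len(self.items) - self.window, 0)
        self.top = min(self.top + self.window - 1, last_top)


grid = FakeGrid([f"logo_{n}.png" for n in range(10)])
urls = collect_until_stable(grid.visible, grid.scroll)
print(len(urls))  # 10
```

With Selenium, read_visible would presumably return the src attributes of the currently rendered rows, and scroll_down would call scrollIntoView on the last row and then wait briefly, as in the loop above.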