Python | Selenium Issue with scrolling down and find by class name

Time:12-30

For a research study I would like to scrape some links from webpages which are located outside the viewport (to see these links you need to scroll down the page).

I have written the script below, but I get an empty list:

from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('https://www.twitch.tv/lirik')

time.sleep(3)
browser.execute_script("window.scrollBy(0,document.body.scrollHeight)")

time.sleep(3)

panel_blocks = browser.find_elements(by='class name', value='Layout-sc-nxg1ff-0 itdjvg default-panel')
browser.close()
print(panel_blocks)
print(type(panel_blocks))

I just get an empty list after the page has loaded. Here is the output from the script above:

/usr/local/bin/python /Users/greg.fetisov/PycharmProjects/baltazar_platform/Twitch_parser.py
[]
<class 'list'>

Process finished with exit code 0

P.S. When the webdriver opens the page, I see there is no scroll-down action. It just opens the page and then closes it after the time.sleep cooldown.

How can I change the script to get the links properly?

Any help or advice would be appreciated!

CodePudding user response:

To print the values of the href attribute, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.twitch.tv/lirik")
    print([my_elem.get_attribute("href")
           for my_elem in WebDriverWait(driver, 20).until(
               EC.visibility_of_all_elements_located((
                   By.CSS_SELECTOR,
                   "div.Layout-sc-nxg1ff-0.itdjvg.default-panel > a")))])
    
  • Console Output:

    ['https://www.amazon.com/dp/B09FVR22R2', 'http://bs.serving-sys.com/Serving/adServer.bs?cn=trd&pli=1077437714&gdpr=${GDPR}&gdpr_consent=${GDPR_CONSENT_68}&adid=1085757156&ord=[timestamp]', 'https://store.epicgames.com/lirik/rumbleverse', 'https://bitly/3GP0cM0', 'https://lirik.com/', 'https://streamlabs.com/lirik', 'https://twitch.amazon.com/tp', 'https://www.twitch.tv/subs/lirik', 'https://www.youtube.com/lirik?sub_confirmation=1', 'http://www.twitter.com/lirik', 'http://www.instagram.com/lirik', 'http://gfuel.ly/lirik', 'http://www.cyberpowerpc.com/', 'https://www.cyberpowerpc.com/page/Intel/LIRIK/', 'https://discord.gg/lirik', 'http://www.amazon.com/?_encoding=UTF8&camp=1789&creative=390957&linkCode=ur2&tag=l0e6d-20&linkId=YNM2SXSSG3KWGYZ7']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

CodePudding user response:

  1. You are using the wrong locator.
  2. You should use explicit waits with expected conditions instead of hardcoded pauses.
  3. The find_elements method returns a list of web elements, while you want the links inside those elements.

This should work better:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

browser = webdriver.Firefox()
browser.get('https://www.twitch.tv/lirik')
wait = WebDriverWait(browser, 20)

wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='channel-panels-container']//a")))
time.sleep(0.5)

link_blocks = browser.find_elements(By.XPATH, "//div[@class='channel-panels-container']//a")
for link_block in link_blocks:
    link = link_block.get_attribute("href")
    print(link)

browser.close()
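As for the scroll not moving the page: on some sites the scrollbar belongs to a nested container rather than document.body, so window.scrollBy on the body height does nothing. A sketch of a more robust alternative is to scroll the target element itself into view via JavaScript, which works regardless of which container scrolls. The `scroll_to_element` name is introduced here for illustration:

```python
# scrollIntoView is a standard DOM method; 'block: center' puts the
# element in the middle of the viewport rather than at the very edge.
SCROLL_INTO_VIEW_JS = "arguments[0].scrollIntoView({block: 'center'});"


def scroll_to_element(driver, element):
    """Scroll the given element into the viewport via JavaScript."""
    driver.execute_script(SCROLL_INTO_VIEW_JS, element)
```

Typical use: wait for the element with WebDriverWait first, then call scroll_to_element(browser, element) before interacting with it or reading its attributes.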