Python Selenium: Looping over the same element while scraping

Time:05-03

Context:

I am trying to scrape the video titles, view counts and upload dates from a YouTube channel, but my loop keeps scraping the same element over and over.

Code trials:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.youtube.com/c/JohnWatsonRooney/videos?view=0&sort=p&flow=grid'
driver = webdriver.Chrome()
driver.get(url)

videos = driver.find_elements(by=By.CLASS_NAME, value='style-scope ytd-grid-video-renderer')

for video in videos:
  title = driver.find_element(by=By.XPATH, value='.//*[@id="video-title"]').text
  views = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[1]').text
  when = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[2]').text
  print(f"""Video Title: {title}\nViews: {views}\nUploaded: {when}\n -----------""")

Output

Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago

Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago

Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago
...

Answer:

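Your loop prints the same video every time because it calls driver.find_element(). That searches the entire page, so './/*[@id="video-title"]' always matches the first title on the page. The lookups inside the loop have to be made on the current element instead, e.g. video.find_element(...).

Alternatively, to print the title, views and upload time of every video, you can induce WebDriverWait for visibility_of_all_elements_located() and use the following Locator Strategies: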

  • Code Block:

    driver.get("https://www.youtube.com/c/JohnWatsonRooney/videos")
    # Wait until all titles, view counts and upload times are visible, then collect their text
    titles = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a#video-title")))]
    views = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:first-child")))]
    when = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:nth-child(2)")))]
    # All three lists are in page order, so zip() pairs them up per video
    for i, j, k in zip(titles, views, when):
        print(f"{i} had {j} since posted {k}")
    
  • Console Output:

    The Python Package I Wish I'd Learned Earlier had 4.5K views since posted 4 days ago
    Rotate User Agents in Scrapy using custom Middleware had 1.2K views since posted 11 days ago
    GO for Beginners - Web Scraping with Golang Tutorial had 2K views since posted 13 days ago
    How I Scraped This HTML Table to a Python Dictionary had 3.6K views since posted 3 weeks ago
    Python TYPE HINTS Explained with Examples had 2.3K views since posted 3 weeks ago
    Use THIS Algorithm To Find KEYWORDS in Text - A Short Python Project had 4.2K views since posted 1 month ago
    SQLModel is the Pydantic inspired Python ORM we’ve been waiting for had 3.4K views since posted 1 month ago
    How to use Enumerate in Python to have a Counter in your loops had 4.6K views since posted 1 month ago
    How to HIDE Your API Keys in Python Projects had 6.6K views since posted 1 month ago
    How to Make 2500 HTTP Requests in 2 Seconds with Async & Await had 4.8K views since posted 2 months ago
    Are You Still Using Excel? AUTOMATE it with PYTHON had 10K views since posted 2 months ago
    How To Parse Data from HTML Tables Using Requests-HTML had 3.5K views since posted 2 months ago
    THIS is the most common ERROR when learning Web Scraping had 3.6K views since posted 2 months ago
    Learn Web Scraping With Python: Full Project - HTML, Save to CSV, Pagination had 10K views since posted 3 months ago
    Turn Websites into Real Time API's with ScrapyRT had 4.4K views since posted 3 months ago
    HTTPX is the ASYNC Requests I was Looking For had 5.8K views since posted 3 months ago
    Research Amazon Products by Extracting Review Data had 4K views since posted 4 months ago
    Web Scraping 101 - My (in)complete guide, methods, tools, how to had 5.4K views since posted 4 months ago
    How to Scrape JavaScript Websites with Scrapy and Playwright had 9.8K views since posted 5 months ago
    Web Scraping with Node js? Python Expert Opinion and demo had 3.3K views since posted 5 months ago
    Automate Buying online using Playwright’s Codegen feature had 4.6K views since posted 5 months ago
    Login and Scrape Data with Playwright and Python had 16K views since posted 5 months ago
    Parse HTML with BeautifulSoup AND Scrapy had 2.6K views since posted 5 months ago
    HIDING Data with JavaScript? Web Scraping Obfuscation had 3.9K views since posted 5 months ago
    Following LINKS Automatically with Scrapy CrawlSpider had 6.2K views since posted 6 months ago
    Web Scraping Weather Data with Python had 13K views since posted 6 months ago
    How I Try My CODE and Test My SELECTORS in SCRAPY had 1.8K views since posted 6 months ago
    How To Handle Errors & Exceptions with Requests and Python had 2.8K views since posted 6 months ago
    A Short and SIMPLE HTML Web Scraper in 6 lines of CODE had 1.9K views since posted 7 months ago
    Failed Requests? Try this RETRY Decorator for your Web Scraper had 3.5K views since posted 7 months ago
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
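To see why the original loop repeats the first video, here is a browser-free sketch of the same scoping mistake. It swaps Selenium for the standard library's xml.etree.ElementTree, and the markup, the "card" class and the sample values are all hypothetical. root.find() plays the role of driver.find_element() (search the whole page), while card.find() plays the role of video.find_element() (search only inside the current item).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on the video grid: each card holds its
# own title link and a metadata line with two spans (views, upload time).
PAGE = """
<div id="contents">
  <div class="card">
    <a id="video-title">Video A</a>
    <div id="metadata-line"><span>1K views</span><span>1 day ago</span></div>
  </div>
  <div class="card">
    <a id="video-title">Video B</a>
    <div id="metadata-line"><span>2K views</span><span>2 days ago</span></div>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())
cards = root.findall(".//div[@class='card']")

# Buggy pattern: the loop variable is ignored and every search starts at the
# document root, so it always returns the first title (like driver.find_element).
buggy = [root.find(".//*[@id='video-title']").text for card in cards]


def scrape_card(card):
    """Fixed pattern: search relative to the current card (like video.find_element)."""
    spans = card.findall(".//*[@id='metadata-line']/span")
    title = card.find(".//*[@id='video-title']").text
    return title, spans[0].text, spans[1].text


fixed = [scrape_card(card) for card in cards]

print(buggy)
print(fixed)
```

In Selenium the equivalent change is calling video.find_element(By.XPATH, './/*[@id="video-title"]') (and likewise for the metadata spans) inside the loop. Reading all three fields from the same card also avoids relying on three separate lists staying aligned, which the zip() approach assumes.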
    