Context:
I am trying to scrape the title, view count, and upload date of every video on a YouTube channel, but my loop returns the same (first) video on every iteration.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.youtube.com/c/JohnWatsonRooney/videos?view=0&sort=p&flow=grid'
driver = webdriver.Chrome()
driver.get(url)
videos = driver.find_elements(by=By.CLASS_NAME, value='style-scope ytd-grid-video-renderer')
for video in videos:
    title = driver.find_element(by=By.XPATH, value='.//*[@id="video-title"]').text
    views = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[1]').text
    when = driver.find_element(by=By.XPATH, value='.//*[@id="metadata-line"]/span[2]').text
    print(f"""Video Title: {title}\nViews: {views}\nUploaded: {when}\n -----------""")
Output
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago
Video Title: Scrapy for Beginners - A Complete How To Example Web Scraping Project
Views: 104K views
Uploaded: 1 year ago..
Answer:
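Your loop prints the same video each time because it calls driver.find_element() rather than video.find_element(), so every XPath is evaluated against the whole page and returns the first match. To print the title, views, and upload time of every video, use WebDriverWait with visibility_of_all_elements_located() and the following locator strategies: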
Code Block:
driver.get("https://www.youtube.com/c/JohnWatsonRooney/videos")
titles = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a#video-title")))]
views = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:first-child")))]
when = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div#metadata-line > span:nth-child(2)")))]
for i, j, k in zip(titles, views, when):
    print(f"{i} had {j} since posted {k}")
Console Output:
The Python Package I Wish I'd Learned Earlier had 4.5K views since posted 4 days ago
Rotate User Agents in Scrapy using custom Middleware had 1.2K views since posted 11 days ago
GO for Beginners - Web Scraping with Golang Tutorial had 2K views since posted 13 days ago
How I Scraped This HTML Table to a Python Dictionary had 3.6K views since posted 3 weeks ago
Python TYPE HINTS Explained with Examples had 2.3K views since posted 3 weeks ago
Use THIS Algorithm To Find KEYWORDS in Text - A Short Python Project had 4.2K views since posted 1 month ago
SQLModel is the Pydantic inspired Python ORM we’ve been waiting for had 3.4K views since posted 1 month ago
How to use Enumerate in Python to have a Counter in your loops had 4.6K views since posted 1 month ago
How to HIDE Your API Keys in Python Projects had 6.6K views since posted 1 month ago
How to Make 2500 HTTP Requests in 2 Seconds with Async & Await had 4.8K views since posted 2 months ago
Are You Still Using Excel? AUTOMATE it with PYTHON had 10K views since posted 2 months ago
How To Parse Data from HTML Tables Using Requests-HTML had 3.5K views since posted 2 months ago
THIS is the most common ERROR when learning Web Scraping had 3.6K views since posted 2 months ago
Learn Web Scraping With Python: Full Project - HTML, Save to CSV, Pagination had 10K views since posted 3 months ago
Turn Websites into Real Time API's with ScrapyRT had 4.4K views since posted 3 months ago
HTTPX is the ASYNC Requests I was Looking For had 5.8K views since posted 3 months ago
Research Amazon Products by Extracting Review Data had 4K views since posted 4 months ago
Web Scraping 101 - My (in)complete guide, methods, tools, how to had 5.4K views since posted 4 months ago
How to Scrape JavaScript Websites with Scrapy and Playwright had 9.8K views since posted 5 months ago
Web Scraping with Node js? Python Expert Opinion and demo had 3.3K views since posted 5 months ago
Automate Buying online using Playwright’s Codegen feature had 4.6K views since posted 5 months ago
Login and Scrape Data with Playwright and Python had 16K views since posted 5 months ago
Parse HTML with BeautifulSoup AND Scrapy had 2.6K views since posted 5 months ago
HIDING Data with JavaScript? Web Scraping Obfuscation had 3.9K views since posted 5 months ago
Following LINKS Automatically with Scrapy CrawlSpider had 6.2K views since posted 6 months ago
Web Scraping Weather Data with Python had 13K views since posted 6 months ago
How I Try My CODE and Test My SELECTORS in SCRAPY had 1.8K views since posted 6 months ago
How To Handle Errors & Exceptions with Requests and Python had 2.8K views since posted 6 months ago
A Short and SIMPLE HTML Web Scraper in 6 lines of CODE had 1.9K views since posted 7 months ago
Failed Requests? Try this RETRY Decorator for your Web Scraper had 3.5K views since posted 7 months ago
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
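Alternative: the per-video loop from the question can also be kept, as long as each lookup is made on the row element instead of on the driver, so the leading "." in the XPath is resolved relative to that row. This also keeps title, views, and upload time together if one row is missing a field, which three separately zipped lists would not. The following is only a sketch: the function names scrape_rows and scrape_channel are hypothetical, and the row selector "ytd-grid-video-renderer" is an assumption inferred from the class name in the question, so it may not match YouTube's current markup.

```python
def scrape_rows(rows, by="xpath"):
    """Collect (title, views, uploaded) from each video row element.

    `rows` is any iterable of WebElement-like objects. `by` defaults to
    "xpath", which is the value of Selenium's By.XPATH constant.
    """
    results = []
    for row in rows:
        # Searching from `row` (not `driver`) scopes the relative XPath
        # to this row, so each iteration reads a different video.
        title = row.find_element(by, './/*[@id="video-title"]').text
        views = row.find_element(by, './/*[@id="metadata-line"]/span[1]').text
        when = row.find_element(by, './/*[@id="metadata-line"]/span[2]').text
        results.append((title, views, when))
    return results


def scrape_channel(url, timeout=20):
    """Open the channel page in Chrome and return one tuple per video."""
    # Imported here so scrape_rows can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: each video is a <ytd-grid-video-renderer> element.
        rows = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located(
                (By.CSS_SELECTOR, "ytd-grid-video-renderer")
            )
        )
        return scrape_rows(rows, By.XPATH)
    finally:
        # Close the browser even if a lookup raises.
        driver.quit()
```

Calling scrape_channel("https://www.youtube.com/c/JohnWatsonRooney/videos") and looping over the returned tuples should then print one distinct entry per video, provided the assumed selector still matches the page.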