How to skip an element (continue looping) or fill it with a certain value if the element doesn't e

Time:08-19

Sorry for my English; I'm fairly new to Python. How can I skip an iteration of a for loop if a web element does not exist, or fill in another value instead? I've been trying to scrape YouTube channels to get each video's title, view count, and posting date. My code looks like this:

import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome import service
from selenium.webdriver.common.keys import Keys
import time
import wget
import os
import pandas as pd
import matplotlib.pyplot as plt


urls = [
    'https://www.youtube.com/c/LofiGirl/videos', 
    'https://www.youtube.com/c/Miawaug/videos'
]

for url in urls:
    PATH = r'C:\webdrivers\chromedriver.exe'  # raw string so the backslashes are kept literally
    driver = webdriver.Chrome(PATH)
    driver.get(url)
    #driver.maximize_window()
    driver.implicitly_wait(10)

    for i in range(10):
        driver.find_element(By.TAG_NAME, "Body").send_keys(Keys.END)
        driver.implicitly_wait(20)

    time.sleep(5)

    judul_video = []
    viewers = []
    tanggal_posting = []

    titles = driver.find_elements(By.XPATH, "//a[@id='video-title']")
    views = driver.find_elements(By.XPATH, "//div[@id='metadata-line']/span[1]")
    DatePosted = driver.find_elements(By.XPATH, "//div[@id='metadata-line']/span[2]")

    for title in titles:
        judul_video.append(title.text)
        driver.implicitly_wait(5)
    for view in views:
        viewers.append(view.text)
        driver.implicitly_wait(5)
    for posted in DatePosted:
        tanggal_posting.append(posted.text)
        driver.implicitly_wait(5)
    
    vid_item = {
        "video_title" : judul_video,
        "views" : viewers,
        "date_posted" : tanggal_posting
    }

    df = pd.DataFrame(vid_item, columns=["video_title", "views", "date_posted"])
    #df_new = df.transpose()
    print(df)
    
    filename = url.split('/')[-2]
    df.to_csv(rf"C:\Users\.......\YouTube_{filename}.csv", sep=",")

    driver.quit()

That code works well, but the problem is in this part:

for posted in DatePosted:
    tanggal_posting.append(posted.text)
    driver.implicitly_wait(5)

When a channel is live streaming, such as Lofi Girl, I get an error that says "All arrays must be of the same length". I have not managed to write an if/else condition that fills in another value for the live stream, such as tanggal_posting.append("Live Stream"), or that skips that video entirely, starting from the title. The code below is my attempt to skip it or fill in another value, but it fails:

for posted in DatePosted:
    if len(posted.text) > 0:
        tanggal_posting.append(posted.text)
        driver.implicitly_wait(5)
    else:
        tanggal_posting.append("Live")
        driver.implicitly_wait(5)

How can I skip the iteration for just the single video that is shown as a live stream? Or how can I fill in another value such as "Live Stream" using an if/else condition, as I mentioned above? Thank you so much in advance.

CodePudding user response:

Personally, I'd first check whether posted is usable before reading its .text attribute.

for posted in DatePosted:
    _posted = posted.text.strip() if posted else None
    tanggal_posting.append(_posted if _posted else "Live")
    driver.implicitly_wait(5)

Alternatively:

for posted in DatePosted:
    _posted = posted.text.strip() if posted else None
    if not _posted:
        continue
        
    tanggal_posting.append(_posted)
    driver.implicitly_wait(5)

The overall code will differ depending on your objective, though I suppose _posted will be helpful either way.
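One caveat worth checking: both loops above iterate only over DatePosted, so they can never produce more entries than that list holds. If the live video has no second metadata span at all (an assumption, suggested by the length error in the question), the three lists will still differ in length. A minimal sketch with made-up sample data, where column_lengths is a hypothetical helper and not part of Selenium or pandas:

```python
def column_lengths(**columns):
    """Return the length of each named list, to spot mismatches early."""
    return {name: len(values) for name, values in columns.items()}


# Hypothetical sample data: three videos, the first one is live and
# is assumed to have no date element on the page.
judul_video = ["live radio", "video A", "video B"]
viewers = ["30K watching", "1M views", "2M views"]
date_texts = ["2 days ago", "1 week ago"]  # only two date elements were found

# Same idea as the first loop above: fall back to "Live" for empty text.
tanggal_posting = [text.strip() or "Live" for text in date_texts]

lengths = column_lengths(
    video_title=judul_video, views=viewers, date_posted=tanggal_posting
)
mismatch = len(set(lengths.values())) > 1
print(lengths, "mismatch:", mismatch)
```

If mismatch is True, the fallback never ran for the missing element, and the data has to be collected per video instead of per column.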

CodePudding user response:

Instead of collecting three separate lists, one per data item, I'd suggest getting the list of videos first and then extracting each item per video, so you can handle them together:

videos = driver.find_elements(By.XPATH, "//div[@id='items']/ytd-grid-video-renderer")
for video in videos:
    if not video.find_elements(By.XPATH, ".//yt-icon"):  # Check if no Streaming icon
        title = video.find_element(By.XPATH, ".//a[@id='video-title']")
        view = video.find_element(By.XPATH, ".//div[@id='metadata-line']/span[1]")
        DatePosted = video.find_element(By.XPATH, ".//div[@id='metadata-line']/span[2]")
        # Append all three values together so the lists stay the same length
        judul_video.append(title.text)
        viewers.append(view.text)
        tanggal_posting.append(DatePosted.text)
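If you would rather keep the live video and fill in "Live Stream" than skip it, the per-video loop above can be adapted roughly as follows. This is only a sketch: text_or_default and extract_rows are hypothetical helper names, the XPath expressions are the ones from the question, and it assumes the live video simply lacks the second metadata span. The string "xpath" is the value of By.XPATH, used here so the snippet needs no imports.

```python
def text_or_default(parent, xpath, default="Live Stream"):
    """Return the stripped text of the first match under parent, or default.

    find_elements returns an empty list instead of raising when nothing
    matches, so a missing element is easy to detect.
    """
    matches = parent.find_elements("xpath", xpath)  # "xpath" == By.XPATH
    text = matches[0].text.strip() if matches else ""
    return text or default


def extract_rows(videos):
    """Build one dict per video so every row always has all three fields."""
    rows = []
    for video in videos:
        rows.append({
            "video_title": text_or_default(video, ".//a[@id='video-title']", ""),
            "views": text_or_default(video, ".//div[@id='metadata-line']/span[1]", ""),
            "date_posted": text_or_default(video, ".//div[@id='metadata-line']/span[2]"),
        })
    return rows
```

With videos collected as in the loop above, rows = extract_rows(videos) followed by pd.DataFrame(rows) should give one row per video, and the "same length" error cannot occur because each row is built as a unit.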

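Note that you need to call driver.implicitly_wait(<SECONDS>) only ONCE, at the beginning of the script! It sets a timeout for element lookups for the whole session; it does not pause the script the way time.sleep() does.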
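As a rough illustration of that note, the setup and scrolling part could look like the sketch below. prepare_and_scroll is a hypothetical helper name, and the scroll count, pause length, and JavaScript scroll are assumptions standing in for the Keys.END loop in the question; adjust them to how the page actually loads.

```python
import time


def prepare_and_scroll(driver, scrolls=10, pause=1.0, implicit_wait=10):
    """Set the implicit wait once, then scroll down to load more videos."""
    # Session-wide lookup timeout: set it a single time, not inside loops.
    driver.implicitly_wait(implicit_wait)
    for _ in range(scrolls):
        # Jump to the bottom of the page so more items get loaded.
        driver.execute_script(
            "window.scrollTo(0, document.documentElement.scrollHeight);"
        )
        # An actual pause between scrolls; implicitly_wait would not pause here.
        time.sleep(pause)
```

Call it right after driver.get(url), for example prepare_and_scroll(driver), and drop the other implicitly_wait calls from the loops.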
