Python Selenium driver keeps old data after click

Time:02-27

I use Python Selenium to grab YouTube video URLs. I load the home page first and click on a random result. From that second page, I want to get the suggested videos on the right. But when I do that, the driver just ADDS the suggested videos to the list of videos found on the home page. I don't know why, so I think I need to reset or clear something in between the find_elements calls.

import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: Chrome; any WebDriver behaves the same here

# Home page: collect the thumbnail links
driver.get('https://www.youtube.com/')
time.sleep(8)
items = driver.find_elements(By.XPATH, "//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]")

for i in items:
    url = i.get_attribute("href")
    print(url)

# Open a random video
rand = random.choice(items)
rand.click()
time.sleep(10)

# GET SUGGESTED VIDEO ON THE RIGHT (same XPath, searched over the whole document)
yt_right_pane_items = driver.find_elements(By.XPATH, "//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]")

for i in yt_right_pane_items:
    url = i.get_attribute("href")
    print(url)

The output from the home page:

https://www.youtube.com/watch?v=0YuC4ZJJI5c
https://www.youtube.com/watch?v=FyUIEU1qW1w&t=13147s
https://www.youtube.com/watch?v=H9-ekUCFCr0
https://www.youtube.com/watch?v=BoVAOpSiD_A
https://www.youtube.com/watch?v=lJqDZKAxOOY
https://www.youtube.com/watch?v=nJL1k37T6r8
https://www.youtube.com/watch?v=o1dhGnZIxfI
https://www.youtube.com/watch?v=y57jYUogWFs
https://www.youtube.com/watch?v=4V0e9IpzSfs

The second output (the videos from the first find_elements call, followed by the videos from the second find_elements call):

https://www.youtube.com/watch?v=0YuC4ZJJI5c
https://www.youtube.com/watch?v=FyUIEU1qW1w&t=13147s
https://www.youtube.com/watch?v=H9-ekUCFCr0
https://www.youtube.com/watch?v=BoVAOpSiD_A
https://www.youtube.com/watch?v=lJqDZKAxOOY
https://www.youtube.com/watch?v=nJL1k37T6r8
https://www.youtube.com/watch?v=o1dhGnZIxfI
https://www.youtube.com/watch?v=y57jYUogWFs
https://www.youtube.com/watch?v=4V0e9IpzSfs
https://www.youtube.com/watch?v=jHa20EBYPU8
https://www.youtube.com/watch?v=ImnTNcqtvlY
https://www.youtube.com/watch?v=ppiIs2YoFqo
https://www.youtube.com/watch?v=P3TFt5oqDJU
https://www.youtube.com/watch?v=BisnRXb_sk0
https://www.youtube.com/watch?v=l5Pjhl1vgUw
https://www.youtube.com/watch?v=nvsZKNYwHt0
https://www.youtube.com/watch?v=L6VBHflOeuY
https://www.youtube.com/watch?v=1MPRbX7ACh8

On the second find_elements call, I only want to get the NEW videos from the page that was clicked on.

Answer:

The problem is neither Selenium nor the list, but YouTube: after the click it keeps the home-page links in the page, just hidden.

Your XPath searches all links, including the hidden ones, but it should search only in the visible part:

//div[@id='columns']

Full XPath:

//div[@id='columns']//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]
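Because only the container prefix differs between the two XPaths in this answer, a small helper can build them from one shared thumbnail predicate. This is a minimal sketch: `THUMBNAIL_PREDICATE` and `scoped_thumbnail_xpath` are hypothetical names, and the id/class values are copied from the question, so they are assumptions that break if YouTube renames them.

```python
# Thumbnail predicate copied from the question; YouTube may change these
# ids/classes at any time, so treat them as assumptions.
THUMBNAIL_PREDICATE = (
    "a[@id='thumbnail']"
    "[@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail']"
    "[contains(@href, 'watch?v=')]"
)


def scoped_thumbnail_xpath(container_id):
    """Return an XPath matching only thumbnails inside div#<container_id>.

    Hypothetical helper: the //div[@id=...] prefix limits the search to that
    container instead of the whole document, which still holds hidden links.
    """
    return f"//div[@id='{container_id}']//{THUMBNAIL_PREDICATE}"


if __name__ == "__main__":
    print(scoped_thumbnail_xpath("columns"))
    print(scoped_thumbnail_xpath("related"))
```

With Selenium the result is passed as usual, e.g. `driver.find_elements(By.XPATH, scoped_thumbnail_xpath("related"))`.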

If you want only the suggested videos on the right, search in

//div[@id='related'] 

Full XPath:

//div[@id='related']//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]
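Another way to get the same scoping is to locate the container first and search relative to it. In Selenium, an XPath passed to `WebElement.find_elements` must start with `.` to stay inside that element; a bare `//` searches the whole document again. The sketch below is hypothetical (`get_suggested_urls` is not a Selenium function) and passes the string `"xpath"`, which is the value of Selenium's `By.XPATH`, so it needs no import of its own.

```python
def get_suggested_urls(driver):
    """Collect the hrefs of the suggested videos in the right-hand pane.

    `driver` is a Selenium WebDriver; "xpath" is the string value of
    selenium.webdriver.common.by.By.XPATH.
    """
    # Locate the right-hand pane first (assumes YouTube still uses div#related).
    pane = driver.find_element("xpath", "//div[@id='related']")
    # Leading '.' means "relative to pane"; '//a...' alone would scan the whole
    # document again, including the hidden home-page thumbnails.
    links = pane.find_elements(
        "xpath",
        ".//a[@id='thumbnail']"
        "[@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail']"
        "[contains(@href, 'watch?v=')]",
    )
    return [link.get_attribute("href") for link in links]
```

It would be called after the click and the wait, e.g. `for url in get_suggested_urls(driver): print(url)`.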


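Another method is to compare the two lists with set(), where first_list and second_list are the lists of URL strings (the href values) collected from each page:

new = list(set(second_list) - set(first_list))  # URLs only on the second page

duplicated = list(set(second_list) & set(first_list))  # URLs on both pages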
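Converting to `set()` throws away the order in which the videos appear on the page. If order matters, a minimal alternative is to filter the second list against a set built from the first. `new_urls` and `shared_urls` are hypothetical names, and both inputs are assumed to be lists of URL strings, not `WebElement` objects.

```python
def new_urls(first_list, second_list):
    """URLs in second_list that are not in first_list, in page order, without repeats."""
    seen = set(first_list)  # fast membership checks
    result = []
    for url in second_list:
        if url not in seen:
            result.append(url)
            seen.add(url)  # also skips repeats inside second_list
    return result


def shared_urls(first_list, second_list):
    """URLs present in both lists, in the order they appear in second_list."""
    first = set(first_list)
    # dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+).
    return list(dict.fromkeys(url for url in second_list if url in first))
```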

This can be useful because the same videos can show up among the suggestions on many pages.
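When visiting several watch pages in a row, the same idea extends to one running set of URLs that have already been collected, so each page contributes only videos not seen on any earlier page. This is a sketch under the assumption that each page's hrefs have already been gathered into a list of strings; `unseen_per_page` is a hypothetical name.

```python
def unseen_per_page(pages):
    """For each page's URL list, yield the URLs not seen on any earlier page.

    `pages` is an iterable of lists of URL strings, one list per visited page.
    """
    seen = set()
    for urls in pages:
        # dict.fromkeys removes repeats within the page but keeps page order.
        fresh = [u for u in dict.fromkeys(urls) if u not in seen]
        seen.update(fresh)
        yield fresh
```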
