I use Python Selenium to grab YouTube video URLs. I load the home page first and click on a random result. From that second page, I want to get the suggested videos on the right. But when I do that, the driver just adds the suggested videos to the list of videos found on the home page. I don't know why, so I assume I need to reset or clear something between the two find_elements calls.
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumption: the original does not show which browser is used
driver.get('https://www.youtube.com/')
time.sleep(8)  # fixed wait for the home page to render
items = driver.find_elements(By.XPATH, "//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]")
for i in items:
    url = i.get_attribute("href")
    print(str(url))
rand = random.choice(items)
rand.click()
time.sleep(10)  # fixed wait for the watch page to render
# GET SUGGESTED VIDEO ON THE RIGHT
yt_right_pane_items = driver.find_elements(By.XPATH, "//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]")
for i in yt_right_pane_items:
    url = i.get_attribute("href")
    print(str(url))
The OUTPUT of the HOMEPAGE:
https://www.youtube.com/watch?v=0YuC4ZJJI5c
https://www.youtube.com/watch?v=FyUIEU1qW1w&t=13147s
https://www.youtube.com/watch?v=H9-ekUCFCr0
https://www.youtube.com/watch?v=BoVAOpSiD_A
https://www.youtube.com/watch?v=lJqDZKAxOOY
https://www.youtube.com/watch?v=nJL1k37T6r8
https://www.youtube.com/watch?v=o1dhGnZIxfI
https://www.youtube.com/watch?v=y57jYUogWFs
https://www.youtube.com/watch?v=4V0e9IpzSfs
The second output = videos from the first find_elements + videos from the second find_elements:
https://www.youtube.com/watch?v=0YuC4ZJJI5c
https://www.youtube.com/watch?v=FyUIEU1qW1w&t=13147s
https://www.youtube.com/watch?v=H9-ekUCFCr0
https://www.youtube.com/watch?v=BoVAOpSiD_A
https://www.youtube.com/watch?v=lJqDZKAxOOY
https://www.youtube.com/watch?v=nJL1k37T6r8
https://www.youtube.com/watch?v=o1dhGnZIxfI
https://www.youtube.com/watch?v=y57jYUogWFs
https://www.youtube.com/watch?v=4V0e9IpzSfs
https://www.youtube.com/watch?v=jHa20EBYPU8
https://www.youtube.com/watch?v=ImnTNcqtvlY
https://www.youtube.com/watch?v=ppiIs2YoFqo
https://www.youtube.com/watch?v=P3TFt5oqDJU
https://www.youtube.com/watch?v=BisnRXb_sk0
https://www.youtube.com/watch?v=l5Pjhl1vgUw
https://www.youtube.com/watch?v=nvsZKNYwHt0
https://www.youtube.com/watch?v=L6VBHflOeuY
https://www.youtube.com/watch?v=1MPRbX7ACh8
On the second find_elements, I only want to get the new videos from the page that was clicked on.
CodePudding user response:
The problem is not Selenium, nor the list, but YouTube: it keeps these links in the page, but hidden.
Your XPath searches all links, even hidden ones, but it should search only in the visible part:
//div[@id='columns']
Full XPath:
//div[@id='columns']//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]
And if you want only the suggested videos on the right, then search in:
//div[@id='related']
Full XPath:
//div[@id='related']//a[@id='thumbnail'][@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail'][contains(@href, 'watch?v=')]
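A minimal sketch of how the scoped XPath could be used, assuming `driver` is an already-open Selenium WebDriver sitting on a watch page. The helper name `related_video_urls` is hypothetical and not part of Selenium; the XPath is the one given above, only split across lines. The string "xpath" is the value of Selenium's `By.XPATH`, written out here so the snippet needs no import of its own.

```python
# Same XPath as above, split across lines for readability.
RELATED_XPATH = (
    "//div[@id='related']"
    "//a[@id='thumbnail']"
    "[@class='yt-simple-endpoint inline-block style-scope ytd-thumbnail']"
    "[contains(@href, 'watch?v=')]"
)


def related_video_urls(driver):
    """Return the href of every thumbnail link inside the related pane.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper).
    """
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    links = driver.find_elements("xpath", RELATED_XPATH)
    urls = []
    for link in links:
        href = link.get_attribute("href")
        if href:  # skip links whose href is missing or empty
            urls.append(href)
    return urls
```

In the question's script this would replace the second find_elements call, for example `for url in related_video_urls(driver): print(url)`.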
Another method is to use set(), which removes duplicate elements:
new = list( set(second_list) - set(first_list) )
duplicated = list( set(second_list) & set(first_list) )
This can be useful because the suggested videos can contain duplicates across all the pages you visit.
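One caveat with the set() approach is that it does not keep the page order. The sketch below is a possible variant, assuming both lists hold href strings (not WebElement objects); the helper name `new_urls` is hypothetical. It uses a set only for the membership test and keeps the order of the second list.

```python
def new_urls(first_list, second_list):
    """Return URLs from second_list that are not in first_list, in order."""
    seen = set(first_list)
    result = []
    for url in second_list:
        if url not in seen:
            seen.add(url)  # also drops repeats inside second_list
            result.append(url)
    return result


# Sample data taken from the outputs in the question.
home = [
    "https://www.youtube.com/watch?v=0YuC4ZJJI5c",
    "https://www.youtube.com/watch?v=H9-ekUCFCr0",
]
watch = home + [
    "https://www.youtube.com/watch?v=jHa20EBYPU8",
    "https://www.youtube.com/watch?v=ImnTNcqtvlY",
]

for url in new_urls(home, watch):
    print(url)
```

With this sample data, only the two URLs that were appended to `watch` are printed.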