Can't get href from Selenium webdriver scraping youtube

Time:11-29

I am trying to scrape the videos from a YouTube channel with the code below. However, the elements in element_titles don't seem to have an href attribute. This worked about a year ago, and I am unsure why it doesn't work now. Did YouTube change how the href is exposed?

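# Scrape for videos
# WARNING: Takes very long
from selenium import webdriver
from selenium.webdriver.common.by import By

HOME = "https://www.youtube.com/user/theneedledrop/videos"
# Raw string so the backslashes in the Windows path are taken literally
driver = webdriver.Chrome(r"C:\webdriver\chromedriver.exe")
driver.get(HOME)

scroll()  # helper (not shown) that scrolls the page to load more videos
element_titles = driver.find_elements(By.ID, "video-title")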

The following attributes are what is found on the WebElement objects:

> element_titles[0].get_property('attributes')[0]

{'ATTRIBUTE_NODE': 2,
 'CDATA_SECTION_NODE': 4,
 'COMMENT_NODE': 8,
 'DOCUMENT_FRAGMENT_NODE': 11,
 'DOCUMENT_NODE': 9,
 'DOCUMENT_POSITION_CONTAINED_BY': 16,
 'DOCUMENT_POSITION_CONTAINS': 8,
 'DOCUMENT_POSITION_DISCONNECTED': 1,
 'DOCUMENT_POSITION_FOLLOWING': 4,
 'DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC': 32,
 'DOCUMENT_POSITION_PRECEDING': 2,
 'DOCUMENT_TYPE_NODE': 10,
 'ELEMENT_NODE': 1,
 'ENTITY_NODE': 6,
 'ENTITY_REFERENCE_NODE': 5,
 'NOTATION_NODE': 12,
 'PROCESSING_INSTRUCTION_NODE': 7,
 'TEXT_NODE': 3,
 '__shady_addEventListener': {},
 '__shady_appendChild': {},
 '__shady_childNodes': [],
 '__shady_cloneNode': {},
 '__shady_contains': {},
 '__shady_dispatchEvent': {},
 '__shady_firstChild': None,
 '__shady_getRootNode': {},
 '__shady_insertBefore': {},
 '__shady_isConnected': False,
 '__shady_lastChild': None,
 '__shady_native_addEventListener': {},
 '__shady_native_appendChild': {},
 '__shady_native_childNodes': [],
 '__shady_native_cloneNode': {},
 '__shady_native_contains': {},
 '__shady_native_dispatchEvent': {},
 '__shady_native_firstChild': None,
 '__shady_native_insertBefore': {},
 '__shady_native_lastChild': None,
 '__shady_native_nextSibling': None,
 '__shady_native_parentElement': None,
 '__shady_native_parentNode': None,
 '__shady_native_previousSibling': None,
 '__shady_native_removeChild': {},
 '__shady_native_removeEventListener': {},
 '__shady_native_replaceChild': {},
 '__shady_native_textContent': 'video-title',
 '__shady_nextSibling': None,
 '__shady_parentElement': None,
 '__shady_parentNode': None,
 '__shady_previousSibling': None,
 '__shady_removeChild': {},
 '__shady_removeEventListener': {},
 '__shady_replaceChild': {},
 '__shady_textContent': 'video-title',
 'addEventListener': {},
 'appendChild': {},
 'baseURI': 'https://www.youtube.com/user/theneedledrop/videos',
 'childNodes': [],
 'cloneNode': {},
 'compareDocumentPosition': {},
 'contains': {},
 'dispatchEvent': {},
 'firstChild': None,
 'getRootNode': {},
 'hasChildNodes': {},
 'insertBefore': {},
 'isConnected': False,
 'isDefaultNamespace': {},
 'isEqualNode': {},
 'isSameNode': {},
 'lastChild': None,
 'localName': 'id',
 'lookupNamespaceURI': {},
 'lookupPrefix': {},
 'name': 'id',
 'namespaceURI': None,
 'nextSibling': None,
 'nodeName': 'id',
 'nodeType': 2,
 'nodeValue': 'video-title',
 'normalize': {},
 'ownerDocument': <selenium.webdriver.remote.webelement.WebElement (session="906f0b2a91a96de78811a8b48c702ce9", element="4105d26d-55b3-49a1-b657-10bbbbf43c84")>,
 'ownerElement': <selenium.webdriver.remote.webelement.WebElement (session="906f0b2a91a96de78811a8b48c702ce9", element="c0d38452-435c-489a-8cb8-858adc4828b9")>,
 'parentElement': None,
 'parentNode': None,
 'prefix': None,
 'previousSibling': None,
 'removeChild': {},
 'removeEventListener': {},
 'replaceChild': {},
 'specified': True,
 'textContent': 'video-title',
 'value': 'video-title'}

I have tried inspecting the YouTube video pages for the href, but I am unable to find it.

CodePudding user response:

Try video-title-link.

In YouTube's current markup, which element carries the /watch link depends on the page. On the homepage and in a channel's "videos" tab, the URL of a given video is found on its anchor element with id video-title-link.

On the "home" tab of a given channel, the relevant links still have id video-title.
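Since the id differs between pages, one option is to try both ids in order and keep whichever returns links. The following is only a minimal sketch: the helper name collect_video_hrefs and the selector order are assumptions for illustration, not part of Selenium or of YouTube's markup contract. It relies only on find_elements and get_attribute, and passes the locator strategy as the plain string "css selector", which is the value of By.CSS_SELECTOR.

```python
# "css selector" is the string value of selenium's By.CSS_SELECTOR; using the
# literal keeps this helper free of any selenium import.
CSS = "css selector"

# Assumed selectors, tried in order: the id used on the homepage and the
# channel "videos" tab first, then the id the channel "home" tab still uses.
DEFAULT_SELECTORS = ("#video-title-link", "#video-title")


def collect_video_hrefs(driver, selectors=DEFAULT_SELECTORS):
    """Return the hrefs found by the first selector that yields any links."""
    for selector in selectors:
        hrefs = []
        for element in driver.find_elements(CSS, selector):
            href = element.get_attribute("href")
            if href:  # skip elements that match the id but carry no link
                hrefs.append(href)
        if hrefs:
            return hrefs
    return []
```

With a live driver, this would be called as collect_video_hrefs(driver) after driver.get(HOME) and any scrolling. Elements that match an id but have no href, as in the question, are skipped rather than returned as None.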

CodePudding user response:

The complete script below collects the video links from the channel's "videos" tab.

Example:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import pandas as pd
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
# All of these options are optional
#options.add_experimental_option("detach", True)
options.add_argument("--disable-extensions")
options.add_argument("--disable-notifications")
options.add_argument("--disable-Advertisement")
options.add_argument("--disable-popup-blocking")
options.add_argument("start-maximized")

s = Service('./chromedriver')
driver = webdriver.Chrome(service=s, options=options)

driver.get('https://www.youtube.com/user/theneedledrop/videos')
time.sleep(3)

SCROLL_PAUSE_TIME = 1
item_count = 100  # stop scrolling once at least this many links are loaded
last_height = driver.execute_script("return document.documentElement.scrollHeight")

# Scroll until enough links are loaded or the page stops growing
while len(driver.find_elements(By.CSS_SELECTOR, 'a#video-title-link')) < item_count:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")

    if new_height == last_height:
        break  # reached the end of the video list
    last_height = new_height

try:
    details = WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div#details')))
except TimeoutException:
    details = []  # no video tiles appeared within 20 seconds

data = []
for e in details:
    # find_elements returns an empty list (instead of raising) if a tile has no link
    for link in e.find_elements(By.CSS_SELECTOR, 'a#video-title-link'):
        data.append({'video_url': link.get_attribute('href')})

df = pd.DataFrame(data).drop_duplicates()
# to_markdown() requires the optional 'tabulate' package
print(df.to_markdown())

Output:

|     | video_url                                   |
|----:|:--------------------------------------------|
|   0 | https://www.youtube.com/watch?v=UZcSkasvj5c |
|   1 | https://www.youtube.com/watch?v=9c8AXKAnp_E |
|   2 | https://www.youtube.com/watch?v=KaLUHF7nQic |
|   3 | https://www.youtube.com/watch?v=rxb2L0Bgp3U |
|   4 | https://www.youtube.com/watch?v=z3L1wXvMN0Q |
|   5 | https://www.youtube.com/watch?v=q7vqR74WVYc |
|   6 | https://www.youtube.com/watch?v=Kb31OTOYYG8 |
|   7 | https://www.youtube.com/watch?v=F-CaQbxwMZ0 |
|   8 | https://www.youtube.com/watch?v=AWDWTyC0jls |
|   9 | https://www.youtube.com/watch?v=LXWbnTgxeT4 |
|  10 | https://www.youtube.com/watch?v=5KlHjDnefYQ |
|  11 | https://www.youtube.com/watch?v=yfq8rdBcAMg |
|  12 | https://www.youtube.com/watch?v=lATG1JBzVIU |
|  13 | https://www.youtube.com/watch?v=SNmZfHDOHQw |
|  14 | https://www.youtube.com/watch?v=IsQBbO_4EQI |
|  15 | https://www.youtube.com/watch?v=wcSyXUOM63g |
|  16 | https://www.youtube.com/watch?v=5hIaJZ9M8ZI |
|  17 | https://www.youtube.com/watch?v=ikryWQEHsCE |
|  18 | https://www.youtube.com/watch?v=5ARVgrao6E0 |
|  19 | https://www.youtube.com/watch?v=_1q6-POT8sY |
|  20 | https://www.youtube.com/watch?v=ycyxm3rgQG0 |
|  21 | https://www.youtube.com/watch?v=InirkRGnC2w |
|  22 | https://www.youtube.com/watch?v=nrvq5lY9oy0 |
|  23 | https://www.youtube.com/watch?v=M1yGh3D_KI8 |
|  24 | https://www.youtube.com/watch?v=Yn_4mtMYyXU |
|  25 | https://www.youtube.com/watch?v=8vmm8x_Cq4s |
|  26 | https://www.youtube.com/watch?v=Zfyojbr-cEQ |
|  27 | https://www.youtube.com/watch?v=NqrVX-WOrc0 |
|  28 | https://www.youtube.com/watch?v=Hx6k20LsAJ4 |
|  29 | https://www.youtube.com/watch?v=OB6ZI5Bicww |
|  30 | https://www.youtube.com/watch?v=uNMnIRKx0GE |
|  31 | https://www.youtube.com/watch?v=U7w_MKl5_hE |
|  32 | https://www.youtube.com/watch?v=KGi4Cpbh_Y0 |
|  33 | https://www.youtube.com/watch?v=mQqRtaoyAdw |
|  34 | https://www.youtube.com/watch?v=s3VzTy9oXXM |
|  35 | https://www.youtube.com/watch?v=eCaojgO-ZWs |
|  36 | https://www.youtube.com/watch?v=SeOLXwvu87E |
|  37 | https://www.youtube.com/watch?v=IlZ6Y21rxTU |
|  38 | https://www.youtube.com/watch?v=HxoRbEQFx3U |
|  39 | https://www.youtube.com/watch?v=NDCAImW1o6o |
|  40 | https://www.youtube.com/watch?v=gE778rR6-EM |
|  41 | https://www.youtube.com/watch?v=cQ0eY9NJACQ |
|  42 | https://www.youtube.com/watch?v=-x5Bx-leRWI |
|  43 | https://www.youtube.com/watch?v=XQ0C_Dmf0hI |
|  44 | https://www.youtube.com/watch?v=0eJ4JRNi4J8 |
|  45 | https://www.youtube.com/watch?v=YczkDCv3GiM |
|  46 | https://www.youtube.com/watch?v=GQmUsdUI20A |
|  47 | https://www.youtube.com/watch?v=4CFnoywFia4 |
|  48 | https://www.youtube.com/watch?v=A0Bzv8weX4s |
|  49 | https://www.youtube.com/watch?v=YbxcaHn_d_o |
|  50 | https://www.youtube.com/watch?v=GwUNT2k26mQ |
|  51 | https://www.youtube.com/watch?v=zktcHftIhDs |
|  52 | https://www.youtube.com/watch?v=_rY7Hvxe4x4 |
|  53 | https://www.youtube.com/watch?v=rqB9gd4fbfE |
|  54 | https://www.youtube.com/watch?v=oNPAhe7G3yg |
|  55 | https://www.youtube.com/watch?v=37_aCQW98sU |
|  56 | https://www.youtube.com/watch?v=GjA4fWIUv-A |
|  57 | https://www.youtube.com/watch?v=8THBFF024ho |
|  58 | https://www.youtube.com/watch?v=HLErXgsV3Nk |
|  59 | https://www.youtube.com/watch?v=GsvdLIxY6Fg |
|  60 | https://www.youtube.com/watch?v=iUU48DuTpl8 |
|  61 | https://www.youtube.com/watch?v=5UluxcFJVx0 |
|  62 | https://www.youtube.com/watch?v=5lOvAHg12uw |
|  63 | https://www.youtube.com/watch?v=2UADjU66-4M |
|  64 | https://www.youtube.com/watch?v=Qvr2labD_Es |
|  65 | https://www.youtube.com/watch?v=qUWRnIn5oB0 |
|  66 | https://www.youtube.com/watch?v=Qk7MPEyGhQ4 |
|  67 | https://www.youtube.com/watch?v=bN7SDJFanS4 |
|  68 | https://www.youtube.com/watch?v=6YoUjUGvHUk |
|  69 | https://www.youtube.com/watch?v=NjiLz3HoWkM |
|  70 | https://www.youtube.com/watch?v=rRdU7VhoWdI |
|  71 | https://www.youtube.com/watch?v=zOm5n0OJLfc |
|  72 | https://www.youtube.com/watch?v=z9jMFiSUe5Q |
|  73 | https://www.youtube.com/watch?v=M6VLYjFnXMU |
|  74 | https://www.youtube.com/watch?v=4iFEpKDQx-o |
|  75 | https://www.youtube.com/watch?v=Zc1SE66DEYo |
|  76 | https://www.youtube.com/watch?v=645qisC4slI |
|  77 | https://www.youtube.com/watch?v=QeIRfgsVX5k |
|  78 | https://www.youtube.com/watch?v=0jUr57dIMq4 |
|  79 | https://www.youtube.com/watch?v=EjaTJGmoT_w |
|  80 | https://www.youtube.com/watch?v=roXy5LA17fU |
|  81 | https://www.youtube.com/watch?v=UeSwqepnAX0 |
|  82 | https://www.youtube.com/watch?v=BDYSYypzhxE |
|  83 | https://www.youtube.com/watch?v=iyBNxEnP7rk |
|  84 | https://www.youtube.com/watch?v=YCUmI9f77qs |
|  85 | https://www.youtube.com/watch?v=h21LYpHEfNU |
|  86 | https://www.youtube.com/watch?v=LBQDuTn6T0c |
|  87 | https://www.youtube.com/watch?v=le_0jyqCXFU |
|  88 | https://www.youtube.com/watch?v=tGClvgTCrIY |
|  89 | https://www.youtube.com/watch?v=969qt4RUx74 |
|  90 | https://www.youtube.com/watch?v=XL8li__PnaA |
|  91 | https://www.youtube.com/watch?v=RKf3ppfFUkg |
|  92 | https://www.youtube.com/watch?v=xY5RyjaQJCE |
|  93 | https://www.youtube.com/watch?v=6bjliN6hJTs |
|  94 | https://www.youtube.com/watch?v=KcYBolH-j9c |
|  95 | https://www.youtube.com/watch?v=nlsnpbRyvtU |
|  96 | https://www.youtube.com/watch?v=AOWmL1eydWI |
|  97 | https://www.youtube.com/watch?v=I8RPsF-hdXo |
|  98 | https://www.youtube.com/watch?v=9NSOGd2p530 |
|  99 | https://www.youtube.com/watch?v=8EdqpZu9lkM |
| 100 | https://www.youtube.com/watch?v=a23wQEA4EAA |
| 101 | https://www.youtube.com/watch?v=7g6TXGY-T6k |
| 102 | https://www.youtube.com/watch?v=iXZNlGwOuWY |
| 103 | https://www.youtube.com/watch?v=miR30bsSH4E |
| 104 | https://www.youtube.com/watch?v=zb8-aHiTKL4 |
| 105 | https://www.youtube.com/watch?v=rTEZmXq9K3k |
| 106 | https://www.youtube.com/watch?v=OBeOJiolMug |
| 107 | https://www.youtube.com/watch?v=fA0nxixnS-A |
| 108 | https://www.youtube.com/watch?v=dMhpDlUTT_U |
| 109 | https://www.youtube.com/watch?v=SgjDaPWjzuU |
| 110 | https://www.youtube.com/watch?v=2lokqffmF2A |
| 111 | https://www.youtube.com/watch?v=jmHZvGMe8pQ |
| 112 | https://www.youtube.com/watch?v=KPYvMIMON9g |

... and so on
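drop_duplicates compares whole URL strings, so the same video would be kept twice if its link ever appeared with extra query parameters. That case does not occur in the output above; it is an assumption about what other pages might return. As a hedged sketch using only the standard library, the hypothetical helpers video_id and unique_by_video_id below deduplicate on the v parameter instead:

```python
from urllib.parse import urlparse, parse_qs


def video_id(url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def unique_by_video_id(urls):
    """Keep the first URL seen for each video ID, preserving input order."""
    seen = set()
    unique = []
    for url in urls:
        vid = video_id(url)
        if vid is None or vid in seen:
            continue  # not a /watch link, or a video already collected
        seen.add(vid)
        unique.append(url)
    return unique
```

The list of hrefs gathered by the script can be passed through unique_by_video_id before building the DataFrame.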
