How to collect all hrefs using xpath? Selenium - Python


I'm trying to collect all five of the social media links from the artist in this example. Currently, my output is only the last (fifth) social media link. I'm using Selenium; I understand this may not be the best option for collecting this data, but it's all I know at this time. Note that I've only included the code relevant to my question. Thank you in advance for any help/insight.

    from selenium import webdriver
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.options import Options
    import time
    from random import randint
    import pandas as pd

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('disable-infobars')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
    driver = webdriver.Chrome(chrome_options=chrome_options)




    for url in urls:
        driver.get("https://soundcloud.com/flux-pavilion")

        time.sleep(randint(3, 4))

        try:
            links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
            for elem in links:
                socialmedia = elem.get_attribute("href")
        except:
            links = "none"

        artist = {
            'socialmedia': socialmedia,
        }

        print(artist)

CodePudding user response:

The problem is not with your XPath expression, but rather with the (missing) list handling in your output code.

Your code keeps only the last item of the resulting list, because socialmedia is overwritten on every pass through the inner loop. That is why you only received one link (the last one).
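Here is a minimal, standalone illustration of the difference (plain Python, unrelated to Selenium):

    items = ["a", "b", "c"]

    # Overwriting a single variable: only the final value survives the loop
    last = None
    for item in items:
        last = item
    print(last)       # c

    # Appending to a list: every value is kept
    collected = []
    for item in items:
        collected.append(item)
    print(collected)  # ['a', 'b', 'c']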

So change the output part of your code to

[...]

    driver.get("https://soundcloud.com/flux-pavilion")
    time.sleep(randint(3, 4))
    artist = []

    try:
        links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
        for elem in links:
            # collect every href instead of overwriting a single variable
            artist.append(elem.get_attribute("href"))
    except:
        links = "none"

    for link in artist:
        print(link)

And the output will contain all of the values (links) you desire:

    https://gate.sc/?url=https://twitter.com/Fluxpavilion&token=da4a8d-1-1653430570528
    https://gate.sc/?url=https://instagram.com/Fluxpavilion&token=277ea0-1-1653430570529
    https://gate.sc/?url=https://facebook.com/Fluxpavilion&token=4c773c-1-1653430570530
    https://gate.sc/?url=https://youtube.com/Fluxpavilion&token=1353f7-1-1653430570531
    https://gate.sc/?url=https://open.spotify.com/artist/7muzHifhMdnfN1xncRLOqk?si=bK9XeoW5RxyMlA-W9uVwPw&token=bc2936-1-1653430570532
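
As a side note, the find_elements_by_xpath helper used above is deprecated and has been removed in recent Selenium 4 releases. If you are on a newer Selenium, the same collection step would look roughly like this (a sketch assuming Selenium 4 and the same XPath; the explicit wait replaces the fixed sleep):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver.get("https://soundcloud.com/flux-pavilion")

    # Wait until at least one matching anchor is present instead of sleeping a fixed time
    xpath = '//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]'
    links = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.XPATH, xpath))
    )

    artist = [elem.get_attribute("href") for elem in links]
    for link in artist:
        print(link)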