I'm trying to collect all five social media links from the artist page in this example, but my output contains only the LAST (fifth) link. I'm using Selenium; I understand it may not be the best option for collecting this data, but it's all I know at this time. Note that I've only included the code relevant to my question. Thank you in advance for any help or insight.
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time
from random import randint
import pandas as pd
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('disable-infobars')
chrome_options.add_argument('--disable-extensions')
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(options=chrome_options)
urls = ["https://soundcloud.com/flux-pavilion"]
for url in urls:
    driver.get(url)
    time.sleep(randint(3, 4))
    try:
        links = driver.find_elements(By.XPATH, '//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
        for elem in links:
            socialmedia = elem.get_attribute("href")
    except:
        links = "none"
    artist = {
        'socialmedia': socialmedia,
    }
    print(artist)
Answer:
The problem is not your XPath expression but what you do with its results. Inside the for elem in links: loop, each iteration assigns the current href to socialmedia, overwriting the value from the previous iteration. When the loop ends, only the last item of the XPath result list is left, which is why you receive only one link (the last one).
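To see the difference without a browser, here is a small plain-Python sketch. The function names and the placeholder strings are hypothetical stand-ins for the Selenium elements and their href values.

```python
def last_only(hrefs):
    """Mimics the question's loop: each pass overwrites the variable."""
    socialmedia = None
    for href in hrefs:
        socialmedia = href  # replaces the value from the previous pass
    return socialmedia


def collect_all(hrefs):
    """Mimics the fix: each pass appends to a list, so nothing is lost."""
    socialmedia = []
    for href in hrefs:
        socialmedia.append(href)  # keeps every value
    return socialmedia


# Hypothetical stand-ins for the five href values.
hrefs = ["twitter", "instagram", "facebook", "youtube", "spotify"]
print(last_only(hrefs))    # spotify
print(collect_all(hrefs))  # ['twitter', 'instagram', 'facebook', 'youtube', 'spotify']
```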
So collect the hrefs in a list instead, by changing that part of your code to:
[...]
driver.get("https://soundcloud.com/flux-pavilion")
time.sleep(randint(3, 4))
artist = []  # collect every href here instead of overwriting one variable
try:
    links = driver.find_elements(By.XPATH, '//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
    for elem in links:
        artist.append(elem.get_attribute("href"))
except:
    links = "none"
for link in artist:
    print(link)
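If you loop over several artist pages, as the for url in urls loop in the question suggests, you may prefer one record per artist holding a list of links rather than one flat list. Below is a minimal sketch under these assumptions: driver is a Selenium WebDriver, the XPath is the one from the question (it will break if SoundCloud changes its markup), and scrape_social_links, FakeElement and FakeDriver are hypothetical names. It relies on find_elements returning an empty list (rather than raising) when nothing matches, so no try/except is needed for pages without links. The string "xpath" is the value of Selenium's By.XPATH, used here so the sketch also runs without Selenium installed; the fake driver exists only to try the function without a browser.

```python
import time
from random import randint

# Same XPath as in the question; it depends on SoundCloud's current markup.
SOCIAL_XPATH = '//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]'


def scrape_social_links(driver, urls, xpath=SOCIAL_XPATH, delay=(3, 4)):
    """Return one dict per URL, each holding ALL hrefs found on that page."""
    artists = []
    for url in urls:
        driver.get(url)
        time.sleep(randint(*delay))  # crude fixed wait, as in the question
        # "xpath" == By.XPATH; find_elements returns [] when nothing matches.
        links = driver.find_elements("xpath", xpath)
        artists.append({
            "url": url,
            "socialmedia": [elem.get_attribute("href") for elem in links],
        })
    return artists


# Usage with a real browser (not run here):
#   driver = webdriver.Chrome(options=chrome_options)
#   print(scrape_social_links(driver, ["https://soundcloud.com/flux-pavilion"]))


# Hypothetical stand-ins for trying the function without a browser.
class FakeElement:
    def __init__(self, href):
        self.href = href

    def get_attribute(self, name):
        return self.href if name == "href" else None


class FakeDriver:
    def __init__(self, pages):
        self.pages = pages  # maps URL -> list of hrefs on that page
        self.current = None

    def get(self, url):
        self.current = url

    def find_elements(self, by, value):
        return [FakeElement(h) for h in self.pages.get(self.current, [])]


fake = FakeDriver({"https://example.com/a": ["https://twitter.com/a", "https://instagram.com/a"]})
records = scrape_social_links(fake, ["https://example.com/a", "https://example.com/b"], delay=(0, 0))
print(records)
```

Keeping the URL in each record makes it easy to turn the results into a table later, for example with the pandas import the question already has.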
The output then contains all of the links you want:
https://gate.sc/?url=https://twitter.com/Fluxpavilion&token=da4a8d-1-1653430570528
https://gate.sc/?url=https://instagram.com/Fluxpavilion&token=277ea0-1-1653430570529
https://gate.sc/?url=https://facebook.com/Fluxpavilion&token=4c773c-1-1653430570530
https://gate.sc/?url=https://youtube.com/Fluxpavilion&token=1353f7-1-1653430570531
https://gate.sc/?url=https://open.spotify.com/artist/7muzHifhMdnfN1xncRLOqk?si=bK9XeoW5RxyMlA-W9uVwPw&token=bc2936-1-1653430570532
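All five hrefs point at gate.sc, which appears to be SoundCloud's external-link redirector, with the real profile URL in the url query parameter. If you want the profile URLs themselves, a minimal standard-library sketch could unwrap them. It assumes the target always sits in a url parameter, as in the output above (observed behaviour, not a documented format), and unwrap_gate_link is a hypothetical helper name. A target URL containing an unencoded & would be cut short by this approach.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_gate_link(href):
    """Return the target of a gate.sc redirect link, or href unchanged.

    Assumes the target sits in the "url" query parameter, as observed in
    the scraped output; other links are passed through untouched.
    """
    parsed = urlparse(href)
    if parsed.netloc != "gate.sc":
        return href
    targets = parse_qs(parsed.query).get("url")
    return targets[0] if targets else href


links = [
    "https://gate.sc/?url=https://twitter.com/Fluxpavilion&token=da4a8d-1-1653430570528",
    "https://gate.sc/?url=https://open.spotify.com/artist/7muzHifhMdnfN1xncRLOqk?si=bK9XeoW5RxyMlA-W9uVwPw&token=bc2936-1-1653430570532",
]
for link in links:
    print(unwrap_gate_link(link))
```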