How to transform link extraction into valid element with Selenium Python?

Time:12-30

I need to convert what my URL extraction returns into something valid that I can navigate to.

The code captures the URLs and then opens each one of them to extract data from the page.

[Image: terminal output]

[Image: terminal error]

links = []
classe = driver.find_elements(By.XPATH, "//*[@class='LinksShowcase_UrlContainer__kMj_n']/p")
for i in classe:
    sleep(0.5)
    links.append(i)
    print(links)
    sleep(2)
for linkAtual in links:
    driver.get(linkAtual)

I cannot share the link, because the platform requires creating an account and being approved, but each link appears as text inside a 'p' tag, as shown in the screenshots of the page.

[Images: screenshots of the page]

CodePudding user response:

The find_elements method returns a list of WebElement objects.
These are not links (strings).
A WebElement is a reference, a pointer to a physical element on the web page.
A WebElement may have an href attribute, which normally contains a link.
As KunduK mentioned, links are normally held by anchor (a) elements, not by p elements.
So, if the elements you are collecting contain links, you can extract those links from the WebElement objects as strings and use them later.
I can't debug this code, since you shared neither a link to the page you are working on nor all of your Selenium code, but I guess something like the following can work:

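from selenium.webdriver.common.by import By

# driver is the already-initialised WebDriver from the question
links = []
classe = driver.find_elements(By.XPATH, "//*[@class='LinksShowcase_UrlContainer__kMj_n']/p")
for i in classe:
    # get_attribute returns a string, or None when the element has no href
    link = i.get_attribute("href")
    print(link)
    if link:
        links.append(link)
# Navigate only after every URL string has been collected
for linkAtual in links:
    driver.get(linkAtual)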

UPD
In your case the link is not in an href attribute but in the element's text content, so you can simply extract the text as follows:

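links = []
classe = driver.find_elements(By.XPATH, "//*[@class='LinksShowcase_UrlContainer__kMj_n']/p")
for i in classe:
    # .text returns the visible text of the element as a string
    link = i.text.strip()
    print(link)
    if link:
        links.append(link)
# Navigate only after every URL string has been collected
for linkAtual in links:
    driver.get(linkAtual)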
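
Text read from a p element is not guaranteed to be something driver.get accepts: it can be empty, carry surrounding whitespace, or lack the http:// or https:// prefix. Below is a minimal sketch of a cleanup step, assuming the elements only need to expose a .text attribute. The helper name texts_to_urls and the https default are hypothetical choices for illustration; they are not part of the code in the question.

```python
def texts_to_urls(elements, default_scheme="https"):
    """Turn objects exposing a .text attribute into a list of URL strings.

    Hypothetical helper: the name and the default scheme are assumptions.
    """
    urls = []
    for element in elements:
        text = (element.text or "").strip()
        if not text:
            continue  # skip elements with no visible text
        # Prepend a scheme when the text does not start with one.
        if not text.lower().startswith(("http://", "https://")):
            text = f"{default_scheme}://{text}"
        if text not in urls:
            urls.append(text)  # keep the first occurrence, preserve page order
    return urls
```

With the code above it would be called as links = texts_to_urls(classe) before the navigation loop, so that only plain strings are kept once the driver leaves the page.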