I need to find tags in tags, selenium

Time:03-20

I need to find certain links on a page, but the "a" tags have no class or id. They are, however, nested inside "span" tags with the classes "ipsContained ipsType_break". I would like to find all of those "span" tags first and then the "a" tags inside them. Can anyone tell me how to do this, or suggest a simpler option?

I use Selenium; here is a sample HTML page that includes the links to fetch.

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
  <span class = "ipsContained ipsType_break">
    <a href="link1"></a>
    <a href="link2"></a>
  </span>
  <span class = "ipsContained ipsType_break">
    <a href="link3"></a>
    <a href="link4"></a>
  </span>
</body>
</html>

CodePudding user response:

Selenium-based solution:

You can construct an XPath for the span tags like this:

//span[@class='ipsContained ipsType_break']

Store the matching spans in a list, then get the child a tags of each span with the relative XPath .//a, and read each link with the get_attribute method.

Code:

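from selenium.webdriver.common.by import By

# driver is an already-initialized WebDriver with the page loaded
spans = driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']")
a_tag_list = []
for span in spans:
    # find_elements (plural) returns every <a> in the span, not just the first
    for atag in span.find_elements(By.XPATH, ".//a"):
        href = atag.get_attribute('href')
        print(href)
        a_tag_list.append(href)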

CodePudding user response:

I use BeautifulSoup to parse the HTML.

from bs4 import BeautifulSoup
html = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
  <span class = "ipsContained ipsType_break">
    <a href="link1"></a>
    <a href="link2"></a>
  </span>
  <span class = "ipsContained ipsType_break">
    <a href="link3"></a>
    <a href="link4"></a>
  </span>
</body>
</html>"""
soup = BeautifulSoup(html, 'html.parser')
spans = soup.find_all("span", {"class": "ipsContained ipsType_break"})
links = []
for span in spans:
    a_elements = span.find_all("a", href=True)
    for a in a_elements:
        links.append(a["href"])
print(links)

Prints: ['link1', 'link2', 'link3', 'link4']
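
If installing BeautifulSoup is not an option, the same span-then-a lookup can be sketched with Python's standard-library html.parser instead. This is only a minimal illustration, not part of the answer above: the class name SpanLinkParser and its attributes are hypothetical, and the sample markup is a shortened version of the page in the question.

```python
from html.parser import HTMLParser


class SpanLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside matching <span> tags."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.stack = []  # one bool per open <span>: does it carry all required classes?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = set((attrs.get("class") or "").split())
            self.stack.append(self.required <= classes)
        elif tag == "a" and any(self.stack) and attrs.get("href"):
            # Only keep links that sit inside at least one matching span.
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()


sample = """
<span class="ipsContained ipsType_break">
  <a href="link1"></a>
  <a href="link2"></a>
</span>
<a href="outside"></a>
"""

parser = SpanLinkParser(["ipsContained", "ipsType_break"])
parser.feed(sample)
print(parser.links)  # ['link1', 'link2']
```

With Selenium, the string passed to feed could be driver.page_source rather than a hard-coded sample.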

CodePudding user response:

links = [x.get_attribute("href") for x in driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']//a")]

This should get you the href of every a tag inside those spans. It is another approach, other than appending to a separate list in a loop.
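
A CSS selector can express the same lookup. The XPath test @class='ipsContained ipsType_break' compares the whole attribute string, so it stops matching if the classes are reordered or another class is added, whereas span.ipsContained.ipsType_break only requires both classes to be present. The sketch below is an assumption-laden illustration: collect_hrefs is a hypothetical helper name, and it passes the locator strategy as the plain string "css selector" (the value of Selenium's By.CSS_SELECTOR) so the snippet does not itself import Selenium.

```python
# By.CSS_SELECTOR in Selenium is the plain string "css selector";
# using the string directly keeps this sketch free of imports.
CSS_SELECTOR = "css selector"


def collect_hrefs(driver, selector="span.ipsContained.ipsType_break a"):
    """Return the href of every element the CSS selector matches.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible find_elements method) that has already loaded the page.
    """
    hrefs = []
    for element in driver.find_elements(CSS_SELECTOR, selector):
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href attribute
            hrefs.append(href)
    return hrefs
```

After the page has loaded, this would be called as links = collect_hrefs(driver).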
