I need to find certain links on a page, but the "a" tags have no class or id. They are, however, inside "span" elements with the classes "ipsContained ipsType_break". I would like to find all of those "span" elements first, and then the "a" tags inside them. Can anyone tell me how to do this, or suggest a simpler option?
I use Selenium; here is a sample of the HTML that contains the links to fetch.
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
<span class = "ipsContained ipsType_break">
<a href="link1"></a>
<a href="link2"></a>
</span>
<span class = "ipsContained ipsType_break">
<a href="link3"></a>
<a href="link4"></a>
</span>
</body>
</html>
CodePudding user response:
Selenium-based solution:
You can construct an XPath for the span tags like this:
//span[@class='ipsContained ipsType_break']
You can store the matching spans in a list, then get all the child a tags of each span using a relative XPath (.//a), and finally read each link with the get_attribute method.
Code:
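from selenium.webdriver.common.by import By
# Assumes driver is an existing WebDriver with the page already loaded.
spans = driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']")
a_tag_list = []
for span in spans:
    # find_elements (plural) returns every <a> in the span, not only the first
    for atag in span.find_elements(By.XPATH, ".//a"):
        href = atag.get_attribute('href')
        print(href)
        a_tag_list.append(href)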
CodePudding user response:
I use BeautifulSoup to parse html.
from bs4 import BeautifulSoup
html = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
<span class = "ipsContained ipsType_break">
<a href="link1"></a>
<a href="link2"></a>
</span>
<span class = "ipsContained ipsType_break">
<a href="link3"></a>
<a href="link4"></a>
</span>
</body>
</html>"""
soup = BeautifulSoup(html, 'html.parser')
# Match spans whose class attribute is exactly this string
spans = soup.find_all("span", {"class": "ipsContained ipsType_break"})
links = []
for span in spans:
    # href=True skips any <a> without an href attribute
    a_elements = span.find_all("a", href=True)
    for a in a_elements:
        links.append(a["href"])
print(links)
Prints: ['link1', 'link2', 'link3', 'link4']
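The same lookup can also be written as a single CSS selector, which avoids the nested loop. This is only a minimal sketch, assuming BeautifulSoup 4 is installed; the shortened html string and the variable names are illustrative and not part of the answer above:

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample page (illustrative).
html = """
<body>
<span class="ipsContained ipsType_break">
<a href="link1"></a>
<a href="link2"></a>
</span>
<span class="ipsContained ipsType_break">
<a href="link3"></a>
<a href="link4"></a>
</span>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# "span.ipsContained.ipsType_break" matches spans carrying both classes
# (in any order); "a[href]" keeps only descendant anchors that have an href.
links = [a["href"] for a in soup.select("span.ipsContained.ipsType_break a[href]")]
print(links)
```

Unlike the exact-string match on the class attribute, this selector would also match spans that carry additional classes, which may or may not be what you want.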
CodePudding user response:
links = [x.get_attribute("href") for x in driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']//a")]
This should get you the href of every a tag inside the spans with that class. It is an alternative to appending to a list in a loop.
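If you want to try the combined XPath without starting a browser, here is a rough sketch using Python's standard xml.etree.ElementTree. It supports only a limited XPath subset and needs well-formed markup, so treat it as a way to check the expression rather than a replacement for Selenium. The sample string, the extra "other" span, and the variable names are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Well-formed, shortened version of the sample page (illustrative).
sample = """<html><body>
<span class="ipsContained ipsType_break"><a href="link1"></a><a href="link2"></a></span>
<span class="ipsContained ipsType_break"><a href="link3"></a><a href="link4"></a></span>
<span class="other"><a href="ignored"></a></span>
</body></html>"""

root = ET.fromstring(sample)

# ElementTree paths are relative to the element they are called on,
# so the expression starts with "." instead of a bare "//".
links = [
    a.get("href")
    for a in root.findall(".//span[@class='ipsContained ipsType_break']//a")
]
print(links)
```

The link inside the span with class "other" is skipped, because the predicate compares the whole class attribute to the given string.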