I'm trying to iterate over multiple nodes and retrieve various child nodes from each parent node. Assume I have a structure like the following:
<div class="wrapper">
  <div class="item">
    <div class="item-footer">
      <div class="item-type">Some data in here</div>
    </div>
  </div>
  <!-- More items listed here -->
</div>
I can get all descendant nodes of the wrapper container with the following:
wrapper = driver.find_element(By.XPATH, '/html/body/div')
items = wrapper.find_elements(By.XPATH, './/*')
However, I couldn't figure out how to get the inner HTML of the container that holds the item type. I tried this, but it didn't work:
for item in items:
    item_type = item.find_element(By.XPATH, './/div/div').get_attribute('innerHTML')
    print(item_type)
This results in the following error:
NoSuchElementException: Message: Unable to locate element:
Does anybody know how I can do that?
CodePudding user response:
If all the elements whose content you want are div elements with the class attribute value item-type, located inside div elements with the class attribute value item-footer, you can simply do the following:
# find_elements (plural) returns a list that can be iterated
elements = driver.find_elements(By.XPATH, '//div[@class="item-footer"]//div[@class="item-type"]')
for element in elements:
    data = element.get_attribute('innerHTML')
    print(data)
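To see why the loop in the question fails, here is a minimal offline sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for the browser DOM (an assumption made purely for illustration; Selenium is not involved), and the markup is a hypothetical copy of the structure from the question. The point it tries to show: './/*' returns every descendant, including the inner nodes that have no 'div/div' chain below them, and in Selenium a lookup on such a node raises NoSuchElementException.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the structure described in the question.
HTML = """
<div class="wrapper">
  <div class="item">
    <div class="item-footer">
      <div class="item-type">Some data in here</div>
    </div>
  </div>
</div>
"""

wrapper = ET.fromstring(HTML)

# './/*' selects every descendant of the wrapper, not only the item containers.
descendants = wrapper.findall('.//*')
classes = [d.get('class') for d in descendants]

# Record, per descendant, whether a './/div/div' lookup finds anything.
# ElementTree returns None on a miss; Selenium's find_element would raise instead.
results = {d.get('class'): d.find('.//div/div') is not None for d in descendants}

# Targeting the item-type nodes directly avoids the failing lookups.
item_types = [
    e.text
    for e in wrapper.findall(".//div[@class='item-footer']//div[@class='item-type']")
]

print(classes)
print(results)
print(item_types)
```

Only the outer item node has a nested div/div below it, so the relative lookup misses on the two inner nodes. Selecting the item-type nodes in one expression sidesteps that.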
CodePudding user response:
You can use BeautifulSoup after getting the page source from Selenium to easily scrape the HTML data.
from bs4 import BeautifulSoup
# selenium code part
# ....
# ....
# driver.page_source is the HTML result from selenium
html_doc = BeautifulSoup(driver.page_source, 'html.parser')
items = html_doc.find_all('div', attrs={'class':'item'})
for item in items:
    text = item.find('div', attrs={'class':'item-type'}).text
    print(text)
Output:
Some data in here
CodePudding user response:
You just need a relative XPath that identifies each element, and then iterate over the results.
items = driver.find_elements(By.XPATH, "//div[@class='wrapper']//div[@class='item']//div[@class='item-type']")
for item in items:
    print(item.text)
    print(item.get_attribute('innerHTML'))
Or use a CSS selector:
items = driver.find_elements(By.CSS_SELECTOR, ".wrapper > .item .item-type")
for item in items:
    print(item.text)
    print(item.get_attribute('innerHTML'))