How to recieve inner HTML of a child node in selenium (python)?-CodePudding

I'm trying to iterate through multiple nodes and receive various child nodes from the parent nodes. Assuming that I've something like the following structure:

<div >
    <div >
        <div >
            <div >Some data in here</div>
        </div>
    </div>
    <!-- More items listed here -->
</div>

I'm able to receive all child nodes of the wrapper container by using the following.

wrapper = driver.find_element(By.XPATH, '/html/body/div')
items = wrapper.find_elements(By.XPATH, './/*')

Anyways I couldn't figure out how I can now receive the inner HTML of the container containing the information about the item type. I've tried this, but this didn't work.

for item in items:
    item_type = item.item.find_element(By.XPATH, './/div/div').get_attribute('innerHTML')
    print(item_type)

This results in the following error:

NoSuchElementException: Message: Unable to locate element:

Does anybody knows how I can do that?

CodePudding user response：

In case all the elements their content you want to get are div with class attribute value item-type located inside divs with class attribute value item-footer you can simply do the following:

elements =  driver.find_element(By.XPATH, '//div[@]//div[@]')
for element in elements:
    data = element.get_attribute('innerHTML')
    print(data)

CodePudding user response：

You can use BeautifulSoup after getting page source from selenium to easily scrape the HTML data.

from bs4 import BeautifulSoup

# selenium code part
# ....
# ....
# driver.page_source is the HTML result from selenium

html_doc = BeautifulSoup(driver.page_source, 'html.parser')
items = html_doc.find_all('div', attrs={'class':'item'})
for item in items:
    text = item.find('div', attrs={'class':'item-type'}).text
    print(text)

Output:

Some data in here

CodePudding user response：

You need to just find the relative xpath to identify each element and then iterate it.

items = driver.find_elements(By.XPATH, "//div[@class='wrapper']//div[@class='item']//div[@class='item-type']")
for item in items:
    print(item.text)
    print(item.get_attribute('innerHTML'))

Or use the css selector

items = driver.find_elements(By.CSS_SELECTOR, ".wrapper >.item .item-type")
for item in items:
    print(item.text)
    print(item.get_attribute('innerHTML'))