Iterating through a group of elements and collecting the child elements

Time: 07-27

I have the following sample HTML:

<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>

I am trying to iterate through the list to get the following results:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

I am using the following code:

persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})

However, the above code only iterates through the first person and not through all the people. Any help is appreciated.

CodePudding user response:

As you are able to fetch the first person's details, your logic is sound. To collect all the persons, however, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
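Once persons is populated, the comma-separated output shown in the question can be produced with a simple loop. The list below is hypothetical sample data standing in for the scraped results:

```python
# Hypothetical sample data, shaped like the dicts the scraping loop collects.
persons = [
    {'title': 'John Smith', 'company': 'SalesForce'},
    {'title': 'Phil Collins', 'company': 'TaskForce'},
    {'title': 'Tracy Beaker', 'company': 'Accounting'},
]

# Join each record into the "Name, Company" format from the question.
lines = [f"{p['title']}, {p['company']}" for p in persons]
for line in lines:
    print(line)
```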

CodePudding user response:

The bs4 example below shows that all the .person elements iterate smoothly. For element selection in Selenium, however, you are using the find_element_by_xpath locator strategy, which is deprecated. I think it would be more robust to use WebDriverWait.

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')

for person in soup.select('.person'):
    title = person.select_one('.title a').text
    print(title)

Output:

John Smith
Phil Collins
Tracy Beaker
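The same select() loop can pull the company in the same pass. Here is a minimal self-contained sketch using a single entry from the sample markup:

```python
from bs4 import BeautifulSoup

# A single entry from the sample markup, enough to exercise both selectors.
html = '''
<div class="person">
    <div class="title"><a href="http://www.url.com/name/">John Smith</a></div>
    <div class="company"><a href="http://www.url.com/company/">SalesForce</a></div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

pairs = []
for person in soup.select('.person'):
    title = person.select_one('.title a').text
    company = person.select_one('.company a').text
    pairs.append(f"{title}, {company}")
print(pairs)
```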

Example for selenium:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})
print(persons)


#imports

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

CodePudding user response:

One of the correct ways to do it in Selenium would be:

person_divs = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)

Do not forget to import:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Another way using BeautifulSoup would be:

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)

This would print out:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

BeautifulSoup (bs4) actually has great, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
