Iterating through a group of elements and collecting the child elements

I have the following sample HTML:

<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>

I am trying to iterate through the list to try and get the following results:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

I am using the following code:

persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})

However, the above code only iterates through the first person and not through all the people. Any help is appreciated.

CodePudding user response：

Since you can already read the first person's details, your lookup logic is sound. To pick up every person, wait until all the matching elements are visible by using WebDriverWait with visibility_of_all_elements_located(). You can use the following locator strategy:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    # The leading "." keeps each lookup inside the current person element
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

CodePudding user response：

The bs4 example below shows that all the .person elements are iterated correctly. In your Selenium code, however, you are using the find_element_by_xpath locator method, which is deprecated. I think it would be more robust to use WebDriverWait.

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')

for person in soup.select('.person'):
    title = person.select_one('.title a').text
    print(title)

Output:

John Smith
Phil Collins
Tracy Beaker

Example for Selenium:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})
print(persons)


#imports

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

CodePudding user response：

One correct way to do it in Selenium would be:

person_divs = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)

Do not forget to import:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Another way using BeautifulSoup would be:

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)

This would print out:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

BeautifulSoup (bs4) actually has great, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
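
If you want the list of dictionaries from the question rather than printed lines, the same "loop over each person, then look inside it" pattern can be sketched with only the standard library. This is a rough illustration, not a replacement for Selenium or bs4: it assumes the class names are person, title and company (taken from the selectors used in this thread), that the fragment is well-formed enough to parse as XML, and it wraps the fragment in a made-up <root> element because xml.etree.ElementTree needs a single root. Real pages are often not well-formed, in which case bs4 is the safer choice.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; class names are assumed from the selectors above.
html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''

# ElementTree requires one root element, so wrap the fragment.
# The <root> tag is hypothetical and only exists for parsing.
root = ET.fromstring('<root>' + html + '</root>')

persons = []
for person in root.findall("./div[@class='person']"):
    # The leading "." scopes each lookup to the current person element.
    title = person.find("./div[@class='title']/a").text.strip()
    company = person.find("./div[@class='company']/a").text.strip()
    persons.append({'title': title, 'company': company})

for p in persons:
    print(p['title'] + ',', p['company'])
```

Each lookup starts from the current person element, which is the same idea as the relative './/div[...]' XPath in the Selenium versions above.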