I have the following sample HTML:
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
I am trying to iterate through the list to get the following results:
John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting
I am using the following code:
persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})
However, the above code only iterates through the first person and not through all the people. Any help is appreciated.
CodePudding user response:
Since you can already extract the first person's details, your locator logic is sound. To collect all the persons, induce WebDriverWait for visibility_of_all_elements_located() so that every matching element has rendered before you loop. You can use the following locator strategy:
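persons = []
# Wait up to 20 seconds until all elements with class "person" are visible
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    # The leading "." keeps each XPath search inside the current person element
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})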
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
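To make the role of the wait clearer: WebDriverWait(...).until(...) calls the condition repeatedly (every 0.5 seconds by default) until it returns something truthy, and raises a TimeoutException once the timeout elapses. The snippet below is only a rough pure-Python sketch of that polling idea, not Selenium's implementation. The names wait_until, WaitTimeout and FakePage are hypothetical and exist purely for illustration.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


class FakePage:
    """Hypothetical page whose person elements appear only on the third poll."""

    def __init__(self):
        self.polls = 0

    def find_persons(self):
        self.polls += 1
        if self.polls >= 3:
            return ["John Smith", "Phil Collins", "Tracy Beaker"]
        return []  # nothing rendered yet


page = FakePage()
found = wait_until(page.find_persons, timeout=2.0, poll_frequency=0.01)
print(found)
```

Looking the elements up once, without waiting, corresponds to the first poll here, which can return fewer elements than the page will eventually contain.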
CodePudding user response:
The bs4 example below shows that iterating over all the .person elements works as expected. In your Selenium code, however, you are using the find_element_by_xpath locator method, which is deprecated. I think a more robust approach is to use WebDriverWait.
from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
for person in soup.select('.person'):
    # Look for the link only inside the current person element
    title = person.select_one('.title a').text
    print(title)
Output:
John Smith
Phil Collins
Tracy Beaker
Example for Selenium:
# imports
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})
print(persons)
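The leading `.//` in the inner XPath expressions matters: it scopes the search to the current person element. In Selenium, an expression that starts with `//` searches the whole document even when it is called on an element, so every iteration would return the first person's link. The sketch below illustrates the same scoping idea with the standard library's xml.etree.ElementTree instead of Selenium, so it runs without a browser. It assumes the sample markup is well-formed XML, and the wrapping `<root>` element is a hypothetical addition needed only to give the parser a single root.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element.
MARKUP = """
<root>
  <div class="person">
    <div class="title"><a href="http://www.url.com/name/">John Smith</a></div>
    <div class="company"><a href="http://www.url.com/company/">SalesForce</a></div>
  </div>
  <div class="person">
    <div class="title"><a href="http://www.url.com/name/">Phil Collins</a></div>
    <div class="company"><a href="http://www.url.com/company/">TaskForce</a></div>
  </div>
  <div class="person">
    <div class="title"><a href="http://www.url.com/name/">Tracy Beaker</a></div>
    <div class="company"><a href="http://www.url.com/company/">Accounting</a></div>
  </div>
</root>
"""

root = ET.fromstring(MARKUP)

persons = []
for person in root.findall(".//div[@class='person']"):
    # Scoped search: starts at `person`, so each iteration sees its own links.
    title = person.find(".//div[@class='title']/a").text
    company = person.find(".//div[@class='company']/a").text
    persons.append({"title": title, "company": company})

# Unscoped search for contrast: starting at the document root always
# returns the first matching link, whichever person is being processed.
from_root = [root.find(".//div[@class='title']/a").text for _ in persons]

for entry in persons:
    print(f"{entry['title']}, {entry['company']}")
print(from_root)
```

The scoped loop prints one "name, company" line per person, while the unscoped list contains "John Smith" three times.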
CodePudding user response:
One of the correct ways to do it in Selenium would be:
# `browser` is the WebDriver instance (called `driver` in the question)
person_divs = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    # Search inside the current person element only
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)
Do not forget to import:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Another way using BeautifulSoup would be:
from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    # strip() removes the whitespace and newlines around the link text
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)
This would print out:
John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting
BeautifulSoup (bs4) has great, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/