My code opens a webpage and scrapes data from each result element/block.
However, several elements inside a block share the same class name, so my XPath returns the same value for every field.
For example, Author and Session Name have the same class name.
How do I write an XPath that tells the elements apart when the class names are the same?
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://index.mirasmart.com/aan2022/SearchResults.php?pg=1')
page_source = driver.page_source
element = driver.find_elements_by_xpath('.//div[@]')
for el in element:
    author = el.find_element_by_xpath('.//span[@]').text
    sessionName = el.find_element_by_xpath('.//span[@]').text
    print(author, sessionName)
CodePudding user response:
Try the code below and confirm whether it works. Each detail is in its own <p> tag, so you can pick the one you need by its position.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://index.mirasmart.com/aan2022/SearchResults.php?pg=1")
# Collect every result block.
elements = driver.find_elements(By.XPATH, "//div[contains(@class,'search-results-list')]/div")
for element in elements:
    author = element.find_element(By.XPATH, "./div//p[1]")  # The first <p> contains the Author details.
    print(author.get_attribute("innerText"))
    session = element.find_element(By.XPATH, "./div//p[2]")  # The second <p> contains the Session details.
    print(session.get_attribute("innerText"))
Output
Author: Rachel Pauley Levi Dygert Aaron Nelson Heather Lau
Session Name: P8: Infectious Disease: Bacteria, Fungi, and Parasites on the Mind and Body 1
Author: Aaron S Zelikovich Eric C Lawson Giana Dawod Dylan Del Papa Mikel Shea Ehntholt Evan Kolesnick Jaclyn Martindale Oluwasinmisola Opeyemi Alexandria Pecoraro Stephanie Reyes Andrew Yoo Aaron L Berkowitz Matthew S Robbins
Session Name:
Author: Jonathan Morena
Session Name: P16: MS Clinical Assessments & Outcome Measures
Author: Gabriela Figueiredo Pucci Tara Samiee Natalie Sholl Shreya Louis Adnan Husein Theandra Madu Carolina Rodriguez Rivera Jenny Rotblat
Session Name:
Author: Claudia Janoschka Marisol Herrera-Rivero Lisa Gerdes Kathrin Koch Heinz Wiendl Reinhard Hohlfeld Monika Stoll Luisa Klotz
Session Name:
CodePudding user response:
Surrounding elements can be used to make the XPath expression more precise.
Author:
//p[strong[.="Author:"]]/span
It reads: get the span child of a p element that has a strong child whose text is "Author:".
Session Name:
//p[strong[.="Session Name:"]]/span
Or, with the enclosing div elements spelled out:
//div[@]/div[@]/p[strong[.="Session Name:"]]/span
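To try the label-based selection without opening a browser, here is a minimal sketch using Python's standard-library ElementTree. The markup in SAMPLE, the placeholder author and session values, and the helper name span_text are all assumptions made for illustration; the real page may be structured differently. ElementTree supports only a subset of XPath, so the predicate is written as p[strong='Author:'], which selects the same p elements as p[strong[.="Author:"]] in this sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed to resemble a page of results; the real
# page's structure and class names may differ.
SAMPLE = """
<div class="results">
  <div class="result">
    <p><strong>Author:</strong> <span>Author One</span></p>
    <p><strong>Session Name:</strong> <span>Session A</span></p>
  </div>
  <div class="result">
    <p><strong>Author:</strong> <span>Author Two</span></p>
    <p><strong>Session Name:</strong> <span></span></p>
  </div>
</div>
"""


def span_text(block, label):
    """Return the text of the <span> in the <p> whose <strong> child reads `label`."""
    # Relative path ("./"), so the search stays inside this one block.
    span = block.find(f"./p[strong='{label}']/span")
    if span is None or span.text is None:
        return ""  # label missing or span empty
    return span.text.strip()


root = ET.fromstring(SAMPLE)
rows = [
    (span_text(block, "Author:"), span_text(block, "Session Name:"))
    for block in root.findall("./div")
]
for author, session in rows:
    print(author, "|", session)
```

With Selenium, the same idea would be a relative expression such as .//p[strong[.="Author:"]]/span passed to element.find_element(By.XPATH, ...) inside the loop over result blocks, so each lookup is limited to the current block instead of matching the first p on the whole page.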