I am trying, using Selenium, to click the PDF icon (shown in screenshot 2) for each element (each of the containers shown in screenshot 1). The problem is that the PDF icons have very few distinguishing identifiers, so I am restricted to locating them with an XPath by class. At each iteration of the "for elem in issues_numb:" loop, the script clicks the first PDF icon it finds on the page, because that is the first element matching the XPath I give it. Is there a way to create a nested loop that, for each instance of one class (article titles), clicks the instance of another class (PDF icons) associated with it? So for the first article, click the first PDF icon, and so on.
HTML code:
<section aria-label="Metadata for Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea" class="article-list-item-content-block ">
<div class="title " data-ember-action="" data-ember-action-1069="1069">
<div id="ember1070" class="ember-view"><a target="_blank" href="/libraries/1374/articles/504204400" id="ember1071" class="ember-view" tabindex="0"> Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea
</a></div> </div>
<!---->
<div class="metadata">
<!---->
<span tabindex="0" class="pages ">
p. 489
</span>
<!---->
<span class="authors" data-ember-action="" data-ember-action-1082="1082">
<span tabindex="0" class="preview tabindex">
Iqbal, Sajid; Vohra, Muhammad Sufyan; Janjua, Hussnain Ahmed
</span>
</span>
<div class="abstract" data-ember-action="" data-ember-action-1083="1083">
<div tabindex="0" class="preview tabindex">
<div id="ember1088" class="ember-view"> <span class="lt-line-clamp__line">In the current study, strain MW-6 isolated from Arabian seawater exhibited broad-spectrum antibacterial activity</span>
<span class="lt-line-clamp__line">against indicator bacterial pathogens. The partially extracted antibacterial metabolites with ethyl acetate revealed</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
promising activity against, and. The minimum inhibitory concentrations (MICs) were determined against indicator stra<span class="lt-line-clamp__ellipsis"><div class="lt-line-clamp__dummy-element">…</div>
<!----> </span></span>
<!----><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">…</span></div>
</div>
</div>
</div>
<!---->
<div class="content-overflow " data-ember-action="" data-ember-action-1089="1089">
<span class="chevron icon flaticon solid down-2"></span>
</div>
<div class="tools ">
<div class="buttons noselect">
<div class="button invisible download-pdf" data-ember-action="" data-ember-action-1090="1090">
<div id="ember1091" class="ember-view"><a aria-label="Download PDF" target="_blank" href="/libraries/1374/articles/504204400/pdf" id="ember1092" class="tooltip ember-view" tabindex="0"> <span aria-hidden="true" class="icon fal fa-file-pdf"></span>
<span class="aria-hidden">Download PDF - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a></div> </div>
<div class="button invisible read-full-text" data-ember-action="" data-ember-action-1097="1097">
<div id="ember1098" class="ember-view"><a aria-label="Link to Article" target="_blank" href="/libraries/1374/articles/504204400" id="ember1099" class="tooltip ember-view" tabindex="0"> <span aria-hidden="true" class="icon fal fa-link"></span>
<span class="aria-hidden">Link to Article - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a></div> </div>
<div class="button invisible add-to-my-articles" data-ember-action="" data-ember-action-1100="1100">
<a aria-label="Save to My Articles" class="tabindex tooltip" tabindex="0">
<span aria-hidden="true" class="icon fal fa-folder"></span>
<span class="aria-hidden">Save to My Articles - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
<div class="button invisible citation-services" data-ember-action="" data-ember-action-2165="2165">
<a tabindex="0" aria-label="Export Citation" class="tabindex tooltip">
<span aria-hidden="true" class="icon fal fa-graduation-cap"></span>
<span class="aria-hidden">Export Citation - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
<div class="button invisible social-media-services" data-ember-action="" data-ember-action-2166="2166">
<a tabindex="0" aria-label="Share" class="tabindex tooltip">
<span aria-hidden="true" class="icon fal fa-share-alt"></span>
<span class="aria-hidden">Share - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
</div>
</div>
</section>
My code:
issues_numb = driver.find_elements(By.XPATH, "//section[@class='article-list-item-content-block ']")
parent_tab = driver.current_window_handle
for elem in issues_numb:
    title_article = elem.get_attribute("aria-label")
    print(title_article[13:])
    try:
        check_buttons = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
        print("pdf object found for", str(elem))
        checking_size_buttons = len(str(check_buttons))
        if checking_size_buttons > 0:
            pdf_icon = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            click_pdf = ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            WebDriverWait(driver, timeout).until(element_present)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
        else:
            print("No PDF available")
    except NoSuchElementException:
        get_article_name()
The issues_numb variable refers to this element:
The tools_box variable refers to this element:
Answer:
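When you start an XPath with a double slash (//), the engine searches the whole document, starting from the root.
Therefore you should change the XPath expressions inside the loops by adding a . in front of the //, and call find_element on the element that should serve as the context (for example box) rather than on driver.
This way you tell the engine to use the current element as the context instead of the root. With driver.find_element the context is still the whole document, so .// behaves exactly like //.
Btw: it's good practice to share the actual HTML, so your code and question are easier to understand.
Just to give you an idea, your code should look like this: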
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait

# timeout and check_need_to_sign_in() come from the question's script
issues_numb = driver.find_elements(By.XPATH, "//div[@class='issue ember-view']")
for elem in issues_numb:
    ActionChains(driver).move_to_element(elem).click(elem).perform()
    # find_elements returns an empty list (instead of raising) when nothing matches
    check_buttons = driver.find_elements(By.XPATH, "//span[@class='icon fal fa-file-pdf']")
    if len(check_buttons) > 0:
        tools_box = driver.find_elements(By.XPATH, "//div[@class='buttons noselect']")
        for box in tools_box:
            # Wait until this particular box (not just any element on the page) has a PDF icon
            WebDriverWait(driver, timeout).until(
                lambda d: box.find_elements(By.XPATH, ".//span[@class='icon fal fa-file-pdf']"))
            # Calling find_element on box makes ".//" relative to this box only
            pdf_icon = box.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            parent_tab = driver.current_window_handle
            ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            time.sleep(10)
            print(driver.current_url)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
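To make the difference between searching from the root and searching from a container concrete, here is a small, browser-free sketch. It uses Python's standard-library xml.etree.ElementTree on a simplified, made-up copy of the page (the two articles, their titles and hrefs are hypothetical) as a stand-in for the browser DOM: root.find plays the role of driver.find_element, and section.find plays the role of elem.find_element. Keep in mind that ElementTree supports only a subset of XPath and returns None instead of raising when nothing matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two article containers, each with its own PDF link
PAGE = """<main>
  <section class="article-list-item-content-block " aria-label="Metadata for Article A">
    <div class="title"><a href="/articles/1">Article A</a></div>
    <div class="buttons noselect">
      <a aria-label="Download PDF" href="/articles/1/pdf"><span class="icon fal fa-file-pdf"/></a>
    </div>
  </section>
  <section class="article-list-item-content-block " aria-label="Metadata for Article B">
    <div class="title"><a href="/articles/2">Article B</a></div>
    <div class="buttons noselect">
      <a aria-label="Download PDF" href="/articles/2/pdf"><span class="icon fal fa-file-pdf"/></a>
    </div>
  </section>
</main>"""

PDF_LINK = ".//a[@aria-label='Download PDF']"

root = ET.fromstring(PAGE)
sections = root.findall(".//section[@class='article-list-item-content-block ']")

# Problem: searching from the root returns the first PDF link on the page every time
from_root = [root.find(PDF_LINK).get("href") for _ in sections]

# Fix: search relative to each section, so each title is paired with its own link
from_section = [section.find(PDF_LINK).get("href") for section in sections]

for section, href in zip(sections, from_section):
    # [13:] strips the "Metadata for " prefix, as in the question's code
    print(section.get("aria-label")[13:], "->", href)
```

Running it prints "Article A -> /articles/1/pdf" and "Article B -> /articles/2/pdf", while from_root contains the first article's link twice, which is the behaviour described in the question.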
Answer (by @AbdulAzizBarkat, in the comments):
The way to solve a situation like this, i.e. only having access to an identifier that is shared by multiple elements (in my case a class name shared by all the PDF icons), is to specify a context in which to look: first locate each container, then search inside it by calling find_element on that container with an XPath that starts with a dot. This way the driver only looks in the part of the HTML that belongs to that container. More on this here. Here too, although Selenium's syntax has changed since then (Selenium 4 uses find_element(By.XPATH, ...) in place of the older find_element_by_xpath). This is the updated syntax:
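from selenium.webdriver.common.by import By

elements = driver.find_elements(By.XPATH, "//tag[@class='targeted_context']")
for elem in elements:
    # elem.find_element searches only inside elem; the leading "." makes the XPath relative to it
    targeted_element = elem.find_element(By.XPATH, ".//tag[@class='targeted_class']")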
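The same pattern also covers containers that have no PDF button, which the question's else branch tries to handle. Below is another browser-free sketch, again using xml.etree.ElementTree on hypothetical markup as a stand-in for the page: findall returns an empty list when nothing matches inside the container, much as Selenium's elem.find_elements does, so no NoSuchElementException handling is needed. The function name pdf_links is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: Article B has a "Link to Article" button but no "Download PDF" button
PAGE = """<main>
  <section aria-label="Metadata for Article A">
    <a aria-label="Download PDF" href="/articles/1/pdf"/>
    <a aria-label="Link to Article" href="/articles/1"/>
  </section>
  <section aria-label="Metadata for Article B">
    <a aria-label="Link to Article" href="/articles/2"/>
  </section>
</main>"""


def pdf_links(xml_text):
    """Map each article title to its PDF href, or to None if its container has no PDF button."""
    root = ET.fromstring(xml_text)
    links = {}
    for section in root.findall("./section"):
        title = section.get("aria-label")[13:]  # drop the "Metadata for " prefix
        # Search only inside this section; an empty list means "no PDF here"
        matches = section.findall(".//a[@aria-label='Download PDF']")
        links[title] = matches[0].get("href") if matches else None
    return links


for title, href in pdf_links(PAGE).items():
    print(title, "->", href or "No PDF available")
```

This prints "Article A -> /articles/1/pdf" and "Article B -> No PDF available". In Selenium the equivalent check would be len(elem.find_elements(...)) > 0 before clicking.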