Naming PDFs downloaded using python and selenium

Time:09-17

I have a web scraper for work that downloads all PDFs matching given filters from our non-public website. I'm trying to name the files "ID Number File name date file was made.pdf". I used the absolute XPath for the file name inside a try statement, but it's not working: the lookup fails and the code jumps to the except branch. I would appreciate a second set of eyes to see whether I have missed anything syntax-wise, or whether there's a better way to implement this. I have also included the XPath in case anyone with more experience can give me a relative XPath to use instead.

Error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2"}
  (Session info: chrome=93.0.4577.63)

HTML:

[Screenshot of the page's HTML; image not included]

My code:

import time

import requests
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is the Chrome WebDriver created earlier in the script.
table_rows = driver.find_elements_by_xpath("//a[contains(@href, '#resources/details/?id=')]")
for link_elem in table_rows:
    url = link_elem.get_attribute('href')
    id_number = url[-8:]  # the ID is the last 8 characters of the link
    driver.get(url)
    try:
        filename_first = driver.find_element_by_xpath('/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2').text.replace(':', '').replace(r'/', '-')
    except:
        filename_first = 'file.pdf'
    #filename_first = driver.find_element_by_xpath('/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2').text.replace(':', '').replace(r'/', '-')
    filename_final = id_number + filename_first  # + '.pdf'
    css_thing = '#file > div:nth-child(1) > div.form-group.padding-xs-bottom > div > div > button.btn.btn-danger.get-download-url'
    time.sleep(5)
    download_button = driver.find_element_by_css_selector(css_thing)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, css_thing))).click()
    time.sleep(5)
    link_data = driver.find_element_by_xpath("//a[contains(@href, 'https://s3.amazonaws.com')]")
    url = link_data.get_attribute("href")
    r = requests.get(url, allow_redirects=True)
    with open(filename_final, 'wb') as f:
        f.write(r.content)
    print("good")

CodePudding user response:

You are getting NoSuchElementException because you are using an absolute XPath. If the DOM is dynamic, an absolute path breaks as soon as any element along it moves, so the script is very likely to fail.

Always use a reliable, relative XPath:

xPath: //a//div[@class='flex-1 ellipsis padding-xs-right']

CodePudding user response:

Based on the snapshot that you've shared, I believe you can use the XPath below:

//a[contains(@href,'#resources/details/')]//div[contains(@class,'ellipsis')]

Also, before using this XPath, check in Dev Tools that it matches exactly one node (the search box should show 1/1).
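The same 1/1 check can also be made from the script, so that an ambiguous or missing match fails loudly instead of silently picking the wrong element. This is only a minimal sketch: `find_unique` is a hypothetical helper, not part of Selenium, and it assumes `driver` is your existing WebDriver. It passes the locator strategy as the plain string "xpath", which is the value of Selenium's `By.XPATH` constant.

```python
def find_unique(driver, xpath):
    """Return the single element matching xpath, or raise if the match is not unique."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    matches = driver.find_elements("xpath", xpath)
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly 1 match for {xpath!r}, found {len(matches)}"
        )
    return matches[0]
```

You would call it as find_unique(driver, "//a[contains(@href,'#resources/details/')]//div[contains(@class,'ellipsis')]").text in place of the plain lookup.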

Use the XPath like this:

filename_first = driver.find_element_by_xpath("//a[contains(@href,'#resources/details/')]//div[contains(@class,'ellipsis')]").text
print(filename_first)

Once you get the desired output with the above code, you can clean it up with a regex to get the file name you are actually looking for.
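As a rough illustration of that regex step, the sketch below builds a name of the form "ID title date.pdf" and replaces characters that file systems commonly reject. `build_pdf_name` and its `date` parameter are hypothetical: the question does not show where the file's date comes from, so it is simply passed in as a string here.

```python
import re


def build_pdf_name(id_number, title, date=""):
    """Join the parts into 'ID title date.pdf', replacing unsafe characters."""
    parts = [str(id_number), title, date]
    # Replace characters that Windows and most other file systems reject,
    # then trim stray spaces, dashes and dots from both ends of each part.
    cleaned = [re.sub(r'[\\/:*?"<>|]+', "-", p).strip(" -.") for p in parts]
    # Collapse runs of whitespace and skip parts that ended up empty.
    name = " ".join(re.sub(r"\s+", " ", p) for p in cleaned if p)
    return name + ".pdf"
```

For example, build_pdf_name(id_number, filename_first) could replace the manual string concatenation in the question's loop.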
