Naming PDFs downloaded using Python and Selenium

I have a web scraper for work that downloads all PDFs matching given filters from our non-public website. I'm trying to name the files "ID Number File name date file was made.pdf". I used the absolute XPath for the file name in a try statement, but it's not working and jumps to the exception. I would appreciate a second set of eyes to see if I have missed anything syntax-wise, or if there's a better way to implement this. I have also copied the XPath in case anyone with more experience can give me a relative XPath to use.

Error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2"}
  (Session info: chrome=93.0.4577.63)

HTML:

My code:

table_rows = driver.find_elements_by_xpath("//a[contains(@href, '#resources/details/?id=')]")
for link_elem in table_rows:
    url = link_elem.get_attribute('href')
    id_number = url[-8:]
    driver.get(url)
    try:
        filename_first = driver.find_element_by_xpath('/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2').text.replace(':', '').replace(r'/', '-')
    except:
        filename_first = 'file.pdf'
    #filename_first = driver.find_element_by_xpath('/html/body/div[1]/div[1]/div[4]/div/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/h2').text.replace(':', '').replace(r'/', '-')
    filename_final = id_number + filename_first  # + '.pdf'
    css_thing = '#file > div:nth-child(1) > div.form-group.padding-xs-bottom > div > div > button.btn.btn-danger.get-download-url'
    time.sleep(5)
    download_button = driver.find_element_by_css_selector(css_thing)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, css_thing))).click()
    time.sleep(5)
    link_data = driver.find_element_by_xpath("//a[contains(@href, 'https://s3.amazonaws.com')]")
    url = link_data.get_attribute("href")
    r = requests.get(url, allow_redirects=True)
    open(filename_final, 'wb').write(r.content)
    print("good")

CodePudding user response：

You are getting NoSuchElementException because you are using an absolute XPath. If the DOM is dynamic, an absolute path is very likely to break your script.

Always use a reliable, relative XPath.

XPath: //a//div[@class='flex-1 ellipsis padding-xs-right']
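Because the page is dynamic, the element may also simply not be rendered yet at the moment driver.get() returns. A minimal sketch of a polling lookup, assuming a Selenium 4 style driver whose find_elements(by, value) returns an empty list when nothing matches; the helper name wait_for_text and its timeout and poll parameters are made up for illustration. In a real script the WebDriverWait and expected_conditions pair you already use for the download button would be the usual choice; a hand-rolled loop is swapped in here only to keep the example self-contained.

```python
import time


def wait_for_text(driver, xpath, timeout=10.0, poll=0.5):
    """Poll until an element matching `xpath` has non-empty text.

    Returns the stripped text, or None if nothing showed up before the
    timeout. `wait_for_text` is a hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising
        # NoSuchElementException, so no try/except is needed here.
        # "xpath" is the string value of Selenium's By.XPATH constant.
        for element in driver.find_elements("xpath", xpath):
            text = element.text.strip()
            if text:
                return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With this, the try/except in the question could shrink to something like filename_first = wait_for_text(driver, "//a//div[@class='flex-1 ellipsis padding-xs-right']") or 'file', which also avoids a bare except hiding unrelated errors.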

CodePudding user response：

Based on the snapshot that you've shared, I believe you can use the XPath below:

//a[contains(@href,'#resources/details/')]//div[contains(@class,'ellipsis')]

Also, before using this XPath, check in DevTools that it matches exactly one node (1/1).
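The same 1/1 check can be repeated from the script, so an ambiguous locator fails loudly instead of silently picking the first match. This is only a sketch: find_unique is a hypothetical helper name, and it assumes a Selenium 4 style driver.find_elements(by, value) call (on Selenium 3, find_elements_by_xpath should behave the same way).

```python
def find_unique(driver, xpath):
    """Return the single element matching `xpath`.

    Raises LookupError when the XPath matches zero or several nodes.
    `find_unique` is a hypothetical helper, not part of Selenium.
    """
    # "xpath" is the string value of Selenium's By.XPATH constant.
    matches = driver.find_elements("xpath", xpath)
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly 1 match for {xpath!r}, found {len(matches)}"
        )
    return matches[0]
```

If this raises with "found 0", the locator is wrong or the page has not finished rendering; if it reports 2 or more, the XPath needs to be narrowed before you rely on its text.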

Use the XPath like this:

filename_first = driver.find_element_by_xpath("//a[contains(@href,'#resources/details/')]//div[contains(@class,'ellipsis')]").text
print(filename_first)

Once you get the desired output with the above code, you can clean it up with a regex to get the file name you are actually looking for.
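As a rough sketch of that regex step, the helper below joins the pieces into the "ID Number File name date.pdf" pattern from the question. The function name build_pdf_name and the date_text parameter are assumptions for illustration (the question's code does not yet read the date from the page); the colon and slash handling mirrors the replace(':', '') and replace('/', '-') calls already in the question.

```python
import re


def build_pdf_name(id_number, title, date_text=""):
    """Join the parts into 'ID title date.pdf', dropping unsafe characters.

    `build_pdf_name` and `date_text` are hypothetical names for this sketch.
    """
    parts = [id_number, title, date_text]
    # Replace path separators with '-' so a title cannot create directories.
    cleaned = [re.sub(r"[\\/]", "-", part) for part in parts]
    # Drop characters that Windows does not allow in file names.
    cleaned = [re.sub(r'[<>:"|?*]', "", part) for part in cleaned]
    # Collapse runs of whitespace and trim the ends.
    cleaned = [re.sub(r"\s+", " ", part).strip() for part in cleaned]
    # Skip empty parts so a missing date does not leave a trailing space.
    name = " ".join(part for part in cleaned if part)
    return (name or "file") + ".pdf"
```

For example, build_pdf_name(id_number, filename_first) could replace the manual concatenation in the question, and it always appends the .pdf extension exactly once.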