Download PDF as file object without downloading the file with Chrome and Selenium in Python

Time:09-01

I am trying to download a PDF with Chromium and Selenium in Python. I can download it as a file, but is it possible to get its contents directly in memory (as bytes), the way I would if I fetched it with requests? E.g. import requests; requests.get(url).content

My current code is:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")

This opens the PDF in the browser, but does not download it. I can also do:

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs",
    {
        "download.default_directory": "/home/username/Downloads/",  # Change default directory for downloads
        "download.prompt_for_download": False,  # To auto download the file
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,  # It will not show PDF directly in chrome
    },
)
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("http://www.africau.edu/images/default/sample.pdf")

This works, but it saves a PDF file to my downloads folder, which means I would have to read it back from disk to get the PDF in code. How can I get the PDF without first saving it as a file?

CodePudding user response:

I'm not sure I fully understand the question, but if you just want to read the file's contents, you can stream it with requests:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

# stream=True defers downloading the body until it is iterated
r = requests.get(file_url, stream=True)

# Process the body one chunk (up to 1024 bytes) at a time
for chunk in r.iter_content(chunk_size=1024):
    pass  # do something with each chunk of bytes
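
One thing the "do something" step could do is collect the chunks in memory instead of on disk, which is what the question asks for. The following is only a minimal sketch: the helper name read_into_memory is hypothetical, and it assumes it is given a response object like the one returned by requests.get(file_url, stream=True) above.

```python
import io


def read_into_memory(response, chunk_size=1024):
    """Collect a streamed response body into an in-memory file object.

    `response` is assumed to behave like a requests.Response obtained with
    stream=True, i.e. it provides iter_content(chunk_size=...).
    """
    buffer = io.BytesIO()
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks
            buffer.write(chunk)
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer
```

Called as read_into_memory(r), this returns a file-like object whose getvalue() holds the PDF bytes. For small files, r.content gives the same bytes without the loop; the buffer is mainly useful when a library expects a file object.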

OR

You can download the file with requests.get and then write it to disk:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

r = requests.get(file_url, stream=True)

# Open a file for binary writing; Python creates it if it does not exist
with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # Write one chunk at a time to the PDF file
        if chunk:
            pdf.write(chunk)
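
Both variants above make a fresh request that does not share the browser's state. If the PDF is only reachable from the session Selenium is driving (for example, behind a login), one common workaround is to copy the browser's cookies into a requests session and fetch the bytes with that. This is a rough sketch under assumptions, not a definitive implementation: the function names are hypothetical, driver is assumed to be a Selenium WebDriver that has already visited the site, and session is assumed to be a requests.Session.

```python
def copy_browser_cookies(driver, session):
    """Copy cookies from a Selenium driver into a requests-style session.

    Assumes `driver.get_cookies()` returns a list of dicts (as Selenium does)
    and `session.cookies.set(name, value, **kwargs)` exists (as on
    requests.Session).
    """
    for cookie in driver.get_cookies():
        # Only name and value are required; pass domain/path when present.
        extras = {key: cookie[key] for key in ("domain", "path") if key in cookie}
        session.cookies.set(cookie["name"], cookie["value"], **extras)
    return session


def fetch_pdf_bytes(driver, session, url, timeout=30):
    """Return the body at `url` as bytes, reusing the browser's cookies."""
    copy_browser_cookies(driver, session)
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.content
```

With real objects this would be called as fetch_pdf_bytes(driver, requests.Session(), url). Nothing is written to disk, but it only helps when cookies are what the server checks; other browser state (headers, local storage) is not carried over.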

CodePudding user response:

From what you described, I believe the following may achieve your goal:

from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")

# Select all text and copy it using hotkeys
ActionChains(driver).key_down(Keys.CONTROL).send_keys('a').send_keys('c').key_up(Keys.CONTROL).perform()

The selected text is then on the clipboard, and you can paste it where you like with the line below. Note that this gives you the document's visible text, not the raw PDF bytes.

ActionChains(driver).key_down(Keys.CONTROL).send_keys('v').key_up(Keys.CONTROL).perform() 