Download PDF as file object without downloading the file with Chrome and Selenium in Python

I am trying to download a PDF with Chromium and Selenium in Python. I can download it if I save it to a file, but is it possible to get its contents directly in memory (as bytes), the way I would with requests, e.g. import requests; requests.get(url).content?

My current code is:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")

This opens the PDF in Chrome's built-in viewer but does not download it. I can also do:

from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs",
    {
        "download.default_directory": "/home/username/Downloads/",  # Change default directory for downloads
        "download.prompt_for_download": False,  # To auto download the file
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,  # It will not show PDF directly in chrome
    },
)
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("http://www.africau.edu/images/default/sample.pdf")

This works, but it saves the PDF file to my Downloads folder, which means I would then have to read it back from disk to get the PDF in code. How can I get the PDF without saving it as a file first?
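If going through disk is acceptable as long as nothing is left behind, one workaround is to point "download.default_directory" at a temporary directory, wait for Chrome to finish the download, and read the bytes back. A minimal sketch, assuming Chrome marks unfinished downloads with a ".crdownload" suffix and renames the file when done; the helper names `wait_for_pdf` and `read_downloaded_pdf` are hypothetical.

```python
import os
import time


def wait_for_pdf(directory: str, timeout: float = 30.0, poll: float = 0.5) -> str:
    """Return the path of the first finished .pdf file in `directory`.

    Assumption: Chrome writes in-progress downloads as "*.crdownload", so a
    name ending in ".pdf" is treated as a completed download.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for name in os.listdir(directory):
            if name.lower().endswith(".pdf"):
                return os.path.join(directory, name)
        time.sleep(poll)
    raise TimeoutError(f"no PDF appeared in {directory} within {timeout}s")


def read_downloaded_pdf(directory: str, timeout: float = 30.0) -> bytes:
    """Wait for the download to finish, then return the file's bytes."""
    path = wait_for_pdf(directory, timeout)
    with open(path, "rb") as f:
        return f.read()
```

In use, you would create the directory with tempfile.TemporaryDirectory(), pass its path as "download.default_directory", call driver.get(url), and then read_downloaded_pdf(tmp_dir); the directory and the PDF are removed when the with block exits.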

Answer:

I'm not entirely sure what you're asking, but if you just want to read the file's contents in Python, you can stream it with requests:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

r = requests.get(file_url, stream=True)
r.raise_for_status()

# Read the body chunk by chunk, keeping it in memory instead of on disk
chunks = []
for chunk in r.iter_content(chunk_size=1024):
    chunks.append(chunk)  # or process each chunk here
pdf_bytes = b"".join(chunks)
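Since the title asks for a file object, one way to keep the streamed chunks in memory and still get something file-like is to write them into an `io.BytesIO`. A minimal sketch; the helper name `chunks_to_file_object` is hypothetical.

```python
import io
from typing import Iterable


def chunks_to_file_object(chunks: Iterable[bytes]) -> io.BytesIO:
    """Collect streamed byte chunks into an in-memory, file-like object."""
    buffer = io.BytesIO()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller reads from the start
    return buffer
```

For example, pdf_file = chunks_to_file_object(r.iter_content(chunk_size=1024)) gives an object whose read() returns the same bytes requests.get(file_url).content would, and it can be handed to any library that accepts a binary file object.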

If you do want the file on disk, you can download it with requests.get and write it out chunk by chunk:

import requests

file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"

r = requests.get(file_url, stream=True)
r.raise_for_status()

# open() in "wb" mode creates python.pdf if it does not exist yet
with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # write one chunk at a time to the PDF file
        if chunk:
            pdf.write(chunk)
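If the PDF is only reachable from a session that was logged in through Selenium, a plain requests call will not carry that session. One common workaround, sketched here under the assumption that the server authenticates purely by cookies, is to copy the cookies out of the driver with `driver.get_cookies()` (a list of dicts with "name" and "value" keys) and pass them to requests. The helper name `selenium_cookies_to_dict` is hypothetical.

```python
from typing import Dict, Iterable, Mapping


def selenium_cookies_to_dict(cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Turn Selenium's cookie list into the {name: value} dict requests accepts.

    Domain, path and expiry are dropped, so this assumes every cookie
    belongs to the host you are about to request.
    """
    return {str(c["name"]): str(c["value"]) for c in cookies}
```

Usage would look like pdf_bytes = requests.get(file_url, cookies=selenium_cookies_to_dict(driver.get_cookies())).content. Sites that also check headers such as User-Agent may need those copied over as well.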

Answer:

From what you described, I believe the following may achieve your goal:

from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")

# Select all text and copy it using hotkeys
ActionChains(driver).key_down(Keys.CONTROL).send_keys('a').send_keys('c').key_up(Keys.CONTROL).perform()

The PDF's text is then on the clipboard, and you can paste it wherever you like with:

ActionChains(driver).key_down(Keys.CONTROL).send_keys('v').key_up(Keys.CONTROL).perform()
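The clipboard approach yields the PDF's extracted text, not the PDF file itself. If the raw bytes are what you need while staying inside Selenium, one option is to let the page fetch the URL with JavaScript and hand the result back through `driver.execute_async_script`, base64-encoded because WebDriver returns script results as JSON. A sketch, assuming the driver is already on a page of the same origin as the PDF (for example right after `driver.get(pdf_url)` with Chrome's viewer enabled), so `fetch` is not blocked by CORS; `FETCH_AS_BASE64_JS`, `decode_pdf_payload` and `get_pdf_bytes` are hypothetical names.

```python
import base64

# Runs inside the browser. Selenium appends a callback as the last argument
# of an async script; calling it sends the value back to Python.
FETCH_AS_BASE64_JS = """
const url = arguments[0];
const done = arguments[arguments.length - 1];
fetch(url)
  .then(resp => resp.arrayBuffer())
  .then(buf => {
    const bytes = new Uint8Array(buf);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) {
      binary += String.fromCharCode(bytes[i]);
    }
    done(btoa(binary));
  })
  .catch(err => done("ERROR:" + err));
"""


def decode_pdf_payload(payload: str) -> bytes:
    """Decode the base64 string returned by the script and sanity-check it."""
    if payload.startswith("ERROR:"):
        raise RuntimeError(payload)
    data = base64.b64decode(payload)
    if not data.startswith(b"%PDF"):
        raise ValueError("fetched content does not look like a PDF")
    return data


def get_pdf_bytes(driver, url: str) -> bytes:
    """Fetch `url` from inside the current page and return the raw bytes."""
    payload = driver.execute_async_script(FETCH_AS_BASE64_JS, url)
    return decode_pdf_payload(payload)
```

After driver.get("http://www.africau.edu/images/default/sample.pdf"), calling get_pdf_bytes(driver, driver.current_url) would return the file's bytes, comparable to requests.get(url).content. For large files you may need to raise the limit with driver.set_script_timeout(...).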