I am trying to download a PDF with Chromium and Selenium in Python. I can download it as a file, but is it possible to get its contents in memory instead, as I would if I fetched it with requests? E.g. import requests; requests.get(url).content
My current code is
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")
This opens the PDF in the browser, but does not download it. I can also do
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_experimental_option(
    "prefs",
    {
        "download.default_directory": "/home/username/Downloads/",  # Change default directory for downloads
        "download.prompt_for_download": False,  # To auto download the file
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,  # It will not show PDF directly in chrome
    },
)
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("http://www.africau.edu/images/default/sample.pdf")
This works, but it saves a PDF file to my downloads folder, which means I would have to read it back from disk to get the PDF in code. How can I get the PDF without downloading it as a file first?
CodePudding user response:
I'm not sure I fully understand the question, but if you just want to read the file's contents, you can do
import requests
file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"
r = requests.get(file_url, stream=True)
# read the response body one chunk at a time, without writing it to disk
for chunk in r.iter_content(chunk_size=1024):
    pass  # do something with each chunk (a bytes object)
OR
You can download the file with requests.get and then write it to disk
import requests
file_url = "http://codex.cs.yale.edu/avi/db-book/db4/slide-dir/ch1-2.pdf"
r = requests.get(file_url, stream=True)
# open the output file in binary write mode; Python creates it if it does not exist
with open("python.pdf", "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
        # write one chunk at a time to the PDF file
        if chunk:
            pdf.write(chunk)
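If the goal is to keep the PDF in memory rather than on disk, the streamed chunks can be joined into a single bytes object instead of being written to a file. This is only a minimal sketch: the helper names collect_chunks, looks_like_pdf and fetch_pdf_bytes are made up for illustration, and it assumes the URL can be fetched directly, without a browser session or login.

```python
from typing import Iterable


def collect_chunks(chunks: Iterable[bytes]) -> bytes:
    """Join an iterable of byte chunks into one bytes object, skipping empty chunks."""
    return b"".join(chunk for chunk in chunks if chunk)


def looks_like_pdf(data: bytes) -> bool:
    """Rough sanity check: a well-formed PDF file starts with the marker b'%PDF-'."""
    return data.startswith(b"%PDF-")


def fetch_pdf_bytes(url: str, chunk_size: int = 1024) -> bytes:
    """Stream a URL with requests and return the whole body as bytes (nothing touches disk)."""
    import requests  # imported here so the helpers above can be used without requests installed

    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of returning an error page
    return collect_chunks(response.iter_content(chunk_size=chunk_size))
```

Calling fetch_pdf_bytes(file_url) should give the same data as requests.get(file_url).content; looks_like_pdf can then be used to check that the server really returned a PDF and not an HTML error page.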
CodePudding user response:
From what you described, I believe the following may achieve your goal:
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.get("http://www.africau.edu/images/default/sample.pdf")
# Select all text and copy it using hotkeys
ActionChains(driver).key_down(Keys.CONTROL).send_keys('a').send_keys('c').key_up(Keys.CONTROL).perform()
The selected text is then on the clipboard, and you can paste it wherever you like with
ActionChains(driver).key_down(Keys.CONTROL).send_keys('v').key_up(Keys.CONTROL).perform()
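If the PDF bytes themselves are needed rather than the copied text, one possible workaround is to let Selenium handle the navigation and then fetch the file with requests, reusing the browser's cookies. The sketch below is an illustration under assumptions, not a definitive method: cookies_to_dict and fetch_pdf_with_browser_cookies are hypothetical helper names, and it assumes the server only needs the session cookies (no other browser state) to serve the file.

```python
from typing import Dict, Iterable, Mapping


def cookies_to_dict(selenium_cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Turn the list of cookie dicts from driver.get_cookies() into a name -> value dict."""
    return {str(cookie["name"]): str(cookie["value"]) for cookie in selenium_cookies}


def fetch_pdf_with_browser_cookies(driver, url: str) -> bytes:
    """Download url with requests, sending the cookies the Selenium driver currently holds."""
    import requests  # imported here so cookies_to_dict can be used without requests installed

    # Assumption: the driver has already visited the site, so its cookies apply to this URL.
    cookies = cookies_to_dict(driver.get_cookies())
    response = requests.get(url, cookies=cookies, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of returning an error page
    return response.content  # the raw PDF bytes, kept in memory
```

With the driver from above, fetch_pdf_with_browser_cookies(driver, "http://www.africau.edu/images/default/sample.pdf") would return the PDF as bytes without saving a file.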