I want to download the files listed at https://www.osc.ca/en/securities-law/osc-bulletin?keyword=61-101&date[min]=&date[max]=&sort_bef_combine=field_start_date_DESC, which are the results of searching for the keyword '61-101'. Here is my code:
service = Service(r"C:\Users\Lenovo\Desktop\chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://www.osc.ca/en/securities-law/osc-bulletin')
search = driver.find_element(By.XPATH, '//*[@id="edit-keyword"]')
search_word = '61-101'
search.send_keys(search_word)
search.send_keys(Keys.ENTER)
for i in range(1, 21):
    sleep(2)
    issue_path = '//*[@id="block-osc-glider-content"]/article/section[3]/div[2]/section[3]/div/div/div/div/div[2]/div/div[3]/div/div[2]/div[' + str(i) + ']/div/div[1]/div[2]/span[1]/a'
    issue = driver.find_element(By.XPATH, issue_path)
    issue.send_keys(Keys.ENTER)
    driver.switch_to.window(driver.window_handles[1])
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="icon"]/iron-icon//svg')))
    download = driver.find_element(By.XPATH, '//*[@id="icon"]/iron-icon//svg')
    download.send_keys(Keys.ENTER)
    driver.switch_to.window(driver.window_handles[0])
However, this raises a TimeoutException. I tried several different XPaths for the download button, but the driver still could not find the download element. I suspect the problem is that the driver fails to switch to the new tab.
CodePudding user response:
options = Options()
download_dir = os.getcwd()  # current working directory; change to any folder
prefs = {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    # Save PDFs to disk instead of opening them in Chrome's built-in viewer
    "plugins.always_open_pdf_externally": True
}
options.add_experimental_option("prefs", prefs)
service = Service(r"C:\Users\Lenovo\Desktop\chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)
wait = WebDriverWait(driver, 20)
driver.get('https://www.osc.ca/en/securities-law/osc-bulletin')
search = driver.find_element(By.XPATH, '//*[@id="edit-keyword"]')
search_word = '61-101'
search.send_keys(search_word)
search.send_keys(Keys.ENTER)
for i in range(1, 21):
    sleep(2)
    issue_path = '//*[@id="block-osc-glider-content"]/article/section[3]/div[2]/section[3]/div/div/div/div/div[2]/div/div[3]/div/div[2]/div[' + str(i) + ']/div/div[1]/div[2]/span[1]/a'
    issue = driver.find_element(By.XPATH, issue_path)
    issue.send_keys(Keys.ENTER)
    # Wait for the new tab to exist before switching to it
    wait.until(EC.number_of_windows_to_be(2))
    driver.switch_to.window(driver.window_handles[1])
    # Close the new tab and return to the search results
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
To download all the PDFs, set prefs in the Chrome options so that PDFs are downloaded automatically instead of being opened in the viewer. I set the download directory to the current working directory, but you can change download_dir to any folder path you want.
I'd also suggest adding an explicit wait until the number of window handles is greater than 1 before switching to the new tab.
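As a variation on that wait, here is a minimal sketch that picks out the newly opened tab by comparing handle lists, rather than assuming it sits at index 1 (the WebDriver specification does not guarantee the order of window handles). The helper name find_new_handle is hypothetical and not part of Selenium; the commented usage assumes the driver, wait and issue objects from the code above.

```python
def find_new_handle(handles_before, handles_now):
    """Return the first handle in handles_now that was not open before, or None."""
    known = set(handles_before)
    for handle in handles_now:
        if handle not in known:
            return handle
    return None


# Possible usage inside the loop (assumes `driver`, `wait` and `issue` exist):
#     before = driver.window_handles
#     issue.send_keys(Keys.ENTER)
#     new_tab = wait.until(lambda d: find_new_handle(before, d.window_handles))
#     driver.switch_to.window(new_tab)
```

WebDriverWait.until returns whatever truthy value the callable produces, so the wait and the lookup of the new handle happen in one step.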
Imports:
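import os
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC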
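Because the prefs make Chrome save the PDFs to download_dir in the background, the script can reach its end while a file is still being written. The following is a rough sketch of one way to wait for that, assuming Chrome marks unfinished downloads with the .crdownload extension. The function name wait_for_downloads and its timeout and poll parameters are my own illustration, not a Selenium API. Note that it only detects downloads that have already started, so it is best called after the loop's last sleep.

```python
import time
from pathlib import Path


def wait_for_downloads(download_dir, timeout=60.0, poll=0.5):
    """Wait until no *.crdownload files remain in download_dir.

    Returns True if the folder is free of partial downloads before the
    timeout expires, False otherwise.
    """
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome renames the file to its final name once the download ends
        if not any(folder.glob("*.crdownload")):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A call such as wait_for_downloads(download_dir) placed after the for loop, before driver.quit(), would then hold the browser open until the pending files are complete or the timeout passes.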