I'm trying to scrape forecasting data from a website periodically, and ideally I'd like it to run in the background. I had a look at some documentation and came up with the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options = options)
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome()
driver.get('https://www.windguru.cz/53')
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#forecasts-page")))
# Scraping block of code goes here
driver.quit()
I think the following line is overriding the --headless argument, but I'm not sure:
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#forecasts-page")))
The reason I added it in the first place is that the website I'm scraping isn't just static HTML (have a look for yourself; the link is in the code). I think some JavaScript triggers the forecast data to load, so I need to wait until it has loaded before the script starts scraping the DOM.
Any idea how I can achieve this and run the browser in headless mode?
CodePudding user response:
The line that opens the visible Chrome window is this one:
driver = webdriver.Chrome()
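This second call creates a new driver without your options, so it replaces the headless driver created a few lines earlier. Remove it and the rest of the code should work the same, just without a visible window. Note also that the arguments added after the first webdriver.Chrome(options = options) call come too late to affect that driver, so add every argument before creating it.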
CodePudding user response:
It may also help to start the browser maximized, so that the page is laid out as it would be in a full-size window. Add the following argument:
options.add_argument("start-maximized")