I am starting to learn about web scraping. For practice, I am trying to get a list with all the courses name that appears in this query: "https://www.udemy.com/courses/search/?src=ukw&q=api python" the problem is when I start the script the web does not load en eventually the windows get closed. I think maybe Udemy has some type of security for automations
This is my code:
from selenium import webdriver
import time
website = "https://www.udemy.com/courses/search/?src=ukw&q=api python"
path = "/"
chrome_options = webdriver.ChromeOptions();
chrome_options.add_experimental_option("excludeSwitches", ['enable-logging'])
driver = webdriver.Chrome(options=chrome_options);
driver.get(website)
time.sleep(5)
matches = driver.find_elements_by_tag_name("h3")
CodePudding user response:
The reason behind udemy website not loading completely may be due to the fact that Selenium driven ChromeDriver initiated Chrome Browser gets detected as a bot and further navigation is getting blocked.
Solution
An easier hack to evade the detection would be to add the following argument:
So effectively your code block will be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.udemy.com/courses/search/?src=ukw&q=api python')
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'results for')]")))
driver.save_screenshot("udemy.png")
Saved Screenshot: