from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)
# Run this script in every new document, before the page's own scripts
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
I have been learning web scraping for a while. To access a large number of pages, I frequently use the code above to prevent my driver from being detected. I have put a lot of effort into understanding how this code works behind the scenes, but I am still clueless. Can any scraping expert explain how a website detects a bot, and how this code helps evade detection?
CodePudding user response:
Websites often use JavaScript fingerprinting and context analysis to tell whether the connecting client is a human or a robot.
Since a website can execute arbitrary JavaScript in your browser instance, it can look for small details that are only present in automated browser instances and flag you as a robot.
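To make the idea concrete, here is a minimal sketch of what the server side of such a check could look like once a page script has reported a few values back. Everything in it is an assumption for illustration: the ClientSignals schema, the function names, and the choice of three signals (navigator.webdriver, a "HeadlessChrome" user agent, an empty plugin list) are hypothetical, and real anti-bot systems combine far more signals than this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClientSignals:
    """Values a page script might report to the server (hypothetical schema)."""
    webdriver: Optional[bool]  # navigator.webdriver; None stands for undefined
    user_agent: str            # navigator.userAgent
    plugin_count: int          # navigator.plugins.length


def automation_hints(signals: ClientSignals) -> List[str]:
    """Return the reasons this client looks automated (illustrative rules only)."""
    hints = []
    if signals.webdriver is True:
        hints.append("navigator.webdriver is true")
    if "HeadlessChrome" in signals.user_agent:
        hints.append("headless user agent")
    if signals.plugin_count == 0:
        hints.append("no browser plugins")
    return hints


def looks_automated(signals: ClientSignals, threshold: int = 1) -> bool:
    """Flag the client once enough hints accumulate (threshold is an assumption)."""
    return len(automation_hints(signals)) >= threshold
```

Note that hiding a single signal only removes one hint: a client that reports navigator.webdriver as undefined but still sends a headless user agent would be flagged by the other rule.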
For example, your line:
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
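This registers a script that runs in every new document, before any of the page's own scripts, and redefines the navigator.webdriver JavaScript property so that it returns undefined. The browser sets that property to true when it is being controlled through WebDriver, which is what Selenium uses.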
You can check this yourself by launching a Selenium-driven browser, opening the developer console and typing navigator.webdriver.
Without the override it reports true, whereas in your personal browser the value is false (or undefined in older browser versions).
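The same check can be run from Python instead of the developer console. The sketch below is a minimal example under stated assumptions: read_webdriver_flag and describe_flag are hypothetical helper names, and the driver argument is assumed to be a Selenium WebDriver (any object exposing execute_script works). Selenium converts a JavaScript undefined result to None.

```python
def read_webdriver_flag(driver):
    """Return navigator.webdriver as seen by scripts on the current page.

    `driver` is assumed to be a Selenium WebDriver. A JavaScript
    `undefined` result comes back to Python as None.
    """
    return driver.execute_script("return navigator.webdriver")


def describe_flag(value):
    """Turn the raw flag into a short human-readable verdict."""
    if value is True:
        return "automation visible"
    return "automation flag hidden or absent"


# Usage with a real browser (requires selenium and Chrome; not executed here):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")
#   print(describe_flag(read_webdriver_flag(driver)))
#   driver.quit()
```

Running it once with and once without the Page.addScriptToEvaluateOnNewDocument override shows whether the override took effect for the loaded page.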
Selenium also sets some browser options automatically. Your code overrides them by explicitly reversing them:
# Selenium sets this True by default but we can reverse it:
options.add_experimental_option('useAutomationExtension', False)
Browser fingerprinting is very complex and too much to cover in a single Stack Overflow answer. For more, see a lengthy blog post I wrote on this subject: "How to Avoid Web Scraping Blocking: Javascript".