How do those webdriver option arguments work?

Time:07-06

options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})

I have been learning web scraping for a while. To get access to lots of pages, I frequently use the code above to prevent my driver from being detected. I have put a lot of effort into understanding how this code works behind the scenes, but I am still clueless. Can any scraping expert explain how a website detects a bot, and how this code helps evade detection?

CodePudding user response:

Websites often use JavaScript fingerprinting and context analysis to tell whether the connecting client is a human or a robot.

Since a website can execute arbitrary JavaScript in your web browser instance, it can look for small details that are only present in automated browser instances and flag you as a robot.
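To make this concrete, here is a minimal sketch of the kind of probe a page could run, driven from Python through Selenium's execute_script. The three signals and the looks_automated heuristic are illustrative assumptions, not any real site's detection logic; actual fingerprinting scripts check far more.

```python
# Illustrative probes only: real detection scripts inspect many more signals.
PROBE_SCRIPT = """
return {
  webdriver: navigator.webdriver,
  pluginCount: navigator.plugins.length,
  languages: navigator.languages
};
"""


def collect_signals(driver):
    """Run the probe in the current page and return the results as a dict."""
    return driver.execute_script(PROBE_SCRIPT)


def looks_automated(signals):
    """Hypothetical heuristic: flag the client if any simple signal is suspicious."""
    if signals.get("webdriver"):
        return True  # the browser reports that it is under automation
    if signals.get("pluginCount", 0) == 0:
        return True  # assumption: an empty plugin list is treated as suspicious
    return not signals.get("languages")  # a missing language list is unusual
```

Calling looks_automated(collect_signals(driver)) on a Selenium-driven Chrome without any of the tweaks from the question would typically report True because of the webdriver flag alone.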

For example, your line:

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})

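This registers a script that runs at the start of every new document (each page load, before the page's own scripts run). The script overrides the navigator.webdriver JavaScript property so that it returns undefined. The browser sets this property to true when it is being controlled through WebDriver, which is how Selenium drives it.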
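One detail worth spelling out is ordering: the script only affects documents created after it is registered, so it has to be added before navigating. The sketch below assumes a Chromium-based Selenium driver (execute_cdp_cmd is only available there); open_with_patch is a hypothetical helper name, not part of Selenium.

```python
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined
})
"""


def open_with_patch(driver, url):
    """Register the override, then navigate and read the flag back."""
    # Register first: the script only applies to documents created afterwards.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
    driver.get(url)
    # JavaScript undefined comes back to Python as None.
    return driver.execute_script("return navigator.webdriver")
```

If the override took effect, open_with_patch(driver, "https://example.com") should return None; the URL here is only a placeholder.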

You can check this by launching a Selenium browser, opening the developer console, and typing navigator.webdriver. Under Selenium the value is true, whereas in your personal browser it is false or undefined, depending on the browser and version.

Selenium also sets some browser options automatically. Your code overrides them by explicitly reversing the defaults:

# Selenium sets this True by default but we can reverse it:
options.add_experimental_option('useAutomationExtension', False)
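For completeness, here is a hedged sketch that groups the three option lines from the question with a comment on what each is generally understood to do. The helper name apply_stealth_options is made up for illustration, and the exact effect of each setting depends on the Chrome and ChromeDriver versions; newer releases may ignore useAutomationExtension altogether.

```python
def apply_stealth_options(options):
    """Apply the three settings from the question to a ChromeOptions-like object."""
    # Turn off the Blink feature that makes navigator.webdriver report true.
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Ask ChromeDriver not to pass its default --enable-automation switch,
    # which also shows the "controlled by automated test software" infobar.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    # Do not load the automation helper extension used by older Chrome versions.
    options.add_experimental_option("useAutomationExtension", False)
    return options


# Typical usage (requires Selenium and a local Chrome installation):
#   from selenium import webdriver
#   options = apply_stealth_options(webdriver.ChromeOptions())
#   driver = webdriver.Chrome(options=options)
```

Passing the options object in, rather than creating it inside the helper, keeps the function usable with any other arguments you already set on it.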

Browser fingerprinting is very complex and is too much to cover in a single Stack Overflow answer. For more, see a lengthy blog post I wrote on this subject: "How to Avoid Web Scraping Blocking: Javascript".
