How to bypass Captcha while Web Scraping

I am trying to scrape the car details from this site using Selenium: https://www.autoscout24.ch/de/autos/alle-marken?vehtyp=10

Roughly every 30 pages I have to verify that I am not a robot, even though my code includes:

driver.implicitly_wait(20)

Is there any way to overcome this?

Answer:

CAPTCHA exists precisely for this purpose: to detect bots and automated systems crawling a web page. Adding waits to a Selenium script has no bearing on whether the CAPTCHA appears.

Unless you can disable it, I don't think automating around it is the right approach. You may find tutorials online that claim to get past it, but they are patchy and do not cover all use cases.

Answer:

I'm not a robot

The "I'm not a robot" checkbox, commonly known as reCAPTCHA v2, is one of the security measures used to implement challenge-response authentication. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) mainly helps protect applications and systems from spam and password cracking by asking the user to complete a simple test proving that a human, not a computer, is trying to access a password-protected account. In short, CAPTCHA is implemented to help prevent unauthorized account access.

So neither of Selenium's wait mechanisms, implicit wait or explicit wait, will help you avoid the CAPTCHA.

Solution

The ideal approach is to disable the CAPTCHA for the AUT (Application Under Test) in the testing/staging environment and enable it only in the production environment.
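If you own the application, the testing/staging toggle can be sketched as a server-side check that skips CAPTCHA verification outside production. This is a minimal sketch, not any framework's API: it assumes a hypothetical `APP_ENV` environment variable and a hypothetical `verify_captcha` callable that wraps the CAPTCHA provider's real server-side verification. It does not apply to third-party sites such as the one in the question, where you cannot change the configuration.

```python
import os
from typing import Callable, Mapping, Optional


def captcha_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the app runs in production (hypothetical APP_ENV variable)."""
    env = os.environ if env is None else env
    # Default to "production" so a missing setting keeps the CAPTCHA switched on.
    return env.get("APP_ENV", "production") == "production"


def verify_request(
    captcha_token: Optional[str],
    verify_captcha: Callable[[str], bool],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether a form submission passes the CAPTCHA gate.

    verify_captcha is a hypothetical callable standing in for the
    provider's server-side verification call.
    """
    if not captcha_enabled(env):
        # Testing/staging: skip the check so Selenium tests can submit forms.
        return True
    # Production: require a token and a successful verification.
    return captcha_token is not None and verify_captcha(captcha_token)
```

Defaulting to `production` when `APP_ENV` is unset means a misconfigured deployment keeps the CAPTCHA on instead of silently turning it off.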
