I am trying to do web scraping in .NET (using C#), and I get stuck when reCAPTCHA or hCaptcha is present on a website. How can I bypass them programmatically?
Is there any hacky way?
CodePudding user response:
You can't.
That's the whole point of a captcha. A captcha is there to ensure a web page is not used by a bot, so any way to circumvent it programmatically would be a bug in the captcha itself.
CodePudding user response:
Use a third-party service to solve them, such as 2Captcha: https://2captcha.com/2captcha-api
Original answer:
Use a browser control so you can get a visual representation of what the captcha is requesting.
Non-hacky but extremely difficult method:
- Create an image from what the control is displaying.
- Write an algorithm that determines what the captcha wants and then performs the necessary input. You'll probably have to train an AI model to do this, and it won't work reliably. Good luck!
"Hacky" method (let someone else respond to the captcha request):
- Create an image from what the control is displaying.
- Present the image to a real person (preferably in a low-wage country) in an application that lets them enter text (text captcha) or records the x,y locations of their mouse clicks (image captcha).
- Return the information collected by the real person to your scraping application.
- Using the collected information, send clicks to the correct x,y location for image captcha or the text for text captcha.
- Determine whether the captcha was accepted; if it wasn't, send a new image of the control to your worker.
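
The relay steps above boil down to a capture, ask, apply, check loop. Here is a minimal sketch of that loop in C#. Everything in it is hypothetical and for illustration only: `ICaptchaSession`, `CaptchaAnswer`, `ClickPoint`, `CaptchaRelay`, and the `askWorker` delegate are not part of any real library, and how you capture the control, reach the worker, and detect success is left to your own code. The retry limit of 3 is also just an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; none of these come from a real library.
public record ClickPoint(int X, int Y);

// A worker's answer: text for a text captcha, click locations for an image captcha.
// Text is left empty for image captchas; Clicks is left empty for text captchas.
public record CaptchaAnswer(string Text, IReadOnlyList<ClickPoint> Clicks);

public interface ICaptchaSession
{
    byte[] CaptureImage();              // image of what the browser control displays
    void Apply(CaptchaAnswer answer);   // type the text or send the clicks
    bool IsSolved();                    // did the page accept the answer?
}

public static class CaptchaRelay
{
    // Returns the number of attempts used, or -1 if every attempt failed.
    public static int Solve(ICaptchaSession session,
                            Func<byte[], CaptchaAnswer> askWorker,
                            int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            byte[] image = session.CaptureImage();    // fresh image each round
            CaptchaAnswer answer = askWorker(image);  // blocks until the person replies
            session.Apply(answer);
            if (session.IsSolved())
                return attempt;
        }
        return -1;
    }
}
```

Passing the worker in as a delegate keeps the loop independent of how the image actually reaches a person, so the same loop can be exercised with a fake session and a canned answer before anything real is wired up.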