You have to click on the checkbox to solve the captcha, so we will tell pyautogui to locate this box on the screen and then click on it.
Save an image of the box on your computer and call it box.png. Then run this code (replace ... with your missing code).
import pyautogui
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
driver.get(url)
driver.maximize_window()
# The captcha's HTML is inside an iframe; Selenium cannot see it unless we switch to the iframe first
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))
# wait until the captcha is visible on the screen
WebDriverWait(driver, 9).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#g-recaptcha')))
# find captcha on page
checkbox = pyautogui.locateOnScreen('box.png')
if checkbox:
    # compute the coordinates (x,y) of the center
    center_coords = pyautogui.center(checkbox)
    pyautogui.click(center_coords)
else:
    print('Captcha not found on screen')
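A single locateOnScreen call can miss the box if the captcha has not finished rendering yet. Below is a minimal sketch of a polling wrapper. The name `locate_with_retry` and its parameters are hypothetical and not part of pyautogui. It accepts any zero-argument callable, so it can be tried without a screen. It treats an exception from the callable as "not found yet", because some PyAutoGUI versions raise ImageNotFoundException instead of returning None.

```python
import time


def locate_with_retry(locate, timeout=10.0, interval=0.5):
    """Call `locate()` until it returns a truthy result or `timeout` seconds pass.

    Returns the result, or None if nothing was found in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = locate()
        except Exception:
            # Assumption: any exception means "not found yet"
            # (e.g. pyautogui's ImageNotFoundException). This is deliberately
            # broad and would also hide unrelated errors such as a bad file path.
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With pyautogui it could be called as `checkbox = locate_with_retry(lambda: pyautogui.locateOnScreen('box.png'))`, and the rest of the code above stays the same.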
CodePudding user response:
Based on @sound wave's answer, I was able to invoke the callback function and bypass the captcha without pyautogui. The key was to switch to the captcha's frame using the frame_to_be_available_and_switch_to_it method. Thanks a mil to @sound wave for the amazing hint.
Here's the full code for anyone who's interested. Keep in mind that you will need a 2captcha API key for it to work.
The thing that I am still trying to figure out is how to operate this script in headless mode, because the WebDriverWait object seems to need Selenium to be in non-headless mode to switch to the captcha frame. If anyone knows how to switch to the captcha frame while working with Selenium in headless mode, please share your knowledge :)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from twocaptcha import TwoCaptcha
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from dotenv import load_dotenv
import os
import time
# Load environment variables
load_dotenv()
# Instantiate a solver object
solver = TwoCaptcha(os.getenv("CAPTCHA_API_KEY"))
sitekey = "6Lfwdy4UAAAAAGDE3YfNHIT98j8R1BW1yIn7j8Ka"
url = "https://suchen.mobile.de/fahrzeuge/search.html?dam=0&isSearchRequest=true&ms=8600;51;;&ref=quickSearch&sb=rel&vc=Car"
# Set chrome options
chrome_options = Options()
chrome_options.add_argument('start-maximized') # Required for a maximized Viewport
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging', 'enable-automation'])
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
# Instantiate a browser object and navigate to the URL
driver = webdriver.Chrome(options=chrome_options)  # the chrome_options= keyword is deprecated and was removed in newer Selenium 4 releases
driver.get(url)
driver.maximize_window()
# Solve the captcha using the 2captcha service
def solve(sitekey, url):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        exit(e)
    return result.get('code')
captcha_key = solve(sitekey=sitekey, url=url)
print(captcha_key)
# The captcha's HTML is inside an iframe; Selenium cannot see it unless we switch to the iframe first
WebDriverWait(driver, 9).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sec-cpt-if")))
# Inject the token into the inner HTML of g-recaptcha-response and invoke the callback function
driver.execute_script(f'document.getElementById("g-recaptcha-response").innerHTML="{captcha_key}"')
driver.execute_script(f"verifyAkReCaptcha('{captcha_key}')") # Only works after switching to the captcha iframe; it fails from the top-level document
# Wait for 3 seconds until the "Accept Cookies" window appears. Can also do that with WebDriverWait.until(EC)
time.sleep(3)
# Click on "Einverstanden"
driver.find_element(by=By.XPATH, value="//button[@class='sc-bczRLJ iBneUr mde-consent-accept-btn']").click()
# Wait for 0.5 seconds until the page is loaded
time.sleep(0.5)
# Print the top title of the page
print(driver.find_element(by=By.XPATH, value="//h1[@data-testid='result-list-headline']").text)
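On the headless question raised above: one thing that may be worth trying is Chrome's newer headless mode together with an explicit window size, since a headless browser has no real window for start-maximized or maximize_window() to act on. Whether that is enough for the captcha iframe on this site has not been tested here. The helper name `headless_arguments` and the default size are assumptions made for illustration only.

```python
def headless_arguments(width=1920, height=1080):
    """Build Chrome command-line switches for a headless run with a fixed viewport."""
    return [
        "--headless=new",                   # Chrome's newer headless mode (Chrome 109+)
        f"--window-size={width},{height}",  # headless has no window to maximize
    ]


if __name__ == "__main__":
    # Each switch would be passed to Selenium with chrome_options.add_argument(arg).
    for arg in headless_arguments():
        print(arg)
```

In the script above, this would mean calling `chrome_options.add_argument(arg)` for each returned switch before creating the driver, and dropping the 'start-maximized' argument.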