Google rejects my login attempt when my browser is running under control of Selenium?

I am trying to use Selenium to automate the removal of spam comments from my YouTube videos. As of this date, YouTube still does not have an API for the removal of spam comments. I only want to automate the removal of spam comments from the videos on my own channel. I have natural language processing experience, and the patterns I detect in the spam comments would make it easy for me to remove them. I only need to be able to log into my account manually and then use Selenium and some code to remove the annoying spam comments.

Please note these points:

I am not using a headless version of Selenium/Chrome. I am running a version of ChromeDriver that automates a full Chrome browser instance.
I am logging in manually by typing in my user name and password myself directly into the browser page's input elements.

So something in the user agent, or a flag ChromeDriver sets, or something else is triggering the rejection?

I did a lot of web searching, including on SO, and most of the messages regarding detection have to do with using a headless version of Selenium or with certain input patterns that identify an automated login attempt. As I stated above, I am doing neither of those things, since I log in manually before I trigger my automation code. I have also seen some references to an "undetectable" chrome driver, but is that a trustworthy item, and does it only work with a headless version of Selenium?

Can someone tell me how to fix this? I'm really tired of manually deleting the spam comments.

CodePudding user response：

Try this

1- Use a different browser: Some websites are more sensitive to automation when using certain browsers, so you may have better luck using a different browser like Firefox or Safari

2- Use a headless browser: While you mentioned that you are currently using a full Chrome browser instance, you could try using a headless version of Chrome (or another browser) to see if the site behaves differently. A headless browser is a browser that runs without a graphical user interface. Be aware, though, that headless mode tends to leave extra traces of its own, so it is usually easier for a website to detect, not harder

3- Use a different ChromeDriver version: Depending on the version of ChromeDriver you are using, it may be setting certain flags or user agent strings that are causing the website to detect automation. You could try using a different version of ChromeDriver to see if that helps

4- Use a different user agent string: Some websites use the user agent string (a string that identifies the browser and operating system) to detect automation. You could try setting a different user agent string in ChromeDriver to see if that helps
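
As a rough illustration of point 4, here is a minimal sketch of overriding the user agent with Selenium's Python bindings. The helper names `user_agent_argument` and `chrome_with_user_agent` are made up for this example, and the user agent value is whatever you choose to pass in. Changing the user agent alone may well not be enough to get past the login check.

```python
def user_agent_argument(user_agent: str) -> str:
    """Build the Chrome command-line switch that overrides the user agent."""
    return f"--user-agent={user_agent}"


def chrome_with_user_agent(user_agent: str):
    """Start Chrome through Selenium with a custom user agent (hypothetical helper)."""
    # Imported lazily so the pure helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(user_agent_argument(user_agent))
    return webdriver.Chrome(options=options)
```

You would call `chrome_with_user_agent(...)` with the user agent string copied from your normal browser, then check `navigator.userAgent` in the page to confirm the override took effect.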

CodePudding user response：

I once did some research on this. Some websites (possibly including YouTube and other Google sites) will reject your requests, or even block your IP, if they detect that you are using a webdriver. They will treat you as a bot whether or not you are running headless and whether or not you log in manually. I think there are two different approaches:

Try to hide your webdriver traces: use a package such as selenium-stealth or undetected-chromedriver, or set some flags manually as described in this SO answer
Try a different approach: if you want to delete spam comments on YouTube, have you tried their official API (YouTube Comment API Docs)? This SO comment may be helpful
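
For the "set some flags manually" option, a minimal sketch with Selenium's Python bindings might look like the following. The function names are invented for this example, the two settings shown are the ones commonly suggested for hiding the automation banner and the `navigator.webdriver` property, and there is no guarantee that they are enough for Google's login page.

```python
AUTOMATION_SWITCH = "--disable-blink-features=AutomationControlled"


def build_stealthier_options():
    """Return ChromeOptions with commonly suggested anti-detection settings."""
    # Imported lazily so the check below can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask Blink not to expose the "controlled by automation" signal.
    options.add_argument(AUTOMATION_SWITCH)
    # Drop the --enable-automation switch that ChromeDriver adds by default.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


def reports_webdriver(driver) -> bool:
    """Return True if the page can still see navigator.webdriver as truthy."""
    return bool(driver.execute_script("return navigator.webdriver"))
```

After starting the browser with `webdriver.Chrome(options=build_stealthier_options())`, calling `reports_webdriver(driver)` gives a quick check of whether the most obvious trace is still visible to the page.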

Some other useful information on how they detect webdriver:

Can a website detect when you are using Selenium with chromedriver?

Is there a version of Selenium WebDriver that is not detectable?

CodePudding user response：

If your Chrome browser was spun up using Chromedriver, then there is detectable evidence that websites can use to determine that you're using Selenium, and then they can block you. However, if the Chrome browser is spun up before Chromedriver connects to it, then you have a browser that no longer looks like an automation-controlled one. Modern web automation libraries such as undetected-chromedriver are aware of this, and so they make sure Chrome is spun up before connecting chromedriver to it.
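
To make the "start Chrome first, connect later" idea concrete, here is a minimal sketch using plain Selenium. The helper names, the port number, and the profile directory are all assumptions made for this example. It shows the general technique only, not how undetected-chromedriver is implemented internally, and it is not guaranteed that Google will accept the login.

```python
import subprocess


def chrome_launch_command(chrome_path: str, port: int, profile_dir: str) -> list:
    """Build the command that starts Chrome with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        # A dedicated profile directory keeps this session separate.
        f"--user-data-dir={profile_dir}",
    ]


def launch_chrome(chrome_path: str, port: int, profile_dir: str):
    """Start Chrome as an ordinary process, before any driver is involved."""
    return subprocess.Popen(chrome_launch_command(chrome_path, port, profile_dir))


def attach_to_running_chrome(port: int):
    """Connect ChromeDriver to the Chrome instance that is already running."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    return webdriver.Chrome(options=options)
```

The intended flow is to call `launch_chrome(...)` with the path to your Chrome binary, log in by hand in the window that opens, and only then call `attach_to_running_chrome(...)` with the same port to run the comment-removal code.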

The modern framework that I use for these situations is SeleniumBase in undetected chromedriver mode. Here's a script that you can use to get past automation detection: (Run with python after installing seleniumbase with pip install seleniumbase)

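from seleniumbase import SB

EMAIL = "YOUR_ADDRESS@gmail.com"  # placeholder: use your own account address

# uc=True runs SeleniumBase in undetected-chromedriver mode.
with SB(uc=True) as sb:
    sb.open("https://www.google.com/gmail/about/")
    sb.click('a[data-action="sign in"]')
    sb.type('input[type="email"]', EMAIL)
    sb.click('button:contains("Next")')
    # Pause here so you can take over the open browser by hand.
    import pdb; pdb.set_trace()
    # sb.type('input[type="password"]', PASSWORD)
    # sb.click('button:contains("Next")')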

It throws in a breakpoint so that you can take control of the script while the browser remains open.