Python: Why does opening Twitter with selenium in headless mode not open the "normal" Twit

Time:08-11

Situation: I am using Selenium to scrape Twitter. My script works perfectly fine as long as I am not running headless. Now I want to speed up the process by running Chrome in headless mode.

Problem: Once I add: options.add_argument('--headless') as an option, it stops working.

I found other posts on this issue, and with the help of driver.get_screenshot_as_file("screenshot.png") I took a screenshot. The screenshot shows a Twitter logo and the sentence "This browser is no longer supported". I am confused by this, since everything works fine (meaning that I get to the "normal" Twitter login page) when I disable headless mode.
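Besides the screenshot, it can help to look at the user agent the browser reports, since a site may choose which page to serve based on it. The following is a minimal sketch, not part of the original script. The helper name is_headless_user_agent is hypothetical, and the check assumes that headless Chrome puts the token HeadlessChrome in its default user agent, which has typically been the case.

```python
def is_headless_user_agent(user_agent: str) -> bool:
    """Return True if the user agent string carries the HeadlessChrome token.

    Assumption: headless Chrome reports "HeadlessChrome/<version>" where a
    regular Chrome window reports "Chrome/<version>".
    """
    return "HeadlessChrome" in user_agent


# Example strings for illustration only; the version numbers are made up.
headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/104.0.0.0 Safari/537.36"
)
regular_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
)

print(is_headless_user_agent(headless_ua))
print(is_headless_user_agent(regular_ua))
```

With a live driver, the string to check can be read with driver.execute_script("return navigator.userAgent") right after driver.get(...).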

Goal: I would like to be able to scrape tweets in headless mode.

Code:

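from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

def setup():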
   options = Options()
   options.add_argument('--headless')
   options.add_argument("--window-size=1920,1080")
   options.add_argument('--disable-dev-shm-usage')      
   driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
   driver.get("https://www.twitter.com/login")
   driver.get_screenshot_as_file("screenshot1.png")
   username = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "text"]')))
   username.send_keys('[email protected]')
   username.send_keys(Keys.RETURN)

   print("finished username") # control to check whether it finished the first part

   try:
       phone = WebDriverWait(driver, 5).until(EC.presence_of_element_located(("xpath", '//input[@data-testid = "ocfEnterTextTextInput"]')))
       phone.send_keys(' 1234567890')
       phone.send_keys(Keys.RETURN)
   except TimeoutException:
       pass
   password = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "password"]')))
   password.send_keys('mypassword')
   password.send_keys(Keys.RETURN)
   WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", './/span[contains(text(), "Refuse non-essential cookies")]'))).click()
   search_input = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@aria-label = "Search query"]')))
   search_input.send_keys('#life')
   search_input.send_keys(Keys.RETURN)

   print("finished setup")

   return driver

Error Message: Line 9 of setup(), username = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "text"]'))), raises a TimeoutException. This is because the login page looks different or does not exist, as the screenshot shows.

I am new to StackOverflow, so please let me know if anything is unclear or if I should add more information.

Thanks!

CodePudding user response:

I solved the problem. Apparently Twitter was blocking the driver in headless mode. I added

user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')

as options to the driver. I found the answer in the post "chrome --headless mode not working however normal mode is working fine". Everything works now.
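To show where the user agent override sits relative to the other options, here is a minimal sketch that combines it with the arguments from the question. The names USER_AGENT, headless_arguments and make_driver are hypothetical. make_driver assumes Selenium is installed and that it can locate a matching chromedriver on its own; the question's Service(ChromeDriverManager().install()) could be passed in instead.

```python
# User agent string taken from the answer above. Any regular desktop Chrome
# user agent should serve the same purpose (assumption).
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36"
)


def headless_arguments(user_agent=USER_AGENT):
    """Return the Chrome arguments from the question plus the user agent override."""
    return [
        "--headless",
        "--window-size=1920,1080",
        "--disable-dev-shm-usage",
        f"user-agent={user_agent}",
    ]


def make_driver(user_agent=USER_AGENT):
    """Start headless Chrome that reports the given user agent."""
    # Imported here so the argument list can be inspected without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments(user_agent):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


print(headless_arguments())
```

In setup() from the question, the driver construction could then be replaced by driver = make_driver(), with the rest of the login flow left unchanged.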
