Situation: I am using Selenium to scrape Twitter. My script works perfectly as long as I am not running headless. To speed up the process, I am now trying to run Chrome in headless mode.
Problem: Once I add options.add_argument('--headless') as an option, the script stops working.
Following other posts on this issue, I took a screenshot with driver.get_screenshot_as_file("screenshot.png"). The screenshot shows a Twitter logo and the message "This browser is no longer supported". This confuses me, because everything works (I reach the normal Twitter login page) when I disable headless mode.
Goal: I would like to be able to scrape tweets in headless mode.
Code:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

def setup():
    options = Options()
    options.add_argument('--headless')
    options.add_argument("--window-size=1920,1080")
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get("https://www.twitter.com/login")
    driver.get_screenshot_as_file("screenshot1.png")
    # Step 1: enter the username (this is the wait that times out in headless mode)
    username = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "text"]')))
    username.send_keys('[email protected]')
    username.send_keys(Keys.RETURN)
    print("finished username")  # control to check whether it finished the first part
    # Step 2: Twitter sometimes asks for a phone number; skip if the prompt does not appear
    try:
        phone = WebDriverWait(driver, 5).until(EC.presence_of_element_located(("xpath", '//input[@data-testid = "ocfEnterTextTextInput"]')))
        phone.send_keys(' 1234567890')
        phone.send_keys(Keys.RETURN)
    except TimeoutException:
        pass
    # Step 3: enter the password
    password = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "password"]')))
    password.send_keys('mypassword')
    password.send_keys(Keys.RETURN)
    # Step 4: dismiss the cookie banner, then search for the hashtag
    WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", './/span[contains(text(), "Refuse non-essential cookies")]'))).click()
    search_input = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@aria-label = "Search query"]')))
    search_input.send_keys('#life')
    search_input.send_keys(Keys.RETURN)
    print("finished setup")
    return driver

Error Message: The line
    username = WebDriverWait(driver, 30).until(EC.presence_of_element_located(("xpath", '//input[@name = "text"]')))
raises a TimeoutException. This is because the login page looks different (or does not exist at all) in headless mode, as the screenshot shows.
I am new to StackOverflow, so please let me know if anything is unclear or if I should add more information.
Thanks!
Answer:
I solved the problem. Apparently Twitter was blocking the driver in headless mode. I added
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')
as options to the driver. I found the answer in the post "chrome --headless mode not working however normal mode is working fine". Everything works now.
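For reference, here is a minimal sketch of how the fix fits together with the options from the question. Headless Chrome has traditionally identified itself with a HeadlessChrome token in its default user agent, which is a likely reason a site would serve the "no longer supported" page. The helper names headless_chrome_arguments and build_options and the constant DEFAULT_USER_AGENT are hypothetical and only for illustration; the user agent string and the argument format are the ones from the answer above.

```python
# User agent string taken from the answer above; any current desktop
# Chrome user agent could be substituted here.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36"
)


def headless_chrome_arguments(user_agent=DEFAULT_USER_AGENT, window_size=(1920, 1080)):
    """Return the Chrome arguments for a headless session with a custom user agent.

    Hypothetical helper: it only collects the arguments used in the question
    plus the user-agent override from the answer.
    """
    width, height = window_size
    return [
        "--headless",
        f"--window-size={width},{height}",
        "--disable-dev-shm-usage",
        f"user-agent={user_agent}",  # same form as in the answer
    ]


def build_options(user_agent=DEFAULT_USER_AGENT):
    """Build a Selenium Options object from the arguments above."""
    # Imported lazily so headless_chrome_arguments() works without Selenium installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments(user_agent):
        options.add_argument(argument)
    return options
```

In setup(), options = build_options() would replace the manual add_argument calls. To check which user agent the site actually sees, driver.execute_script("return navigator.userAgent") can be printed right after the driver is created.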