I am using Selenium to scrape the Amazon search results page. As I was wrapping up, I switched my scraper to headless mode to make it more efficient. However, in headless mode certain page elements, such as the sponsored brand, are not available. Everything works fine in non-headless mode, but it fails in headless mode even after setting the following options:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
#options.headless = True
options.add_argument("--window-size=1920,1080")
options.add_argument("--disable-extensions")
options.add_argument("--proxy-server='direct://'")
options.add_argument("--proxy-bypass-list=*")
options.add_argument("--start-maximized")
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--no-sandbox')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
driver = webdriver.Chrome(options=options)
PS: I tried with and without the commented section as well as with just the commented section.
For clarification, I took a screenshot of each case: this is what it looks like when I run it in headless mode, and this is what it normally looks like (without headless mode, as well as in normal user browsing). I am wondering what else needs to be added for the sponsored brand information to show up in headless mode. Could it be a problem with JavaScript not communicating properly with the browser?
As always, thank you in advance!!
CodePudding user response:
Using the latest Google Chrome v95.0, the normal headed google-chrome browser sends the following user-agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
Whereas the google-chrome-headless browser sends the following user-agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36
The additional Headless token in the user-agent string is detected as a sign of a bot, hence the difference you see.
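One possible workaround, sketched below, is to override the user-agent so that it no longer contains the Headless token. This is only a minimal sketch under some assumptions: `strip_headless` and `build_headless_driver` are hypothetical helper names, the user-agent string is the one quoted above for Chrome 95, and it relies on Chrome's `--user-agent` switch. A site may use other signals to detect automation, so this alone is not guaranteed to bring the sponsored brand back.

```python
# User-agent reported by headless Chrome 95, as quoted in the answer above.
HEADLESS_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/95.0.4638.69 Safari/537.36"
)


def strip_headless(user_agent: str) -> str:
    """Hypothetical helper: turn 'HeadlessChrome' back into 'Chrome'."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def build_headless_driver(user_agent: str):
    """Hypothetical helper: start headless Chrome with an overridden user-agent."""
    # Imported lazily so strip_headless() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    # Chrome's --user-agent switch replaces the default user-agent string.
    options.add_argument(f"--user-agent={user_agent}")
    return webdriver.Chrome(options=options)
```

To try it, call `driver = build_headless_driver(strip_headless(HEADLESS_UA))` and check `driver.execute_script("return navigator.userAgent")` to confirm which user-agent the page actually sees before scraping.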