Selenium --headless doesn't work for my project

Time:09-17

I am working on a scraping project for a well-known e-commerce site. I would like the browser window not to be displayed. The usual suggestion is the "--headless" option, but the page I want to scrape does not allow headless browsers. I also tried "--no-startup-window", and that doesn't seem to work either. Does anyone have an alternative solution?

Here is my code:

    import random
    from django.shortcuts import render
    from bs4 import BeautifulSoup
    from selenium import webdriver

    #Selenium 4 with Chrome
    from selenium.webdriver.chrome.service import Service as ChromeService
    from webdriver_manager.chrome import ChromeDriverManager

    def wlista(request):
       user_agent =  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
       headers = {
         'User-Agent':user_agent,
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
         'Accept-Language': 'es-ES,es;q=0.8',
         'DNT': '1',
         'Connection': 'keep-alive',
         'Upgrade-Insecure-Requests': '1',
       }

       opciones = webdriver.ChromeOptions()
       opciones.add_argument(f'--user-agent={user_agent}')
       #opciones.add_argument('--headless')
       #opciones.add_argument('--no-startup-window')
       #set both switches in one call; a second call with the same key overwrites the first
       opciones.add_experimental_option('excludeSwitches', ['enable-automation', 'enable-logging'])
       opciones.add_experimental_option('useAutomationExtension', False)

       DRIVER = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=opciones)
       DRIVER.get('https://www.walmart.com/search?q=lego toys')

       soup = BeautifulSoup(DRIVER.page_source, 'html.parser')
       rows = soup.find_all(attrs={"data-item-id": True})

       for items in rows:
          #do something
          pass

       DRIVER.quit()
       return render(request, "Proyectowebapp/listaprods.html", {
         #variables to pass
        })

Thanks for the help!

CodePudding user response:

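  • Use the Options class and pass it to the driver:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    options = Options()
    options.add_argument('--headless')
    # Pass the options to the driver, otherwise the flag has no effect
    driver = webdriver.Chrome(options=options)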
    
  • CodePudding user response:

    Some websites do not allow access in headless mode.

    I tried headless mode with the link you mentioned, 'https://www.walmart.com/search?q=lego toys', and printed the page title. The title was 'Robot or human?'.

    Without headless mode it printed the correct title, 'lego toys - Walmart.com'.

    Another example is 'https://www.redbus.in/': in headless mode, printing the title gave 'Access Denied'.
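
The check described in this answer can be sketched roughly as follows. This is only an illustration, not a way to guarantee access: the helper names `build_chrome_args`, `looks_blocked` and `fetch_title` are hypothetical, the block-page titles are the two reported above, and `--headless=new` assumes a reasonably recent Chrome (109 or later). Headless Chrome's default user agent contains `HeadlessChrome`, which is one signal a site can look at, so the sketch sets the user agent explicitly; a site may still detect headless mode by other means.

```python
# Titles that this answer reports seeing when the site blocks the browser.
BLOCK_TITLES = ("robot or human?", "access denied")


def build_chrome_args(user_agent, headless=True):
    """Return the Chrome command-line switches for one session.

    Hypothetical helper: collects the switches in a plain list so they
    can be inspected before being handed to Selenium.
    """
    args = [f"--user-agent={user_agent}", "--window-size=1920,1080"]
    if headless:
        # Assumption: Chrome 109+; older versions only know '--headless'.
        args.append("--headless=new")
    return args


def looks_blocked(title):
    """Return True if the page title matches a known block page."""
    return title.strip().lower() in BLOCK_TITLES


def fetch_title(url, user_agent, headless=True):
    """Open the URL in Chrome and return the page title."""
    # Imported here so the helpers above can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(user_agent, headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()
```

Calling `fetch_title(url, user_agent, headless=True)` and then `looks_blocked(...)` on the result shows whether the headless session was served the block page. If it was, running with `headless=False` is the fallback that this answer found to work.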
