Is it possible to scrape web sites with python without the browser installed?

Time:02-15

I am trying to scrape data from a website using Python. The problem is that no browser is installed, and one cannot be installed (it is a bare Debian system with no GUI). I thought it might be possible to use ChromeDriver with Selenium's headless mode, but it doesn't seem to work.

Here is my test code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
  
options = Options()
options.headless = True
  
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
  
driver.get('https://www.kino-teatr.ru/')

search_bar = driver.find_element_by_id('search_input_top')  # find search bar
search_bar.send_keys('Avengers')  # enter the name of the movie
search_bar.send_keys(Keys.ENTER)  # get the results

page = driver.page_source
soup = BeautifulSoup(page, 'html.parser')

div = soup.find('div', class_='list_item')  # find the first item
print(div.find('a')['href'])  # find a link to the page

It gives me the following error:

WebDriverException: Message: unknown error: cannot find Chrome binary
Stacktrace:
#0 0x5606b093c113 <unknown>
#1 0x5606b04046d8 <unknown>
#2 0x5606b04259c9 <unknown>
#3 0x5606b042319a <unknown>
#4 0x5606b045de0a <unknown>
#5 0x5606b0457f53 <unknown>
#6 0x5606b042dbda <unknown>
#7 0x5606b042eca5 <unknown>
#8 0x5606b096d8dd <unknown>
#9 0x5606b0986a9b <unknown>
#10 0x5606b096f6b5 <unknown>
#11 0x5606b0987725 <unknown>
#12 0x5606b096308f <unknown>
#13 0x5606b09a4188 <unknown>
#14 0x5606b09a4308 <unknown>
#15 0x5606b09bea6d <unknown>
#16 0x7f35ddc8bea7 <unknown>

I've already tried installing the driver and the additional libraries recommended elsewhere, but with no success.

Is it possible to use Selenium without a browser installed, and if so, what should I do to achieve that?

Thanks in advance for any help or advice!

Answer:

You can install the requests library and do the following to fetch the HTML page you need:

>>> import requests
>>> url = 'https://www.geeksforgeeks.org'
>>> response = requests.get(url).text
>>> '7 Alternative Career Paths For Software Engineers' in response
True

Then you can use lxml or BeautifulSoup to parse the page.
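To illustrate the parsing step without installing anything extra, here is a minimal sketch that uses the standard library's `html.parser` as a stand-in for lxml or BeautifulSoup. It looks for the first link inside the first `div` with class `list_item`, the same element the question's code targets. The `SAMPLE` markup and its paths are made up for the example; the real page structure may differ.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Collect the href of the first <a> inside the first <div class="list_item">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth of <div> tags while inside the target div
        self.href = None  # filled in once the first matching link is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # nested div inside the target div
            elif self.href is None and "list_item" in (attrs.get("class") or "").split():
                self.depth = 1   # entered the first target div
        elif tag == "a" and self.depth > 0 and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def first_item_link(page_html):
    """Return the href of the first link in the first list_item div, or None."""
    parser = FirstLinkParser()
    parser.feed(page_html)
    return parser.href


# Hypothetical markup, only to show the call; not copied from the real site.
SAMPLE = """
<div class="header"><a href="/home/">Home</a></div>
<div class="list_item"><div class="pic"></div><a href="/kino/movie/1/">First</a></div>
<div class="list_item"><a href="/kino/movie/2/">Second</a></div>
"""
print(first_item_link(SAMPLE))
```

In practice you would pass the text returned by `requests.get(url).text` to `first_item_link`. With BeautifulSoup installed, the `soup.find('div', class_='list_item')` call from the question does the same job in one line.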

UPDATE

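import requests
from lxml import html

# Encode the query as cp1251, which the search form appears to expect instead of UTF-8.
response = requests.post(
    'https://www.kino-teatr.ru/search/',
    data={'text': 'мстители'.encode('cp1251')},
).content
doc = html.fromstring(response)
# NOTE: the attribute test inside [@] is missing here; put the attribute of the result container in its place.
entries = doc.xpath('//div[@]/h4')
first_movie = entries[0].text_content()  # title of the first search result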
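The `.encode('cp1251')` call in the snippet above matters because it changes the bytes that end up in the form body. The sketch below shows the difference using `urllib.parse.urlencode` from the standard library. As far as I know, requests form-encodes a dict passed as `data` in much the same way, but treat that as an assumption rather than a guarantee.

```python
from urllib.parse import urlencode

query = "мстители"  # the search term used in the answer above

# str values are percent-encoded as UTF-8 by default
utf8_body = urlencode({"text": query})

# bytes values are percent-encoded as-is, so the cp1251 bytes are preserved
cp1251_body = urlencode({"text": query.encode("cp1251")})

print(utf8_body)
print(cp1251_body)
```

The two bodies differ: the UTF-8 one uses two bytes per Cyrillic letter, the cp1251 one a single byte. If a search returns nothing for a non-ASCII query, a mismatch between the encoding you send and the one the site expects is worth checking.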