How do I add selenium & chromedriver to an AWS Lambda function?

Time:04-12

I am trying to host a web-scraping function on AWS Lambda and am running into webdriver errors from Selenium. Could someone show me how to add the chromedriver.exe file, and how to get the path to work in an AWS Lambda function? This is the portion of my function that deals with Selenium:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.service import Service
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine

url = 'https://covid19criticalcare.com/pharmacies/'

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 5)
```

  1. I tried creating a Lambda layer with the chromedriver.exe file.

  2. I followed this guide (https://dev.to/awscommunity-asean/creating-an-api-that-runs-selenium-via-aws-lambda-3ck3), but I couldn't add headless Chromium because its file size pushed me over my function's size limit (my pandas and numpy dependency layers already take up most of the space).

  3. I tried driver = webdriver.Chrome() with a path variable and tried different paths, but I wasn't sure what the beginning of the path would be, since the code runs in a Lambda function.
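On the path question: Lambda runs on Linux, so the driver has to be a Linux chromedriver binary rather than chromedriver.exe. Lambda extracts layer contents under /opt, the function's own package lives in the directory named by the LAMBDA_TASK_ROOT environment variable (typically /var/task), and /tmp is the only writable location. The sketch below looks for the binaries in those places and passes the explicit paths to Selenium 4's Service and ChromeOptions. The relative locations in DRIVER_CANDIDATES and BROWSER_CANDIDATES are hypothetical; adjust them to however your layer or package zip is laid out.

```python
import os
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical relative locations of the binaries inside a layer (/opt) or the
# function package; change these to match how your zip file is laid out.
DRIVER_CANDIDATES = ("chromedriver", "bin/chromedriver")
BROWSER_CANDIDATES = ("chrome/chrome", "headless-chromium", "bin/headless-chromium")


def search_roots() -> list:
    """Directories Lambda exposes: layers under /opt, code under LAMBDA_TASK_ROOT."""
    roots = [Path("/opt")]
    task_root = os.environ.get("LAMBDA_TASK_ROOT")
    if task_root:
        roots.append(Path(task_root))
    return roots


def find_binary(candidates: Sequence[str], roots: Sequence[Path]) -> Optional[str]:
    """Return the first candidate that exists and is executable, or None."""
    for root in roots:
        for rel in candidates:
            path = Path(root) / rel
            if path.is_file() and os.access(path, os.X_OK):
                return str(path)
    return None


def make_driver():
    """Build a headless Chrome driver from explicit paths (assumes Selenium 4)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    roots = search_roots()
    driver_path = find_binary(DRIVER_CANDIDATES, roots)
    browser_path = find_binary(BROWSER_CANDIDATES, roots)
    if driver_path is None or browser_path is None:
        raise FileNotFoundError(f"chromedriver/Chrome not found under {roots}")

    options = webdriver.ChromeOptions()
    options.binary_location = browser_path  # explicit browser path, no download
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

Resolving paths explicitly replaces ChromeDriverManager().install(), which downloads a driver at runtime; on Lambda, shipping the binaries in a layer or image and pointing at them is the more predictable option. Chrome usually needs further flags inside Lambda as well.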

Answer:

I've been struggling to add Selenium to AWS Lambda for the last couple of days. I have a web-scraping function (using Selenium and the Google API) that extracts data from a website and writes the output to a Google spreadsheet. Let me explain step by step what I did and how I finally succeeded, so you don't have to struggle with it as much as I did:

1- I tried to add Selenium as a layer, as described here: https://www.youtube.com/watch?v=jWqbYiHudt8. Adding Selenium itself worked, but the deployment package ended up over 250 MB, which is the Lambda quota for the unzipped function code plus all of its layers (see the question "How to increase the maximum size of the AWS lambda deployment package (RequestEntityTooLargeException)?"), so it did not work.
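Because the 250 MB quota counts the unzipped function code and every attached layer together, large pandas/numpy layers leave little room for Chromium. A rough pre-deployment check is to sum the unzipped size of the directories you are about to zip. This is a minimal sketch that assumes the function code and each layer are staged in local folders; the folder names in the usage line are hypothetical.

```python
import os
import sys
from pathlib import Path

# Lambda quota for function code plus all layers, unzipped: 250 MiB.
UNZIPPED_LIMIT_BYTES = 262_144_000


def dir_size(path: Path) -> int:
    """Total size in bytes of every regular (non-symlink) file under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = Path(root) / name
            if fp.is_file() and not fp.is_symlink():
                total += fp.stat().st_size
    return total


def check_unzipped_size(dirs):
    """Return (total_bytes, fits_quota) for the combined staged directories."""
    total = sum(dir_size(Path(d)) for d in dirs)
    return total, total < UNZIPPED_LIMIT_BYTES


if __name__ == "__main__":
    total, fits = check_unzipped_size(sys.argv[1:])
    print(f"{total / 1024 / 1024:.1f} MiB unzipped; fits Lambda quota: {fits}")
```

Example usage: python check_size.py build/function build/layer-pandas build/layer-selenium. If the total is close to the limit, a container image is the usual way out.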

2- To get around the deployment package size limit, a good option is to deploy the function as a container image (container images can be up to 10 GB). Here is a good explanation of deploying Selenium as a container image: https://cloudbytes.dev/snippets/run-selenium-in-aws-lambda-for-ui-testing#using-the-github-repository-directly. I tried it, but I was not able to deploy as described because of missing/wrong webdrivers (the shell script seems to be wrong).

3- Finally, I was able to publish my Selenium function as a Docker image, as described here: https://github.com/umihico/docker-selenium-lambda.
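In a container-image deployment, the handler points Selenium at the Chrome and chromedriver binaries baked into the image and passes flags that let Chrome run inside Lambda (tiny /dev/shm, no Chrome sandbox, only /tmp writable). The sketch below is loosely in the spirit of that approach, not the repository's exact code. The /opt/chrome/chrome and /opt/chromedriver paths are assumptions about where the image installs the binaries, and the flag list is a commonly used set, not a guaranteed minimum.

```python
from tempfile import mkdtemp

# Assumed install locations inside the container image; match your Dockerfile.
CHROME_BINARY = "/opt/chrome/chrome"
CHROMEDRIVER = "/opt/chromedriver"


def lambda_chrome_args(tmp_root=None):
    """Chrome flags commonly used to run headless Chrome inside Lambda."""
    return [
        "--headless",
        "--no-sandbox",              # Chrome's sandbox cannot start in Lambda
        "--disable-gpu",
        "--disable-dev-shm-usage",   # /dev/shm is very small in Lambda
        "--single-process",
        "--no-zygote",
        "--window-size=1280,1696",
        # Profile and cache dirs must be writable; mkdtemp defaults to /tmp there.
        f"--user-data-dir={mkdtemp(dir=tmp_root)}",
        f"--disk-cache-dir={mkdtemp(dir=tmp_root)}",
    ]


def handler(event, context):
    """Lambda entry point: load a page and return its title."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in lambda_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)
    try:
        driver.get(event.get("url", "https://covid19criticalcare.com/pharmacies/"))
        return {"title": driver.title}
    finally:
        driver.quit()  # always release Chrome so warm invocations start clean
```

Keeping the flag list in its own function makes it easy to test locally without Chrome installed and to tweak flags when the Chrome version in the image changes.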

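There is a lot of discussion about which versions work with which. The most important thing with Selenium is to be careful about package and driver versions when deploying to AWS Lambda: the Selenium package, the Chrome/Chromium binary, and chromedriver all need to be compatible with each other.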
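Chrome and chromedriver must share the same major version; a mismatch typically fails with an error along the lines of "This version of ChromeDriver only supports Chrome version N". A quick sanity check is to compare the major versions each binary reports with --version. This sketch only parses the usual four-part version strings; the paths you pass in are whatever your layer or image uses.

```python
import re
import subprocess

# Matches four-part versions such as 114.0.5735.90 and captures the major part.
VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output: str) -> int:
    """Extract the major version from e.g. 'Google Chrome 114.0.5735.90'."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_path: str, driver_path: str) -> bool:
    """Run both binaries with --version and compare their major versions."""
    chrome_out = subprocess.run(
        [chrome_path, "--version"], capture_output=True, text=True, check=True
    ).stdout
    driver_out = subprocess.run(
        [driver_path, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return major_version(chrome_out) == major_version(driver_out)
```

Running this once inside the built image (or at cold start) surfaces a version mismatch as a clear True/False instead of an opaque session-creation error.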
