I am building an app using an AWS Lambda function. My own code logic is very simple: one Python file calling some ML functions. However, it requires a lot of packages (dependencies). I know that for smaller cases I can zip these dependencies and upload the zip file for the Lambda function. That doesn't fit my use case, though, as the zip file easily exceeds the 250 MB size limit. So I have to use the Docker image approach: create a Lambda function directly from a Docker image.
I am using the SAM CLI to build and deploy changes. Deploying is very time-consuming because it has to push a very big (6 GB) image to ECR. The worst part is that I have to endure this every time I make even a small change to my own code, without ever touching the dependencies.
Can I apply the same idea as the zip approach, i.e., include only the dependencies in the Docker image and keep my customized logic outside of it? Is that doable?
If 1 is not possible, what are the best practices / tips for this? I guess there is some magic in the Dockerfile, but I am pretty much a noob here. Any demo / sample code would be great. My Dockerfile is below:
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.9
RUN python3.9 -m pip install --upgrade pip
COPY requirements.txt ./
RUN python3.9 -m pip install -r requirements.txt -t .
COPY *.py ./
# Command can be overwritten by providing a different command in the template directly.
CMD ["app.lambda_handler"]
I found some suggestions to upload the package to S3 and download it from S3 into the /tmp folder. This doesn't sound very elegant, and even if it were, /tmp also has a 512 MB limit, which is still too small for my use case.
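For reference, a minimal sketch of that S3-to-/tmp pattern (the bucket and key names here are hypothetical) would look something like this:

import os
import sys
import zipfile

import boto3  # available by default in the Lambda Python runtime

# Hypothetical bucket/key holding a pre-built site-packages archive
DEPS_BUCKET = "my-deps-bucket"
DEPS_KEY = "site-packages.zip"
DEPS_DIR = "/tmp/deps"

def _load_dependencies():
    # Download and unpack once per cold start; warm invocations reuse /tmp
    if not os.path.isdir(DEPS_DIR):
        archive = "/tmp/site-packages.zip"
        boto3.client("s3").download_file(DEPS_BUCKET, DEPS_KEY, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(DEPS_DIR)
    sys.path.insert(0, DEPS_DIR)  # make the unpacked packages importable

_load_dependencies()

def lambda_handler(event, context):
    import numpy  # hypothetical heavy import, usable only after _load_dependencies()
    return {"statusCode": 200}

Note that both the downloaded archive and the extracted tree live in /tmp, so the effective budget is even tighter than the 512 MB mentioned above.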
CodePudding user response:
6 GB is really a lot for a Docker image, and it will cost you a lot to run it as a Lambda.
There are many things you could try to slim the image down or reorganise your application.
Remove unneeded code. You could use a multi-stage build, where you pre-build your application in one stage and then copy only what is needed into a very slim runtime image (see the sketch below).
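As a rough sketch of such a multi-stage Dockerfile on the Lambda Python base image (the /opt/deps staging path is an assumption, not a verified recipe):

# Build stage: install the heavy dependencies once
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.9 AS build
COPY requirements.txt ./
RUN python3.9 -m pip install -r requirements.txt -t /opt/deps

# Runtime stage: copy pre-built dependencies first, application code last
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.9
COPY --from=build /opt/deps ${LAMBDA_TASK_ROOT}
# Because the code is the last layer, editing app.py invalidates only this
# small layer, and "docker push" re-uploads only the layers that changed.
COPY *.py ${LAMBDA_TASK_ROOT}/
CMD ["app.lambda_handler"]

Even if the total image stays large, ordering the layers this way means a small code change re-pushes only the final thin layer to ECR instead of all 6 GB.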
Manage static assets in S3. Upload them before you deploy the Lambda.
Or, consider using Lambda layers for generic code (note that layers apply to zip-based functions, not container images). That way, you can pre-deploy the generic parts of your application and only update the logic you're currently working on.
Another option might be to deploy multiple Lambdas and have them communicate through events or Function URLs.
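As a minimal sketch of that, one Lambda can call another through its Function URL using only the standard library (the URL below is hypothetical):

import json
import urllib.request

# Hypothetical Function URL of a second Lambda that carries the heavy ML dependencies
ML_FUNCTION_URL = "https://abc123.lambda-url.us-east-1.on.aws/"

def lambda_handler(event, context):
    payload = json.dumps({"input": event.get("input")}).encode()
    req = urllib.request.Request(
        ML_FUNCTION_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Forward the heavy lifting to the ML Lambda and relay its answer
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())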
By the way, have you considered running on ECS rather than Lambda? I don't know your exact requirements, but it might be more convenient to deploy via ECR and CloudFormation (or Terraform), as well as potentially more cost-efficient in the end. The above suggestions (reduce size, extract assets, etc.) would apply to ECS as well, though.
CodePudding user response:
Disclaimer: I often talk about Google Cloud technologies; I am a GDE (Google Developer Expert) on Cloud.
We had this issue for a customer, and we researched a lot of what AWS could offer us; in the end we implemented it on Google Cloud.
The main reason was that we had only developer skills, no sysadmin. We understood that we needed to keep the dependencies separate, and we found that Google Cloud Run helps with that. So our solution was:
- the large dependencies (e.g. the /site-packages folder built from requirements.txt) are built once and placed on a network drive or Cloud Storage
- just our own code is packaged into the container, giving this way a container of only about 50 MB (see the sys.path sketch just below)
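A minimal sketch of the runtime side, assuming the pre-built site-packages folder is mounted at /mnt/deps (the mount path and the numpy import are hypothetical):

import sys

# The pre-built site-packages folder is mounted by the platform at startup;
# putting it at the front of sys.path makes the heavy dependencies importable
# without shipping them inside the container image.
sys.path.insert(0, "/mnt/deps/site-packages")

import numpy  # hypothetical heavy import, resolved from the mounted volume, not the image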
Read on for some steps and references to what we used.
If you are allowed, try it out on Cloud Run:
- package your code as a container
- run your container on Cloud Run
- the Cloud Run 2nd gen runtime lets you mount disks
- move/keep your assets on this network disk
- they will be mounted automatically on each start (see the gcloud sketch below)
- leverage DockerSlim to reduce the attack surface and strip some unneeded files.
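As a sketch, mounting a Cloud Storage bucket as such a disk can be done from the gcloud CLI roughly like this (the service, image, bucket, and mount path are hypothetical, and the volume flags assume a recent gcloud with Cloud Run volume support):

# Deploy the slim container and mount a bucket holding the pre-built dependencies
gcloud run deploy my-ml-service \
  --image gcr.io/my-project/my-ml-app \
  --execution-environment gen2 \
  --add-volume name=deps,type=cloud-storage,bucket=my-deps-bucket \
  --add-volume-mount volume=deps,mount-path=/mnt/deps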
In the end, you will have:
- a serverless function/app that behaves much like a Lambda
- assets on the network disk
- only your app/function inside the container
- a managed serverless environment for your app/function
- 2M requests free per month
- up to 32 GB of RAM at your disposal
- an always-on CPU, or a minimum-instances option, available to speed up cold starts